ClarkCGA / multi-temporal-crop-classification-training-data

This repository contains the pipeline for generating a training dataset for land cover and crop type segmentation using USDA CDL data.
Apache License 2.0

Query and export HLS data #4

Closed by HamedAlemo 7 months ago

HamedAlemo commented 1 year ago

Using the definition of tiles from here, we need to retrieve the corresponding scenes from the HLS dataset. Specification:

HamedAlemo commented 1 year ago

Here is the source code from NASA for querying and exporting HLS data.

mcecil commented 1 year ago

geojson.io for creating geojson

mcecil commented 1 year ago

The burn scar scripts (https://github.com/NASA-IMPACT/hlsfm_burn_scar_pipeline) don't do things terribly efficiently. Here is how my scripts match up.

[screenshot: table mapping the burn scar scripts to my notebooks]

The bs script "0. subset_burn_shapefile.py" is specific to burn scars, so I don't copy it.

The bs script "1. save_HLS_query_dataframe.py" stores all potential download urls for the aoi. This can be basically copied to my notebook "1_CDL_save_HLS_query.ipynb"

The bs script "2. create_HLS_masks_bulk.py" is doing MANY things (inefficiently, by downloading full HLS tiles for each geojson), so I split it up. My script "2a_CDL_create_HLS_masks_bulk.ipynb" takes care of file processing. (loading geojson for aoi/chip, identifying closest tile, downloading all images for that tile as hdf, extracting hdf metadata, and converting to tif).

There are at least two issues with script (2a). First, the function to return the cloud cover and spatial coverage metadata does not work: it returns an empty dictionary.

nasa_hls.get_metadata_from_hdf(hdf_dir+local_name, fields=['cloud_cover', 'spatial_coverage'])

Second, the function to convert from HDF to tif does not work. I have a workaround for this, but it's not ideal.

nasa_hls.convert_hdf2tiffs(Path(hdf_dir+local_name), Path(tiff_dir))
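Neither workaround is spelled out in the thread. For reference, a rough GDAL-based sketch of what they might look like is below; the metadata key names ("cloud_coverage", "spatial_coverage") are an assumption about HLS v1.4, and hdf_dir, local_name, and tiff_dir are the variables from the snippets above.

```python
# Rough GDAL-based sketch (not the notebook's actual workaround) covering both
# issues above: reading cloud cover / spatial coverage from the HDF's global
# metadata, and converting each band subdataset to a GeoTIFF.
# The metadata key names are assumptions about HLS v1.4 and should be verified
# against a real file.
import os
from pathlib import Path
from osgeo import gdal

hdf_path = hdf_dir + local_name

# 1. Global metadata (workaround for get_metadata_from_hdf returning {}).
ds = gdal.Open(hdf_path)
md = ds.GetMetadata()
print({"cloud_cover": md.get("cloud_coverage"),
       "spatial_coverage": md.get("spatial_coverage")})

# 2. HDF -> GeoTIFF (workaround for convert_hdf2tiffs): translate each band
# subdataset to its own tif.
Path(tiff_dir).mkdir(parents=True, exist_ok=True)
for subdataset_name, _description in ds.GetSubDatasets():
    band = subdataset_name.split(":")[-1]  # e.g. B04, QA
    out_tif = os.path.join(tiff_dir, f"{Path(hdf_path).stem}_{band}.tif")
    gdal.Translate(out_tif, subdataset_name)
```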

I will also create a script (2b) that takes care of the cropping/masking of .tif files. (not done yet).

HamedAlemo commented 1 year ago

Thanks @mcecil. A couple of things:

mcecil commented 1 year ago

I would just use the base geojson, which I can share (in lat/long). I tried reprojecting it (in R) but I'm not sure it worked correctly. I'll push the geojson to the repo.

Yes, I can update the root_path.

I did run a sample HDF through nasa_hls.convert_hdf2tiffs and it does some weird things. It attempts to create a folder for each image, but does not populate it. Here is the error.

[screenshot of the error from convert_hdf2tiffs]

mcecil commented 1 year ago

I've resolved the issues with the NASA_HLS functions.

The cropping/masking is trickier. The existing workflow is rather convoluted: it creates Boolean masks using the entire tile raster as the reference raster (so a large file), and there still seems to be an error with the georeferencing after cropping. I'm not sure whether this would affect the DL model, since the error may exist for both the mask and band layers.

In any case, I got this working using rasterio's mask and crop functionality. It seems to work but has some edge effects (some pixels on the border are not included in the mask).
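For reference, a minimal sketch of the rasterio mask-and-crop approach; file names are placeholders, and the chip geometry is assumed to already be in the raster's CRS.

```python
# Minimal sketch of cropping/masking an HLS GeoTIFF to a chip geometry with
# rasterio.mask. File names are placeholders; the chip geometry must already
# be in the raster's CRS.
import json
import rasterio
from rasterio.mask import mask

with open("chip.geojson") as f:
    shapes = [feat["geometry"] for feat in json.load(f)["features"]]

with rasterio.open("HLS_band.tif") as src:
    # crop=True clips the output to the bounding box of the shapes;
    # pixels outside the geometry are set to nodata.
    out_image, out_transform = mask(src, shapes, crop=True)
    out_meta = src.meta.copy()
    out_meta.update(height=out_image.shape[1], width=out_image.shape[2],
                    transform=out_transform)

with rasterio.open("HLS_band_chip.tif", "w", **out_meta) as dst:
    dst.write(out_image)
```

The border effect mentioned above may come from how `mask` handles partially covered pixels: by default only pixels whose centers fall inside the geometry are kept, while passing `all_touched=True` keeps every pixel the geometry touches.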

mcecil commented 1 year ago

For HLS image tracking

HamedAlemo commented 1 year ago

Final steps to follow:

Let's cover the chipping of HLS and CDL in #5.

mcecil commented 1 year ago

I've created a 'workflow' notebook in the rewrite branch that goes through all the steps.

A weird issue is occurring. I've selected three images for conversion to COG, and one of them does not convert: it creates an empty folder in the tif directory but no files. The other two HDFs convert fine.

The '007' image does not convert, while the '032' and '052' images do convert from HDF to COG. I tested with my old code and was able to get the '007' image to convert.


HamedAlemo commented 1 year ago

@mcecil please share the URL of the HDF file for the 007 image so I can try it on my end and see if I can debug.

mcecil commented 1 year ago

Here is the bad file: https://hls.gsfc.nasa.gov/data/v1.4/S30/2020/15/S/T/T/HLS.S30.T15STT.2020007.v1.4.hdf

Subbing in days 32 and 52 should give files that work.
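(The day of year is encoded in the file name: 2020007 is DOY 007 of 2020, so the other two URLs can be built by substituting the DOY.)

```python
# Build the URLs for DOY 032 and 052 by substituting the day of year in the
# v1.4 file name pattern from the link above.
base = ("https://hls.gsfc.nasa.gov/data/v1.4/S30/2020/15/S/T/T/"
        "HLS.S30.T15STT.2020{doy:03d}.v1.4.hdf")
for doy in (32, 52):
    print(base.format(doy=doy))
```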

mcecil commented 1 year ago

Not sure if it matters, but the reprojected HLS tifs have a weird 0-data rectangle above the HLS values.

The images do align, though. I checked pixel alignment and also road overlap (so the HLS image is in the right place).

[screenshot: reprojected HLS tif showing the 0-data rectangle]

mcecil commented 1 year ago

to-do list:

For the cloud coverage issue, I tested one tile, T15STT. There were 0 images in Mar-Sept with 0% cloud cover, and 7 images with <= 5% cloud cover.
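For reference, the kind of filter behind that count might look like the sketch below; the CSV name and the column names (tile, date, cloud_cover) are assumptions about the step-1 query dataframe, not its actual schema.

```python
# Sketch of the cloud-cover filter described above. The file and column names
# (tile, date, cloud_cover) are assumptions, not the notebook's actual schema.
import pandas as pd

df = pd.read_csv("hls_query.csv", parse_dates=["date"])
season = df[(df["tile"] == "T15STT") & (df["date"].dt.month.between(3, 9))]

print((season["cloud_cover"] == 0).sum())   # images with 0% cloud cover
print((season["cloud_cover"] <= 5).sum())   # images with <= 5% cloud cover
```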

HamedAlemo commented 1 year ago

> Not sure if it matters, but the reprojected HLS tifs have a weird 0-data rectangle above the HLS values.
>
> The images do align, though. I checked pixel alignment and also road overlap (so the HLS image is in the right place).

@mcecil we forgot to talk about this in our call. This is the result of interpolation. I wouldn't worry about it.

kordi1372 commented 1 year ago

https://nasa-openscapes.github.io/2021-Cloud-Hackathon/tutorials/02_Data_Discovery_CMR-STAC_API.html
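For orientation, a query along the lines of that tutorial might look like the sketch below; the LPCLOUD endpoint and the HLS v2.0 collection IDs are as documented by LP DAAC, while the bbox and date range are placeholders.

```python
# Sketch of an HLS query against NASA's CMR-STAC (LPCLOUD) catalog, in the
# spirit of the linked tutorial. The bbox and date range are placeholders.
from pystac_client import Client

catalog = Client.open("https://cmr.earthdata.nasa.gov/stac/LPCLOUD")
search = catalog.search(
    collections=["HLSS30.v2.0", "HLSL30.v2.0"],
    bbox=[-94.5, 41.0, -93.5, 42.0],          # placeholder AOI (lon, lat)
    datetime="2020-03-01/2020-09-30",
)
for item in search.items():
    print(item.id, item.properties.get("eo:cloud_cover"))
```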

HamedAlemo commented 1 year ago

@mcecil and @kordi1372, I just noticed we didn't close the issues on this repo from the first version of the code that Mike developed. It's best if we close these, since they are already implemented with v1.4 of the data, add a tag on GitHub to keep a record of the current working version of the code (let me know if you need help with this), and start a new set of issues for Fatemeh to update the code to use v2.0 of the data.