Closed: HamedAlemo closed this issue 7 months ago.
Here is the source code from NASA for querying and exporting HLS data.
geojson.io can be used for creating the GeoJSON files.
The burn scar scripts (https://github.com/NASA-IMPACT/hlsfm_burn_scar_pipeline) are not especially efficient. Here is how my scripts correspond to them.
The bs script "0. subset_burn_shapefile.py" is specific to burn scars, so I don't copy it.
The bs script "1. save_HLS_query_dataframe.py" stores all potential download urls for the aoi. This can basically be copied to my notebook "1_CDL_save_HLS_query.ipynb".
The bs script "2. create_HLS_masks_bulk.py" does many things (inefficiently, by downloading full HLS tiles for each geojson), so I split it up. My script "2a_CDL_create_HLS_masks_bulk.ipynb" takes care of file processing: loading the geojson for each aoi/chip, identifying the closest tile, downloading all images for that tile as HDF, extracting HDF metadata, and converting to TIF.
There are at least two issues with script (2a). First, the function to return cloud cover and spatial coverage metadata does not work; it returns an empty dictionary:
nasa_hls.get_metadata_from_hdf(hdf_dir+local_name, fields=['cloud_cover', 'spatial_coverage'])
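One workaround for the empty-dictionary problem is to run gdalinfo on the HDF file yourself and parse the key=value metadata lines. A minimal sketch, where the helper name `parse_hls_metadata` and the sample gdalinfo output are illustrative, not part of nasa_hls:

```python
import re

def parse_hls_metadata(gdalinfo_text, fields=("cloud_cover", "spatial_coverage")):
    """Parse key=value lines from `gdalinfo` output on an HLS HDF file."""
    metadata = {}
    for line in gdalinfo_text.splitlines():
        match = re.match(r"\s*(\w+)=(.+)", line)
        if match and match.group(1) in fields:
            metadata[match.group(1)] = match.group(2).strip()
    return metadata

# Sample of a gdalinfo "Metadata:" section (illustrative values):
sample = """Metadata:
  cloud_cover=12
  spatial_coverage=87
  SENSING_TIME=2020-01-07T17:10:21Z"""
print(parse_hls_metadata(sample))  # {'cloud_cover': '12', 'spatial_coverage': '87'}
```

In practice the text would come from something like `subprocess.run(["gdalinfo", path], capture_output=True, text=True).stdout`, which also avoids the shell-quoting issue discussed below.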
Second, the function to convert from HDF to TIF does not work. I have a workaround for this too, but it is not ideal:
nasa_hls.convert_hdf2tiffs(Path(hdf_dir+local_name), Path(tiff_dir))
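One way around the broken conversion is to call gdal_translate on each HDF4 subdataset directly. The sketch below only builds the argument lists; the `HDF4_EOS:EOS_GRID` prefix, the `Grid` group name, and the band names are assumptions to verify against your own gdalinfo output:

```python
from pathlib import Path

def hdf_to_tiff_commands(hdf_path, tiff_dir, subdatasets):
    """Build one gdal_translate argument list per HDF subdataset.

    Passing the command as a list (not a quoted shell string) sidesteps the
    single- vs double-quote problems seen with the nasa_hls helpers.
    """
    hdf_path, tiff_dir = Path(hdf_path), Path(tiff_dir)
    commands = []
    for sds in subdatasets:
        # Subdataset naming as reported by gdalinfo on HDF4-EOS files (assumed):
        src = f'HDF4_EOS:EOS_GRID:"{hdf_path}":Grid:{sds}'
        dst = tiff_dir / f"{hdf_path.stem}.{sds}.tif"
        commands.append(["gdal_translate", "-of", "GTiff", src, str(dst)])
    return commands

cmds = hdf_to_tiff_commands("HLS.S30.T15STT.2020007.v1.4.hdf", "tiffs", ["B02", "B03"])
```

Each list in `cmds` can then be handed to `subprocess.run(cmd, check=True)` without any shell quoting at all.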
I will also create a script (2b) that takes care of cropping/masking the .tif files (not done yet).
Thanks @mcecil. A couple of things: what is the difference between geojson_file and geojson_rpj_file in 2a_CDL_preprocess_HLS_to_TIF.ipynb? I want to run the code using a sample aoi but am not sure which of these I should replace. Also, in 2a_CDL_preprocess_HLS_to_TIF.ipynb, define a root_path variable that we can set at the top, and just use that for all the file paths that follow. Finally, could you run nasa_hls.convert_hdf2tiffs to see if you get any error or missing crs in the output GeoTIFF?
I would just use the base geojson, which I can share (in lat/long). I tried reprojecting it (in R) but am not sure it worked correctly. I'll push the geojson to the repo.
Yes, I can update the root_path.
I did run a sample HDF through nasa_hls.convert_hdf2tiffs and it does some weird things: it attempts to create a folder for each image but does not populate it. Here is the error.
I've resolved the issues with the NASA_HLS functions. For both get_metadata_from_hdf and convert_hdf2tiffs, I had to replace single quotes (') with double quotes (") in the command sent to the shell, and I also had to do slight editing of the output for the metadata. For both functions, I created my own version of the function in the notebook.
The cropping/masking is more complicated. The workflow creates Boolean masks using the entire tile raster as a reference raster (so a large file), and there still seems to be an error with the georeferencing after cropping. I'm not sure whether this would affect the DL model, as the error may exist for both the mask and band layers.
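The quote issue can be illustrated with the standard library: shlex shows how the shell tokenizes a command string, and the most robust option is to avoid shell quoting entirely by passing subprocess.run an argument list. A minimal sketch:

```python
import shlex

# With double quotes, the dotted HLS filename survives as a single shell token:
tokens = shlex.split('gdalinfo "HLS.S30.T15STT.2020007.v1.4.hdf"')
print(tokens)  # ['gdalinfo', 'HLS.S30.T15STT.2020007.v1.4.hdf']

# More robust still: skip shell parsing entirely and pass a list, e.g.
# subprocess.run(["gdalinfo", "HLS.S30.T15STT.2020007.v1.4.hdf"],
#                capture_output=True, text=True)
```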
In any case, I got this to work using rasterio masking and cropping. It seems to work but has some edge effects (some pixels on the border are not in the mask).
For HLS image tracking
Final steps to follow:
Let's cover the chipping of HLS and CDL in #5 .
I've created a 'workflow' notebook in the rewrite branch that goes through all the steps.
A weird issue is occurring. I've selected three images for conversion to COG, and one of them does not convert: it creates an empty folder in the tif directory but no files. The other two HDFs convert fine.
The '007' image does not convert, while the '032' and '052' images do convert from HDF to COG. I tested my old code and was able to get the '007' image to convert.
@mcecil please share the URL of the HDF file for the 007 image so I can try on my end and see if I can debug.
Here is the bad file: https://hls.gsfc.nasa.gov/data/v1.4/S30/2020/15/S/T/T/HLS.S30.T15STT.2020007.v1.4.hdf
Subbing in days 32 and 52 should give files that work.
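Since the working and failing files differ only in the day-of-year, variants are easy to generate from the v1.4 URL pattern above. A small helper (`hls_v14_url` is a hypothetical name; the pattern is taken from the URL in this thread):

```python
def hls_v14_url(tile="T15STT", year=2020, doy=7, product="S30", version="v1.4"):
    """Build an HLS v1.4 download URL following the pattern used in this thread."""
    # Tile T15STT -> path components 15/S/T/T
    zone, letters = tile[1:3], tile[3:]
    path = "/".join([zone] + list(letters))
    return (f"https://hls.gsfc.nasa.gov/data/{version}/{product}/{year}/{path}/"
            f"HLS.{product}.{tile}.{year}{doy:03d}.{version}.hdf")

print(hls_v14_url(doy=7))
# https://hls.gsfc.nasa.gov/data/v1.4/S30/2020/15/S/T/T/HLS.S30.T15STT.2020007.v1.4.hdf
```

Calling `hls_v14_url(doy=32)` or `hls_v14_url(doy=52)` then gives the two files that convert correctly.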
Not sure if it matters, but the reprojected HLS TIFs have a weird 0-data rectangle above the HLS values.
The images do align, though: I checked pixel alignment and also road overlap (so the HLS image is in the right place).
To-do list:
For the cloud coverage issue, I tested one tile, T15STT. There were 0 images in Mar-Sept with 0% cloud cover, and 7 images with <= 5% cloud cover.
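As a small illustration of applying that threshold, scenes can be filtered by their cloud-cover metadata (the values below are made up, not the real T15STT metadata):

```python
def filter_by_cloud_cover(scenes, max_cloud=5):
    """Keep scene names whose cloud cover (percent) is at or below the threshold."""
    return [name for name, cc in scenes if cc <= max_cloud]

# Illustrative (scene name, cloud cover %) pairs:
scenes = [("2020061", 3), ("2020092", 40), ("2020123", 5), ("2020154", 0.8)]
print(filter_by_cloud_cover(scenes))  # ['2020061', '2020123', '2020154']
```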
> Not sure if it matters, but the reprojected HLS TIFs have a weird 0-data rectangle above the HLS values. The images do align, though.
@mcecil we forgot to talk about this in our call. This is the result of interpolation. I wouldn't worry about it.
@mcecil and @kordi1372, I just noticed we didn't close the issues on this report from the first version of the code that Mike developed. It's best if we close these, since they are already implemented with v1.4 of the data, add a tag on GitHub to keep a record of the current working version of the code (let me know if you need help with this), and start a new set of issues for Fatemeh to update the code to use v2.0 of the data.
Using the definition of tiles from here, we need to retrieve the corresponding scenes from the HLS dataset. Specification: