fnattino closed this 2 years ago
This is looking awesome!
I like the Objectives of this lesson. I think we could split the section 'Process satellite images in "chunk" to take advantage of parallelization' out into its own lesson. This would mean we'd have a lesson focused on Data Access, which ends around this cell:
```python
import rioxarray

# `assets` is the assets dictionary of the STAC item retrieved in an earlier cell
# ... or we can open them directly (and stream content only when necessary)
blue_band_href = assets["B02"].href
blue_band = rioxarray.open_rasterio(blue_band_href)
blue_band
```
and a separate "Parallelizing Raster Computations with Dask" lesson.
I think the final cell for the Data Access episode could be saving the raster with rioxarray. This would involve re-assigning the CRS to the mosaicked xarray DataArray we produced with stackstac and then using the `.rio.to_raster()` method. We can borrow from this example my colleague @alexmandel worked on: https://github.com/PacificCommunity/DigitalEarthPacific/blob/demo/writeraster/notebooks/demo/cloudless-mosaic-sentinel2.ipynb
I love that you already cover guidelines on how to set the chunk size! An additional topic to cover here could be how to tell whether your code runs faster with or without Dask. For this we could cover using `time`, the Dask profiler, or some other easily accessible profiling tool in Jupyter notebooks. I think we should also have a section describing Dask's lazy computation mode and how to take advantage of it to inspect metadata before downloading the actual scene data.
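A rough sketch of both ideas on a hypothetical in-memory workload (not the lesson's satellite data): the Dask array exposes its shape, dtype and chunk layout before anything is computed, and `time.perf_counter` gives a simple wall-clock comparison (in a notebook, `%%time` at the top of a cell does the same job). Since the data here is already in memory, Dask's scheduling overhead may well make it slower; the gain is expected when chunks are read lazily from remote files.

```python
import time

import dask.array as da
import numpy as np

# Hypothetical workload: square the values and average over the first axis.
data = np.random.default_rng(42).random((2000, 2000))

# Lazy: this only builds a task graph, no computation happens yet.
lazy = da.from_array(data, chunks=(500, 500))
lazy_mean = (lazy ** 2).mean(axis=0)

# Metadata is available without computing anything.
print(lazy_mean.shape, lazy_mean.dtype, lazy_mean.numblocks)

# Time the plain numpy version ...
t0 = time.perf_counter()
numpy_result = (data ** 2).mean(axis=0)
t_numpy = time.perf_counter() - t0

# ... and the Dask version (compute() triggers the actual work).
t0 = time.perf_counter()
dask_result = lazy_mean.compute()
t_dask = time.perf_counter() - t0

print(f"numpy: {t_numpy:.3f} s, dask: {t_dask:.3f} s")
```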
For the Raster calculations portion, instead of "Raster calculations using stackstac", I suggest we show how to mosaic a collection of scenes. There's stackstac's built-in method, which just flattens the stack: https://stackstac.readthedocs.io/en/latest/api/main/stackstac.mosaic.html#stackstac.mosaic
I think it would be valuable to show that solution as well as a median composite.
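To make the difference between the two approaches concrete, here is a small numpy illustration on a hypothetical three-scene stack, with NaN marking missing or cloudy pixels. `first_valid_mosaic` is a hypothetical helper mimicking the "first valid pixel along time wins" idea behind a flattening mosaic; it is not stackstac's implementation. In the lesson itself, `stackstac.mosaic` and xarray's `DataArray.median(dim="time")` would play these two roles.

```python
import numpy as np


def first_valid_mosaic(stack):
    """Flatten a (time, y, x) stack by keeping, for each pixel, the first
    non-NaN value along the time axis (earlier scenes take precedence)."""
    out = np.full(stack.shape[1:], np.nan, dtype=float)
    for scene in stack:
        # Fill only pixels that are still empty and valid in this scene.
        fill = np.isnan(out) & ~np.isnan(scene)
        out[fill] = scene[fill]
    return out


def median_composite(stack):
    """Per-pixel median along time, ignoring NaN values."""
    return np.nanmedian(stack, axis=0)


nan = np.nan
# Hypothetical 3-scene stack of a 2x2 tile.
stack = np.array([
    [[nan, 2.0], [3.0, nan]],
    [[1.0, 5.0], [4.0, nan]],
    [[7.0, 6.0], [nan, 8.0]],
])

print(first_valid_mosaic(stack))  # values: [[1, 2], [3, 8]]
print(median_composite(stack))    # values: [[4, 5], [3.5, 8]]
```

The flattened mosaic depends on scene order, while the median is order-independent and tends to suppress outliers such as residual clouds, at the cost of needing all scenes for every pixel.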
Setup instructions will also need to be updated with the new dependencies. I've had the most success with not pinning specific versions, which allows a more flexible solve on different machines: https://carpentries-incubator.github.io/geospatial-python/setup.html
A third episode working with a cool-looking mosaic could focus on xarray-spatial's raster calculation functions. One idea: computing spectral indices, thresholding them, and polygonizing the result (e.g. areas with especially high NDVI): https://github.com/makepath/xarray-spatial
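The index-and-threshold part of that idea can be sketched in plain numpy, assuming NDVI = (NIR - Red) / (NIR + Red) and Sentinel-2 bands B08 (NIR) and B04 (red); the reflectance values and the 0.6 threshold are hypothetical. In the episode, xarray-spatial's functions would replace this hand-written version, and the resulting mask would then be polygonized.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Where both bands are 0 the index is undefined: leave those pixels as NaN.
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectances for a 2x2 tile (B08 = NIR, B04 = red).
nir = np.array([[0.5, 0.8], [0.3, 0.0]])
red = np.array([[0.5, 0.1], [0.2, 0.0]])

index = ndvi(nir, red)
high_veg = index > 0.6  # hypothetical threshold for "especially high NDVI"
print(index)
print(high_veg)
```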
I also like the inclusion of the Dask task graph image. Including other images of intermediate results, such as plots of the blue band, would be good before the final challenge. Also, when this gets converted to the lesson Markdown, I think we can create a set of tooltips that point to other sources where folks can read up on COG, STAC, and Dask, while also briefly summarizing their utility for geospatial work.
Hi @rbavery, I have created a first version of a full data access episode. Basically, I have converted the Jupyter notebook that you already had a look at into a .md file and added some explanatory text between the code blocks. Whenever you have time to review it, I would be happy to get any kind of feedback. Thanks in advance!
I have also added a first exercise following up on your idea of having participants explore a STAC catalog even before the search tool is introduced. What do you think about formulating it this way?
Still working on the second episode (on parallel raster computation with Dask).
@fnattino thanks I'll give this a review this evening
@fnattino thanks for addressing these reviews! Once this data access episode is finished, can we merge this PR and finish the parallelization episode in a separate PR? Feel free to merge this as is now; I or somebody else could add a challenge later, unless you are already working on it.
Hi @rbavery - thanks a lot for already having a look. I am finishing up the last challenge, I'll ping you as soon as I have pushed it!
Hi @rbavery, this is it - I have added the final challenge.
I have also updated the setup instructions and the `environment.yaml` file, adding `pystac_client` to the dependencies.
Merging this first and opening a second PR for the parallelisation episode sounds good - I have removed the corresponding notebook from this branch.
One last thing: should this become episode 19? I could set the number and merge if this is alright with you. Thanks a lot for all the feedback and suggestions!
Fantastic!!! Yes, let's make this episode 19 for now. Really looking forward to teaching this! LGTM, feel free to merge.
This is work in progress to address #82.
@rbavery: I have added a notebook with a first sketch of what the episode on data access/parallelization could look like. Any feedback is more than welcome!