esciencecenter-digital-skills / geospatial-python

Introduction to Geospatial Raster and Vector Data with Python
https://esciencecenter-digital-skills.github.io/geospatial-python/

Update parallel raster episode #113

Closed fnattino closed 1 month ago

fnattino commented 2 months ago

With this PR I am significantly simplifying the episode on parallel raster calculations, making it just an introduction to the topic with a very easy example.

When thinking about how to connect this to the ODC-STAC and total viewshed episodes, I realised it would still be important to introduce concepts such as Dask, chunked arrays, lazy calculations, and time-profiling. This is now done here with a very simple example: the calculation of NDVI on a full Sentinel-2 tile. The example shows a significant (relative) performance difference with respect to the serial calculation (from 8 sec to 2 sec), making the point that parallelisation can be crucial for larger datasets.
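For readers of this thread who want a feel for the idea, here is a minimal sketch of a chunked NDVI calculation with time-profiling. It is not the episode's code: it assumes only NumPy, uses synthetic arrays in place of the red and NIR bands on disk, and swaps in a plain thread pool as a stand-in for the scheduling that Dask does automatically. The names `ndvi`, `ndvi_chunked`, `chunk_rows` and `workers` are hypothetical, and the timings it prints will differ from the 8 sec and 2 sec quoted above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red), computed element-wise."""
    red = red.astype("float32")
    nir = nir.astype("float32")
    return (nir - red) / (nir + red)


def ndvi_chunked(red, nir, chunk_rows=512, workers=4):
    """Split both bands into row blocks and compute NDVI per block in a thread pool.

    Hypothetical helper: a hand-rolled stand-in for Dask's chunked arrays.
    """
    starts = range(0, red.shape[0], chunk_rows)
    # Building the task list only creates array views; nothing is computed yet.
    tasks = [(red[s:s + chunk_rows], nir[s:s + chunk_rows]) for s in starts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = list(pool.map(lambda task: ndvi(*task), tasks))
    return np.concatenate(blocks, axis=0)


# Synthetic stand-ins for the two bands (assumption: values >= 1 to avoid 0/0).
rng = np.random.default_rng(seed=0)
red = rng.integers(1, 10_000, size=(2048, 2048), dtype=np.uint16)
nir = rng.integers(1, 10_000, size=(2048, 2048), dtype=np.uint16)

t0 = time.perf_counter()
serial = ndvi(red, nir)
t1 = time.perf_counter()
chunked = ndvi_chunked(red, nir)
t2 = time.perf_counter()

print(f"serial:  {t1 - t0:.3f} s")
print(f"chunked: {t2 - t1:.3f} s")
```

Whether the chunked version is faster depends on array size, chunk size, and the number of cores, which is why timing both variants, as the episode does, is a useful habit.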

After this, we could show ODC-STAC to merge tiles and process time series, and/or the parallel calculation of the total viewshed. I will make the notebooks for these available; if we are short of time, we could ask participants to vote for their preferred topic.

NOTE: this chapter makes most sense with the red and NIR bands available on disk, so it fits well with the workshop setup, where participants download (a set of) bands at the beginning.

github-actions[bot] commented 2 months ago

Thank you!

Thank you for your pull request :smiley:

:robot: This automated message can help you check the rendered files in your submission for clarity. If you have any questions, please feel free to open an issue in {sandpaper}.

If you have files that automatically render output (e.g. R Markdown), then you should check for the following:

Rendered Changes

:mag: Inspect the changes: https://github.com/esciencecenter-digital-skills/geospatial-python/compare/md-outputs..md-outputs-PR-113

The following changes were observed in the rendered markdown documents:

 01-intro-raster-data.md                       |  62 ++--
 02-intro-vector-data.md                       |  30 +-
 03-crs.md                                     |  20 +-
 04-geo-landscape.md                           |  24 +-
 07-vector-data-in-python.md                   | 483 ++++++++++++--------------
 11-parallel-raster-computations.md            | 199 ++++-------
 fig/E07/greece_administration_areas.png (new) | Bin 0 -> 64055 bytes
 fig/E07/greece_highways.png (new)             | Bin 0 -> 26479 bytes
 fig/E07/rhodes_administration_areas.png (new) | Bin 0 -> 17351 bytes
 fig/E07/rhodes_assets.png (new)               | Bin 0 -> 45400 bytes
 fig/E07/rhodes_builtup_buffer.png (new)       | Bin 0 -> 18396 bytes
 fig/E07/rhodes_highways.png (new)             | Bin 0 -> 80495 bytes
 fig/E07/rhodes_infra_highways.png (new)       | Bin 0 -> 46913 bytes
 fig/E11/dask-graph.png                        | Bin 2311171 -> 94585 bytes
 index.md                                      |  30 +-
 md5sum.txt                                    |  16 +-
 setup.md                                      |  24 +-
 17 files changed, 389 insertions(+), 499 deletions(-)

What does this mean?

If you have source files that require output and figures to be generated (e.g. R Markdown), then it is important to make sure the generated figures and output are reproducible. This output provides a way for you to inspect the output in a diff-friendly manner so that it's easy to see the changes that occur due to new software versions or randomisation.

:stopwatch: Updated at 2024-05-01 14:52:58 +0000