carpentries-incubator / geospatial-python

Introduction to Geospatial Raster and Vector Data with Python
https://carpentries-incubator.github.io/geospatial-python/

reorder and update data access lesson to be the starting code episode at #5 #93

Closed: rbavery closed this issue 1 year ago

rbavery commented 2 years ago

This is an open discussion following up on the one in this PR: https://github.com/carpentries-incubator/geospatial-python/pull/91. My loose idea is to:

  1. move the data access episode to be the first code episode.
  2. add code blocks at the end of it to download any additional raster datasets that need to be saved locally for the crop, reproject, and raster calculation episodes.
  3. store only vector data on Figshare, not raster data, to keep the download lightweight and to make it easier to switch or update datasets and control this in the lesson code.

We could still leave the parallelization and focal statistics/time series episodes, which are WIP, toward the end of the lesson, since these are more advanced. Maybe for these episodes we could work with different STAC datasets to change things up. The Planetary Computer offers Landsat Collection 2 Surface Reflectance.

@fnattino @rogerkuou

rbavery commented 2 years ago

Hi @rogerkuou, I'm continuing the discussion from this PR here: https://github.com/carpentries-incubator/geospatial-python/pull/91#issuecomment-1032931963

> Indeed, I think starting this course with STAC can be very fascinating. I do agree with making it one flavor of the workshop, but perhaps not a mandatory part by default. The major concern for Francesco and me is that for some audiences the STAC part may be too difficult.

I think these are all good points. Maybe we can answer the question of whether STAC is too advanced to be an opening episode when we teach it to NASA DEVELOP (and also to your group of learners at the eScience Center?). If it goes fine and the post-workshop feedback shows that folks didn't find it overly challenging, then maybe that can inform its place in the lesson order. My hunch is that accessing STAC catalogs with pystac is the easiest way to get the data and won't be overly challenging for an opener, but I think we should first test the episode as the final one and then discuss.

> Besides, the STAC data repository may change, which could give us unpleasant surprises. We still think that, at least for now, STAC should be an optional episode. For certain workshops, if the instructors feel comfortable, they can choose to start with the STAC episode and make the other episodes depend on it. @fnattino, if I missed something, please feel free to add it here.

Good point. My sense is that the AWS and Planetary Computer STAC repos are pretty stable. The same can't be said for other STAC repos, though, and it's not out of the question that they could change or stop being updated in the long term. I'd want to communicate this in the episode.

> We can take some time to decide on the data for the other episodes. For now, I think I will just try to add the new data to Figshare.

Agreed, let's keep the episode where it is for now and discuss later once we get learner feedback, both from NASA DEVELOP and from the groups you teach. Just to reiterate, I think the main advantages of moving the episode up in the order are:
1) it starts the lesson with the step that comes first in a real geospatial project (data access).
2) to a lesser extent, it reduces the size of the data download from Figshare, which has been an issue for some internet connections in the past (delays of up to 10 minutes).

> Actually, @rbavery, I need your help with that. I put the raster and vector data into a .zip file in my working drive. Could you please point me to a guide on how to update the Figshare data?

I think the Figshare data can currently only be updated from my personal Gmail account. I'll try to give you and @fnattino access.

rbavery commented 1 year ago

Closing: lesson v2 is now out.