davidbrochart / pangeo-streamflow

A global streamflow model using the Pangeo platform
Watershed delineation tool using lazy loading with zarr and Dask? #1

Open sebastienlanglois opened 4 years ago

sebastienlanglois commented 4 years ago

Reopening this issue in the appropriate repository.

Hi @davidbrochart,

I noticed that you've added a zarr version of the HydroSHEDS flow direction and flow accumulation grids at 3 arc-second resolution to the Pangeo datastore.

Have you been able to use a watershed delineation tool (such as pysheds) on top of the zarr data?

Most watershed delineation tools can only process fairly small DEM, flow direction, and flow accumulation rasters held entirely in memory, which means these inputs have to be recreated or transformed for every new project.

I feel like there is an opportunity for such tools to take zarr as input, with lazy loading on top of Dask. Have you ever worked on or seen such a tool?
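
To make the idea concrete, here is a minimal sketch of what delineation over lazily loaded chunks could look like. It is not taken from pysheds or from this repository, and it uses plain Python instead of zarr and Dask so that it runs on its own. The `LazyTiledGrid` class is a hypothetical stand-in for a chunked store, and the D8 codes assume the ESRI-style encoding (1 = east, then doubling clockwise up to 128 = north-east).

```python
from collections import deque

# Assumed ESRI-style D8 codes: code -> (row offset, col offset) of the
# cell that a given cell drains into.
D8_OFFSETS = {
    1: (0, 1),     # east
    2: (1, 1),     # south-east
    4: (1, 0),     # south
    8: (1, -1),    # south-west
    16: (0, -1),   # west
    32: (-1, -1),  # north-west
    64: (-1, 0),   # north
    128: (-1, 1),  # north-east
}


class LazyTiledGrid:
    """Hypothetical stand-in for a chunked store: tiles load on first access."""

    def __init__(self, data, tile_size):
        self._data = data  # pretend this lives in remote storage
        self.shape = (len(data), len(data[0]))
        self.tile_size = tile_size
        self.loaded = {}   # (tile_row, tile_col) -> 2-D list of codes

    def _load_tile(self, tile_row, tile_col):
        t = self.tile_size
        rows = self._data[tile_row * t:(tile_row + 1) * t]
        return [row[tile_col * t:(tile_col + 1) * t] for row in rows]

    def __getitem__(self, cell):
        r, c = cell
        t = self.tile_size
        key = (r // t, c // t)
        if key not in self.loaded:
            self.loaded[key] = self._load_tile(*key)
        return self.loaded[key][r % t][c % t]


def delineate(flow_dir, outlet):
    """Return the set of cells draining to `outlet` (outlet included)."""
    n_rows, n_cols = flow_dir.shape
    basin = {outlet}
    queue = deque([outlet])
    while queue:
        r, c = queue.popleft()
        # A neighbour is upstream if its own direction points back at (r, c).
        for dr, dc in D8_OFFSETS.values():
            nr, nc = r - dr, c - dc
            if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                continue
            if (nr, nc) in basin:
                continue
            if D8_OFFSETS.get(flow_dir[nr, nc]) == (dr, dc):
                basin.add((nr, nc))
                queue.append((nr, nc))
    return basin


# Toy 4 x 12 grid split into three 4 x 4 tiles: the four left-hand
# columns drain west, the remaining columns drain east.
row = [16] * 4 + [1] * 8
grid = LazyTiledGrid([list(row) for _ in range(4)], tile_size=4)
basin = delineate(grid, outlet=(1, 0))
```

In this toy grid only two of the three tiles are ever read (the one holding the basin and the one its edge cells look into), which is the property a zarr or VRT backed tool would rely on to avoid loading a whole continent.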

Thanks,

Hi Sebastien,

I didn't know about pysheds, thanks for pointing it out! I have this project that I need to get back to: https://github.com/davidbrochart/pangeo-streamflow It does watershed delineation using Cython, with data in the Pangeo store but in GDAL VRT format. I don't remember having uploaded the HydroSHEDS flow direction and accumulation in zarr format, but anyway it wouldn't be hard to do. VRT gives us similar chunking in this case. I don't think this code works right now, but it is definitely on my roadmap; I just need to find some time to work on it. On top of delineation, I have implemented what I call "basin partitioning" (see http://davidbrochart.github.io/streamflow/visualization/2017/01/13/streamflow-visualization.html).

If this is of interest, I could polish the code and make a library out of it.

_Originally posted by @davidbrochart in https://github.com/davidbrochart/flow_acc_3s/issues/1#issuecomment-676705643_

sebastienlanglois commented 4 years ago

This repository looks really promising and your blog is super interesting! With the SWOT satellite launch around the corner in particular, virtual stations will likely become more widely available and more accurate for hydrological modelling.

Sorry about the VRT vs. zarr confusion; I read too fast. As you mentioned, though, the point about chunking still stands whatever format is used.
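
As a rough illustration of why chunking matters independently of the format, the sketch below maps a lon/lat bounding box to the chunks that would have to be read. The grid origin (-180, 90), the 1200 cells per degree (3 arc-seconds), and the 6000-cell chunk size are all assumptions made for the example, not the actual layout of the data in the Pangeo store.

```python
import math


def chunks_for_bbox(west, south, east, north, *,
                    cells_per_degree=1200,  # 3 arc-seconds (assumed)
                    origin_lon=-180.0,      # assumed west edge of the grid
                    origin_lat=90.0,        # assumed north edge of the grid
                    chunk=6000):            # hypothetical chunk size in cells
    """Return (chunk_row, chunk_col) indices intersecting a lon/lat box."""
    # Half-open cell ranges covering the box: [col0, col1) and [row0, row1).
    col0 = math.floor((west - origin_lon) * cells_per_degree)
    col1 = math.ceil((east - origin_lon) * cells_per_degree)
    row0 = math.floor((origin_lat - north) * cells_per_degree)
    row1 = math.ceil((origin_lat - south) * cells_per_degree)
    chunk_rows = range(row0 // chunk, (row1 - 1) // chunk + 1)
    chunk_cols = range(col0 // chunk, (col1 - 1) // chunk + 1)
    return [(r, c) for r in chunk_rows for c in chunk_cols]


# A 3 x 3 degree box touches only four 5 x 5 degree chunks.
needed = chunks_for_bbox(-76, 44, -73, 47)
```

Whether the chunks are VRT tiles or zarr chunks, the reader only needs this index arithmetic plus a way to fetch one chunk at a time.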

I am definitely interested in testing a working version of this repo, and if you want to make a library out of it, even better!

Also, you might be interested in MERIT Hydro. It is a more recent (2019) dataset similar to HydroSHEDS, and it comes with flow accumulation at 3 arc-second resolution out of the box. Folks from the SWOT mission are thinking of using it instead of HydroSHEDS as part of their global lake and river database. I've been meaning to convert MERIT Hydro to zarr format and upload it to the Pangeo datastore or elsewhere online. Could that be of any help to you as well? I would also be glad to test or contribute to this repo should you need some help.

davidbrochart commented 4 years ago

Thanks! Yes, I think that uploading MERIT Hydro to Pangeo in zarr format would be useful. I'm going to try to find time to work on this repo again.

davidbrochart commented 4 years ago

There is also this new library which seems very interesting, including for hydrology: https://fastscapelib.readthedocs.io

sebastienlanglois commented 4 years ago

Interesting, I will look into it!