developmentseed / chabud2023

Change detection for Burned area Delineation (ChaBuD) ECML/PKDD 2023 challenge

:monocle_face: DataPipe for loading ChaBuD 2023 HDF5 files #4

Closed: weiji14 closed this 1 year ago

weiji14 commented 1 year ago

What I am changing

How I did it

Current data pipeline, visualized using torchdata.datapipes.utils.to_graph(dp=dp_train):

[Figure: graph of the HDF5 data pipeline]
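For readers new to torchdata, a chain of DataPipes like the one in the graph can be approximated with plain Python generators, where each stage lazily wraps the previous one and nothing is read until iteration starts. This is a minimal sketch, not the PR's code: it uses no torchdata API. The stage names (`open_files`, `iter_groups`, `to_samples`, `build_pipeline`), the in-memory dict standing in for an opened HDF5 file, and the assumption that each file holds per-event groups with `pre_fire`, `post_fire` and `mask` arrays are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, Mapping

# Hypothetical stand-ins: an opened "file" maps event_id -> group,
# and a group maps array name -> data.
Group = Mapping[str, object]
FileLike = Mapping[str, Group]


def open_files(paths: Iterable[str], opener: Callable[[str], FileLike]) -> Iterator[FileLike]:
    """Stage 1: lazily open each path (similar in spirit to a file-opener DataPipe)."""
    for path in paths:
        yield opener(path)


def iter_groups(files: Iterable[FileLike]) -> Iterator[tuple[str, Group]]:
    """Stage 2: flatten every opened file into (event_id, group) pairs."""
    for f in files:
        yield from f.items()


def to_samples(groups: Iterable[tuple[str, Group]],
               keys: tuple[str, ...] = ("pre_fire", "post_fire", "mask")) -> Iterator[dict]:
    """Stage 3: keep only groups that contain every required array."""
    for event_id, group in groups:
        if all(k in group for k in keys):  # hypothetical rule: skip incomplete events
            yield {"id": event_id, **{k: group[k] for k in keys}}


def build_pipeline(paths: Iterable[str], opener: Callable[[str], FileLike]) -> Iterator[dict]:
    # Each stage wraps the previous one; no file is opened until iteration.
    return to_samples(iter_groups(open_files(paths, opener)))
```

With a fake store such as `{"train.h5": {"a1": {"pre_fire": 0, "post_fire": 1, "mask": 1}, "b2": {"post_fire": 1, "mask": 0}}}`, calling `list(build_pipeline(["train.h5"], store.__getitem__))` yields only the `a1` sample. Keeping each stage as a separate generator mirrors how DataPipes compose, so the graph stays inspectable stage by stage.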

Ideally, the HDF5 files could be streamed directly from HuggingFace into a DataTree object (right now there is a download+cache step). There might be a way to do so using kerchunk.hdf.SingleHdf5ToZarr, which I've tried, but it raises some odd errors that come down to not knowing how the HDF5 files are stored on the Git LFS storage provider behind HuggingFace Spaces. There is some discussion over at https://discourse.pangeo.io/t/accessing-nested-hdf5-file-from-http-via-kerchunk/3432.
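The download+cache step mentioned above can be approximated with the standard library alone: fetch each URL once into a local cache directory and reuse the local file on later calls. This is a minimal sketch, not the PR's implementation; the helper name `cached_download` and its hash-based filename scheme are hypothetical.

```python
import hashlib
import os
import shutil
import urllib.parse
import urllib.request


def cached_download(url: str, cache_dir: str) -> str:
    """Download `url` into `cache_dir` once and return the local path.

    Hypothetical helper: the cache key is a truncated SHA-256 of the URL,
    with the original file name kept as a suffix for readability.
    """
    os.makedirs(cache_dir, exist_ok=True)
    name = os.path.basename(urllib.parse.urlparse(url).path) or "download"
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    local_path = os.path.join(cache_dir, f"{key}_{name}")
    if os.path.exists(local_path):
        return local_path  # cache hit: skip the download entirely
    tmp_path = local_path + ".part"
    # Stream into a temporary file first, then rename it into place.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, local_path)
    return local_path
```

Writing to a `.part` file and renaming it only after the copy finishes means an interrupted run cannot leave a truncated HDF5 file that a later call would mistake for a cache hit. Streaming straight into a DataTree, as discussed above, would remove this step altogether.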

How you can test it

Binder

Related Issues

Adapted from some of my previous LightningDataModule code at:

See also torchgeo implementation at https://github.com/microsoft/torchgeo/pull/1259/files