ACDguide / BigData

Working with big/challenging data collections
https://ACDguide.github.io/BigData

Issue on page /data_storage.html: should we cover kerchunk? #49

Open paolap opened 2 years ago

paolap commented 2 years ago

https://fsspec.github.io/kerchunk/

kerchunk is an interesting option for cloud-optimised storage of NetCDF, HDF and GRIB data. It seems to work more as a virtual aggregation: it creates a single .zarr or .json reference file that points to all the individual files and presents them as a single dataset. I think it might also be indexing the actual chunks. Compared with converting everything to Zarr, there's no data duplication, because the original files stay where they are. Someone mentioned it to me this morning and I had a quick look at the documentation.
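To make the "reference file" idea concrete, here is a toy sketch in plain Python. It is not kerchunk's API: `build_references`, `read_chunk`, the `data/<n>` keys and the fixed `chunk_size` are all hypothetical names and assumptions for illustration. A real tool would read chunk offsets from the NetCDF/HDF metadata rather than slicing files at a fixed size. The point is only that the index stores (path, offset, length) triples and the bytes are never copied.

```python
import json
import os
import tempfile


def build_references(paths, chunk_size):
    """Map virtual chunk keys to [path, offset, length] in the original files.

    Hypothetical helper: the fixed chunk_size stands in for real chunk
    boundaries, which would come from each file's own metadata.
    """
    refs = {}
    index = 0
    for path in paths:
        size = os.path.getsize(path)
        for offset in range(0, size, chunk_size):
            length = min(chunk_size, size - offset)
            refs[f"data/{index}"] = [path, offset, length]
            index += 1
    return refs


def read_chunk(refs, key):
    """Fetch one chunk by seeking into the original file; nothing is duplicated."""
    path, offset, length = refs[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Two small stand-in "data files" (made-up payloads, not real NetCDF).
tmpdir = tempfile.mkdtemp()
paths = []
for name, payload in [("a.bin", b"AAAABBBB"), ("b.bin", b"CCCCDD")]:
    path = os.path.join(tmpdir, name)
    with open(path, "wb") as f:
        f.write(payload)
    paths.append(path)

# Build the index once and save it as a single JSON reference file.
refs = build_references(paths, chunk_size=4)
ref_file = os.path.join(tmpdir, "refs.json")
with open(ref_file, "w") as f:
    json.dump(refs, f)

# Later, a reader needs only the reference file to see one "dataset".
with open(ref_file) as f:
    loaded = json.load(f)

combined = b"".join(read_chunk(loaded, f"data/{i}") for i in range(len(loaded)))
```

Here the JSON file holds only the index, so the two source files remain the single copy of the data. That is the property that sets this approach apart from rewriting the files as Zarr.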

paigem commented 2 years ago

Great, this definitely looks like something we can add! I have heard some chatter about it in Pangeo circles, but hadn't actually taken the time to look at what it is just yet.

Is this something you want to add @paolap?

paolap commented 2 years ago

I guess if no one has tried it already, I can have a go at using it before writing more about it.

paigem commented 2 years ago

That would be great @paolap - sounds like you're already a step ahead of me in knowing what it is! But happy to help if needed.