Implement adapters for custom datasets

Cloud-Drift / clouddrift

CloudDrift accelerates the use of Lagrangian data for atmospheric, oceanic, and climate sciences.

https://clouddrift.org/

MIT License


Implement adapters for custom datasets #53

Closed milancurcic closed 7 months ago

milancurcic commented 1 year ago

Some are already implemented in clouddrift-examples.

[x] GDP
[x] GLAD
[ ] LASER
[ ] SPLASH
[ ] OceanParcels output (Zarr)
[x] MOSAiC sea-ice drift
[ ] NASA S-MODE (1,2,?)
[ ] Electric field and conductivity measurements in the stratosphere

selipot commented 1 year ago

Add the GDP 6-hourly dataset to the list.

selipot commented 1 year ago

@milancurcic let's continue to do this using the list above and add

[ ] https://arcticdata.io/catalog/view/doi:10.18739/A2KP7TS83

philippemiron commented 1 year ago

I don't think the GLAD and LASER experiments by themselves are really interesting/useful. I would instead include this dataset from J. Lilly, which groups together all the experiments in the GoM.

https://zenodo.org/record/4421585

milancurcic commented 1 year ago

GulfDrifters is already a ragged-array dataset so I think it's a good candidate for a dataset accessor function (i.e. clouddrift.datasets.gulfdrifters()). I don't think it needs an adapter.

However, GulfDrifters can't replace GLAD, LASER, SPLASH, etc. These specific datasets are available at 15-minute (QC'd) or 5-minute (raw) resolution, and for some processes hourly is too coarse.

philippemiron commented 1 year ago

I understand, but there are like 50 of those experiments. 😆 If you want to do it, go for it, just thinking that having one example code could be enough.

milancurcic commented 1 year ago

I don't see adapters as examples of how to do it. Instead, they're functions that create cloud-ready ragged-array versions of these datasets, which could then be made accessible via clouddrift.datasets. Providing easy access to ragged-array datasets via a single function call would be one way to attract users. To illustrate: a hypothetical future user may search for "how to load GLAD data in Python", and the top result would point to a CloudDrift function.
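To make the adapter idea concrete, here is a minimal sketch of the core step: turning per-trajectory records of unequal length into a ragged-array layout (flat concatenated variables plus a row-size list). It deliberately uses only the standard library and is not CloudDrift's API; the names `to_ragged` and `trajectory_slice`, the variable names, and the sample values are all hypothetical.

```python
from itertools import accumulate


def to_ragged(trajectories, variables):
    """Concatenate per-trajectory variables into flat lists plus a rowsize list.

    Hypothetical helper for illustration; not part of CloudDrift.
    """
    # Number of observations in each trajectory, taken from the first variable
    rowsize = [len(t[variables[0]]) for t in trajectories]
    data = {v: [] for v in variables}
    for t, n in zip(trajectories, rowsize):
        for v in variables:
            if len(t[v]) != n:
                raise ValueError(f"variable {v!r} has {len(t[v])} values, expected {n}")
            data[v].extend(t[v])
    return data, rowsize


def trajectory_slice(rowsize, i):
    """Return the slice that selects trajectory i in the flat lists."""
    offsets = [0, *accumulate(rowsize)]
    return slice(offsets[i], offsets[i + 1])


# Made-up sample: two drifters with different numbers of observations
drifters = [
    {"time": [0, 900, 1800], "lon": [-88.0, -88.1, -88.2], "lat": [28.0, 28.1, 28.2]},
    {"time": [0, 900], "lon": [-87.5, -87.6], "lat": [27.0, 27.1]},
]

data, rowsize = to_ragged(drifters, ["time", "lon", "lat"])
second = trajectory_slice(rowsize, 1)
print(rowsize, data["lon"][second])
```

A real adapter would additionally handle downloading, metadata, and writing to a cloud-friendly format; the point here is only that the row-size list is what lets a single flat array hold trajectories of different lengths.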

philippemiron commented 1 year ago

Sorry, I was thinking of clouddrift-examples.

If the data is on AWS/GCP, or already ragged, I guess it's simple to add it to datasets. On the other hand, if it's a collection of dozens of randomly named CSV files, I wouldn't want to hardcode anything that can eventually break the library. That's why I started the examples, so people can just use the RaggedArray class if they want to.
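As a rough illustration of the fragility concern with CSV collections, here is a sketch of a reader that checks the header before parsing, so an unexpected file fails loudly instead of silently producing bad data. The column names (`time`, `lon`, `lat`), the function `read_trajectory`, and the in-memory sample files are all assumptions for the example, not taken from any of the datasets listed above.

```python
import csv
import io

# Hypothetical column names; each real dataset would define its own
REQUIRED = ("time", "lon", "lat")


def read_trajectory(fileobj):
    """Parse one CSV trajectory into a dict of float lists, validating the header."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    traj = {c: [] for c in REQUIRED}
    for row in reader:
        for c in REQUIRED:
            traj[c].append(float(row[c]))
    return traj


# In-memory stand-ins for files on disk
good = io.StringIO("time,lon,lat\n0,-88.0,28.0\n900,-88.1,28.1\n")
bad = io.StringIO("t,x,y\n0,1,2\n")

traj = read_trajectory(good)

try:
    read_trajectory(bad)
    bad_error = None
except ValueError as err:
    bad_error = str(err)

print(traj["lon"], bad_error)
```

Keeping the expected columns in one place means a change in a dataset's file layout shows up as a clear error in a single spot rather than as a subtle failure deeper in the library.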

selipot commented 1 year ago

Great job on the MOSAiC dataset. Let's continue down the list.

kevinsantana11 commented 7 months ago

Each of the missing datasets has been filed as a separate issue.