dingo-gw / dingo

Dingo: Deep inference for gravitational-wave observations
MIT License

Create cloud storage for data #40

Open max-dax opened 2 years ago

max-dax commented 2 years ago

We should store some crucial data (e.g., ASD datasets, trained models) in cloud storage, and add the corresponding URLs and download functions to dingo. One would then access the data via something like dingo.gw.download_data(type='ASDDataset', observing_run='O1'). This would also make it easier for us to share the data, since it is more convenient than manually passing files around via Dropbox.
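A minimal sketch of what such a helper could look like, assuming a hand-maintained registry mapping dataset descriptors to URLs. The function names, registry entries, and example URL here are all hypothetical placeholders, not actual dingo API:

```python
# Hypothetical sketch of a dingo download helper; the registry contents,
# URL, and function names are placeholders, not the real dingo API.
from pathlib import Path
from urllib.request import urlretrieve

# Registry mapping (dataset type, observing run) -> download URL.
DATA_URLS = {
    ("ASDDataset", "O1"): "https://example.org/dingo/asd_dataset_O1.hdf5",
}

def resolve_url(type, observing_run):
    """Look up the registered URL for a dataset, or fail loudly."""
    try:
        return DATA_URLS[(type, observing_run)]
    except KeyError:
        raise ValueError(f"No registered data for {type} / {observing_run}")

def download_data(type, observing_run, target_dir="."):
    """Download a dataset into target_dir unless it is already cached there."""
    url = resolve_url(type, observing_run)
    target = Path(target_dir) / url.rsplit("/", 1)[-1]
    if not target.exists():
        urlretrieve(url, target)  # network call
    return target
```

With something like this, dingo.gw.download_data(type='ASDDataset', observing_run='O1') would transparently return a cached local path, regardless of which hosting service actually sits behind the URLs.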

For dingo-enets (https://pypi.org/project/dingo-enets/) I used Google Drive, since it is free (15 GB) and there are libraries for downloading data from it. Stephen said he'd prefer something more professional, so I am open to suggestions for alternatives.

stephengreen commented 2 years ago

Yeah, I think 15 GB could be limiting, although the download libraries sound useful. Did you create a new Google account for this? For our internal sharing, this could work as long as we can all upload and download. We could also use Dropbox or a similar service.

For data associated with publications, Zenodo could be a good option. This gives a DOI, but I don't think it allows you to update the files without generating a new record. It allows up to 50 GB per dataset, and there is no limit on the number of datasets.
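One point in Zenodo's favor: it exposes a public REST API (records at https://zenodo.org/api/records/<id>), so a download helper could still be automated even though records are immutable per version. A rough sketch, assuming the documented record JSON layout (the helper names are hypothetical):

```python
# Sketch: resolving downloadable file URLs from a Zenodo record.
# Each entry in a record's "files" list has a "key" (file name) and a
# "links"]["self"] download URL, per the Zenodo REST API docs.
import json
from urllib.request import urlopen

def zenodo_file_urls(record_json):
    """Map file name -> download URL from a Zenodo record's JSON metadata."""
    return {f["key"]: f["links"]["self"] for f in record_json["files"]}

def fetch_record(record_id):
    """Fetch record metadata from the Zenodo REST API (network call)."""
    with urlopen(f"https://zenodo.org/api/records/{record_id}") as response:
        return json.load(response)
```

Since updating files generates a new record (and DOI), a dingo-side registry would just need to pin the record id for each released dataset version.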