angelolab / Nimbus


Figure out plan for hosting large image datasets #61

Closed: JLrumberger closed this issue 11 months ago

JLrumberger commented 1 year ago

We have a number of projects in the works in the lab that will have very large accompanying datasets (300 to 3,000 GB). We'll want to figure out the best way to make this data available for download to people who want access to the raw data.

The Bodenmiller lab uses Zenodo to host their data. The Schuerch dataset is available via TCIA (The Cancer Imaging Archive). It would be great to survey the available options and their pros and cons so we can settle on a standardized lab plan.

srivarra commented 1 year ago

@JLrumberger @ngreenwald

Options

Open Science Framework (OSF)

Main Features

Pros:

Cons:

Takeaway: It's free, but we pay the price with less than stellar functionality.

XetHub

Main Features

Pros:

Cons:

Takeaway: Powerful, but it isn't really a batteries-included product. It's also pretty expensive.

Open Microscopy - Image Data Resource

Main Features

Pros:

Cons:

Takeaway: Seems to be good, but it's more complicated than "zip up the data and upload to a server".

Zenodo

Main Features

Pros:

Cons:

Takeaway: The dataset size limit (50 GB per record by default) is extremely unfortunate, since everything else sounds pretty good.
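One way to work within a per-record cap would be to split a dataset across several records. A minimal sketch, assuming a 50 GB cap (the `DEFAULT_LIMIT_BYTES` value is an assumption; check Zenodo's current policy) and that no single file exceeds it. `pack_into_records` is a hypothetical helper that uses first-fit decreasing bin packing:

```python
from typing import Dict, List

# Assumed per-record cap; verify against Zenodo's current documentation.
DEFAULT_LIMIT_BYTES = 50 * 10**9


def pack_into_records(
    file_sizes: Dict[str, int], limit_bytes: int = DEFAULT_LIMIT_BYTES
) -> List[List[str]]:
    """Group files into records whose total size stays within limit_bytes.

    First-fit decreasing: sort files largest first, then place each file
    in the first record that still has room, opening a new record when
    none does.
    """
    records: List[List[str]] = []
    remaining: List[int] = []  # free bytes left in each record
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit_bytes:
            raise ValueError(f"{name} ({size} bytes) exceeds the per-record limit")
        for i, free in enumerate(remaining):
            if size <= free:
                records[i].append(name)
                remaining[i] -= size
                break
        else:
            # No existing record has room, so start a new one.
            records.append([name])
            remaining.append(limit_bytes - size)
    return records
```

For scale: a 3,000 GB dataset split into 1 GB archives would need 60 records under a 50 GB cap, which is workable to script but awkward for downloaders to navigate.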

Host and maintain it ourselves

Main Features

Pros:

Cons:

Takeaway: Lots of options to try here; we can host the data and see what works, but it also means a lot of time and effort will be needed to get something up and running.
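If we self-host, one piece we'd need regardless of server setup is a way for downloaders to confirm that multi-hundred-GB transfers arrived intact. A minimal sketch, assuming we publish a SHA-256 manifest next to the data in the `sha256sum`-compatible `<hex digest>  <relative path>` format; `write_manifest` and `verify_manifest` are hypothetical helpers, not part of any existing lab tooling:

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Write one '<digest>  <relative path>' line per file under root."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.resolve() == manifest.resolve():
            continue  # skip the manifest itself if it lives under root
        lines.append(f"{sha256_of(path)}  {path.relative_to(root).as_posix()}")
    manifest.write_text("\n".join(lines) + "\n")


def verify_manifest(root: Path, manifest: Path) -> List[str]:
    """Return relative paths that are missing or whose hash no longer matches."""
    bad = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Because the format matches GNU coreutils `sha256sum`, downloaders without Python could also run `sha256sum -c` on the manifest from the dataset root.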

ngreenwald commented 1 year ago

Let's investigate the usability of Zenodo and Image Data Resource to start. How much work is it to format data for upload, and how easy is it to download?
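On the download side, Zenodo exposes record metadata through its REST API at `https://zenodo.org/api/records/<record_id>`. A minimal sketch of a scripted bulk download, assuming the JSON response lists files under `files`, each with a `key` (filename) and a `links.self` download URL. Those field names are assumptions to check against the current API docs, and `list_record_files` / `download_record` are hypothetical helpers:

```python
import json
import shutil
import urllib.request
from pathlib import Path
from typing import List, Tuple

API_ROOT = "https://zenodo.org/api/records"  # assumed endpoint, see lead-in


def list_record_files(record: dict) -> List[Tuple[str, str]]:
    """Extract (filename, download URL) pairs from a record's JSON metadata."""
    return [(f["key"], f["links"]["self"]) for f in record.get("files", [])]


def download_record(record_id: str, out_dir: Path) -> List[Path]:
    """Fetch record metadata, then stream each listed file to disk."""
    with urllib.request.urlopen(f"{API_ROOT}/{record_id}") as resp:
        record = json.load(resp)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in list_record_files(record):
        target = out_dir / name
        # Stream to disk rather than reading the whole response into memory.
        with urllib.request.urlopen(url) as src, target.open("wb") as dst:
            shutil.copyfileobj(src, dst)
        saved.append(target)
    return saved
```

Usage would look like `download_record("<record_id>", Path("raw_data"))`; keeping the metadata parsing separate from the network calls makes it easy to adapt the same script to other archives' listing formats.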

ngreenwald commented 1 year ago

And also the EMBL-EBI BioImage Archive: https://www.ebi.ac.uk/bioimage-archive/