OARS-SAFS / resources

Open and Reproducible Science Resources

Public data repo that allows dataset updates #15

Open chelsealwood opened 4 years ago

chelsealwood commented 4 years ago

Hi gang,

I'm writing with a question about data repositories. As you might know, I recently took over the Fisheries Ecology course (FISH 312) from Tom Quinn, and over the decades in which Tom taught the course, he amassed three long-term datasets on fish abundances in Rock Creek, Lake Washington, and Puget Sound - data that students collect during field trips. I'm planning to continue teaching these field trips and collecting data in exactly the same way. The datasets are a treasure trove of information, and it seems silly to me that they now live on my hard drive and are not publicly available. I've looked into publishing these datasets to a public data repository, but I want to be able to add new lines to them as we collect new data each year, and the usual data repositories (e.g., Dryad) don't allow this. Do you know of any data repositories that (1) accept ecological data, (2) issue DOIs, and (3) allow addition of new lines to existing datasets?

Thanks in advance for sharing your open-science expertise!

-Chelsea


sr320 commented 4 years ago

Great question! A couple of options come to mind.

Figshare and Zenodo both support DOI versioning, which would allow you to update the datasets. Plus, both can link with GitHub (if in fact you are maintaining the datasets there).

Examples:
https://doi.org/10.6084/m9.figshare.7562354.v2 (noting this is version 2)
https://zenodo.org/record/19046#.XlQ-CBNKhTY

Open Science Framework is another option. It integrates with several services and provides a DOI. I do not believe OSF supports versioning; however, you can simply update the data and the DOI stays the same. This could be a simple solution.

Example OSF repo: https://osf.io/j8rc2/

I am sure there are other options... curious if others have ideas.

chelsealwood commented 4 years ago

Thanks, Steven! I'll look into these. In the meantime, if anyone else has other ideas, I'm all ears :)

sr320 commented 4 years ago

If you are interested in linking GH with these repos, here are a couple of useful links:

https://guides.github.com/activities/citable-code/
https://knowledge.figshare.com/articles/item/how-to-connect-figshare-with-your-github-account-1

And note a similar convo is also going on over on Twitter, though without any mention of the versioning feature.

https://twitter.com/TrevorABranch/status/1231302472557219840

mdscheuerell commented 4 years ago

My understanding is that Zenodo does not allow you to edit a data set once it has been assigned a DOI, but it's probably not a bad idea to just get a new DOI each time (year?) you update the data. That said, I've used them before and it's worked well.

sr320 commented 4 years ago

@mdscheuerell Recently they started to support versioning: https://blog.zenodo.org/2017/05/30/doi-versioning-launched/

This maintains the top-level DOI while still allowing the data to be updated.
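For anyone who wants to script the yearly update, here is a rough sketch of what requesting a new version might look like. It assumes Zenodo's REST API exposes a `newversion` action on an existing deposition and accepts a personal access token as a Bearer header, so check the current API docs before relying on it. The deposition ID and token are placeholders, and the request is only built here, never sent.

```python
import urllib.request

# Assumed base URL for the Zenodo REST API.
ZENODO_API = "https://zenodo.org/api"


def build_new_version_request(deposition_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks for a new version draft.

    deposition_id and token are placeholders supplied by the caller.
    """
    url = f"{ZENODO_API}/deposit/depositions/{deposition_id}/actions/newversion"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values, for illustration only.
req = build_new_version_request(123456, "MY-TOKEN")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, then uploading the updated file to the resulting draft and publishing it.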

mdscheuerell commented 4 years ago

Thanks for the update @sr320. That's good news!

chelsealwood commented 4 years ago

Cool! This looks like a fantastic option.

juniperlsimonis commented 4 years ago

definitely check out this paper on developing a modern workflow for evolving data: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000125

the portal project is set up like what you're talking about: https://github.com/weecology/PortalData
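to give a flavor of the check-then-append idea, here's a minimal sketch. This is not the Portal project's code, just an illustration: the function name, column names, and values are all made up, and the only check it does is that the new year's file has the same header as the existing dataset.

```python
import csv
import io


def append_rows(existing_csv: str, new_csv: str) -> str:
    """Return existing_csv with the data rows of new_csv appended.

    Raises ValueError if the headers differ, so a renamed or missing
    column is caught before the dataset is changed.
    """
    old_rows = list(csv.reader(io.StringIO(existing_csv)))
    new_rows = list(csv.reader(io.StringIO(new_csv)))
    if not old_rows or not new_rows:
        raise ValueError("both inputs need a header row")
    if old_rows[0] != new_rows[0]:
        raise ValueError(f"header mismatch: {old_rows[0]} vs {new_rows[0]}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Keep the existing header and rows, then add only the new data rows.
    writer.writerows(old_rows + new_rows[1:])
    return out.getvalue()


# Made-up columns and values, for illustration only.
existing = "year,site,species,count\n2019,site_A,species_x,12\n"
new = "year,site,species,count\n2020,site_A,species_x,9\n"
combined = append_rows(existing, new)
print(combined)
```

a real workflow would add more checks (valid years, known site names, no duplicate rows) and run them automatically before each new version is published.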

mdscheuerell commented 4 years ago

Thanks, Juniper! Those look like great resources.

juniperlsimonis commented 4 years ago

for sure! and lmk if anyone is looking for help doing that kind of stuff! definitely a service @dapperstats provides. :D