calipho-sib / cellosaurus

A knowledge resource on cell lines - From SIB CALIPHO group
https://www.cellosaurus.org
Creative Commons Attribution 4.0 International

include versions in bulk data downloads? #5

Closed: khughitt closed this issue 4 years ago

khughitt commented 4 years ago

Currently, the FTP site for Cellosaurus data provides only the most recent version of the data.

For reproducibility and provenance purposes, it might be helpful to include sub-folders for each new version. The naming could be as simple as "feb20" if you don't have any plans to explicitly version the data and just want to release monthly "snapshots".

AmosBairoch commented 4 years ago

Your comment prompted me to create a FAQ entry for the next release of the Cellosaurus. Here is the draft of this entry:

Q25: How can I access an old version of the Cellosaurus?

A: Let's start with some preliminary explanations and a bit of "history":

File name                       Release and date of 1st distribution
------------------------------  ------------------------------------
cellosaurus.txt                 2.0 of 04-Apr-2012
cellosaurus_relnotes.txt        2.0 of 04-Apr-2012
cellosaurus.obo                 4.0 of 22-Oct-2012
cellosaurus_deleted_ACs.txt     7.0 of 05-Nov-2013
cellosaurus_refs.txt            9.0 of 16-Apr-2014
cellosaurus_xrefs.txt           9.1 of 17-Jul-2014
cellosaurus_faq.txt             15.0 of 14-Dec-2015
cellosaurus.xml                 20.0 of 01-Dec-2016
cellosaurus.xsd                 20.0 of 01-Dec-2016
cellopub.txt                    21.0 of 03-Mar-2017
cellosaurus_name_conflicts.txt  23.0 of 22-Aug-2017
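
As a rough illustration of how to read the table above, the following Python sketch lists the files that had already been introduced by a given release. The names `FIRST_RELEASE`, `parse_release` and `files_in_release` are hypothetical, and the sketch assumes that a file, once introduced, is part of every later release, which the table itself does not guarantee.

```python
# First release in which each file was distributed (copied from the table above).
FIRST_RELEASE = {
    "cellosaurus.txt": "2.0",
    "cellosaurus_relnotes.txt": "2.0",
    "cellosaurus.obo": "4.0",
    "cellosaurus_deleted_ACs.txt": "7.0",
    "cellosaurus_refs.txt": "9.0",
    "cellosaurus_xrefs.txt": "9.1",
    "cellosaurus_faq.txt": "15.0",
    "cellosaurus.xml": "20.0",
    "cellosaurus.xsd": "20.0",
    "cellopub.txt": "21.0",
    "cellosaurus_name_conflicts.txt": "23.0",
}


def parse_release(label):
    """Turn a release label such as '9.1' or '15' into a comparable (major, minor) tuple."""
    parts = str(label).split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def files_in_release(release):
    """Return the files first distributed in or before the given release.

    Assumption: a file is never dropped after its first distribution.
    """
    wanted = parse_release(release)
    return sorted(
        name
        for name, first in FIRST_RELEASE.items()
        if parse_release(first) <= wanted
    )
```

For example, `files_in_release("9.0")` would not include `cellosaurus_xrefs.txt`, because that file first appeared in release 9.1.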

So where can you find old versions of the Cellosaurus files?

a) Starting with release 11.0 of 07-Nov-2014, the Cellosaurus files are available in a GitHub repository at: https://github.com/calipho-sib/cellosaurus

To get the files for a particular release, go to: https://github.com/calipho-sib/cellosaurus/commits/master

Look for the commit labelled with the release number you are interested in (for example, "Release 15"). Click on that commit, then click on the "Browse files" button. When the list of files is displayed, click on the green "Clone or download" button and select the "Download ZIP" option.
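
For anyone who prefers to script these steps, here is a minimal Python sketch. It assumes that release commits carry a message such as "Release 15" (as in the example above) and that the commit list has the shape returned by GitHub's "list commits" REST endpoint, where each item has a `sha` and a `commit.message`. The helper names `find_release_commit` and `zip_url` are hypothetical; the ZIP address follows GitHub's usual `archive/<sha>.zip` pattern.

```python
import re

REPO_URL = "https://github.com/calipho-sib/cellosaurus"


def find_release_commit(commits, release):
    """Return the SHA of the first commit whose message mentions the release.

    `commits` is a list of dicts shaped like GitHub's "list commits" response.
    Assumption: release commits are labelled like "Release 15".
    """
    # The lookahead keeps "Release 15" from also matching "Release 15.1" or "Release 150".
    pattern = re.compile(
        r"\brelease\s+" + re.escape(str(release)) + r"(?!\.?\d)",
        re.IGNORECASE,
    )
    for commit in commits:
        if pattern.search(commit["commit"]["message"]):
            return commit["sha"]
    return None


def zip_url(sha):
    """Build the "Download ZIP" address for the repository at a given commit."""
    return f"{REPO_URL}/archive/{sha}.zip"
```

The commit list itself could be obtained from `https://api.github.com/repos/calipho-sib/cellosaurus/commits`, which is paginated, so an older release may require fetching several pages before a match is found.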

All the Cellosaurus files are on GitHub with one exception: the XML file (cellosaurus.xml) which is too big to be stored on this platform.

b) We have archived all releases of the Cellosaurus on Yareta, the research data repository of Geneva's higher education institutions. To access the Cellosaurus archives go to: https://yareta.unige.ch/frontend/search

and search for "Cellosaurus".

Note that the Yareta archives for releases 2 to 32 do not include the OBO and XML files.

khughitt commented 4 years ago

Great! Thanks for taking the time to clarify and put together a FAQ!