geneontology / archive-reconstruction

Code to move various legacy files to the current release.geneontology.org.
BSD 3-Clause "New" or "Revised" License

Archive reconstruction

The GO project has been relying on SVN, CVS and archive.geneontology.org for a long time.

The refactoring of GO requires reorganizing some of the underlying infrastructure and remapping old files into a more up-to-date folder hierarchy (see current.geneontology.org).

Full archive generated from SVN and CVS: https://geneontology-test.s3.amazonaws.com/index.html

GO archive content

Notes:

SVN reconstruction steps / usage

  1. Have a GO SVN up and running
  2. List all revisions from SVN: svn log <svn-url/svn/go/trunk> > gosvn.log
  3. Clean up the log, keeping only the revision header lines: more gosvn.log | grep '^r[0-9]' > gosvn.list
  4. Select one revision per month: python3 select_revisions.py -r gosvn.list -o revisions_target.list
  5. Remap the GO SVN data to a newer folder hierarchy: python3 create_archive.py -s <svn-base-url> -r revisions_target.list -m mapping.txt -c checkouts/ -o releases/
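The internals of select_revisions.py are not described here, so the following is only a minimal sketch of the idea behind steps 3 and 4. It assumes the standard `svn log` header format (`rNNN | author | YYYY-MM-DD HH:MM:SS ...`) and assumes the highest revision of each month is the one kept; the real script may pick a different revision. The function name `select_monthly` is hypothetical.

```python
"""Hypothetical sketch of "one revision per month" selection from gosvn.list."""
import re
from typing import Dict, List, Tuple

# svn log header lines look like:
# r12345 | author | 2012-12-01 10:23:45 -0800 (Sat, 01 Dec 2012) | 2 lines
HEADER = re.compile(r"^r(\d+) \| [^|]* \| (\d{4})-(\d{2})-\d{2} ")


def select_monthly(lines: List[str]) -> List[Tuple[str, int]]:
    """Return (YYYY-MM, revision) pairs, keeping the highest revision of each month."""
    latest: Dict[str, int] = {}
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue  # skip anything that is not a revision header line
        rev = int(match.group(1))
        month = f"{match.group(2)}-{match.group(3)}"
        if rev > latest.get(month, -1):
            latest[month] = rev
    return sorted(latest.items())


if __name__ == "__main__":
    sample = [
        "r100 | alice | 2005-11-03 09:00:00 -0800 (Thu, 03 Nov 2005) | 1 line",
        "r120 | bob | 2005-11-28 17:30:00 -0800 (Mon, 28 Nov 2005) | 3 lines",
        "r130 | alice | 2005-12-01 08:15:00 -0800 (Thu, 01 Dec 2005) | 1 line",
    ]
    for month, rev in select_monthly(sample):
        print(month, rev)
```

For the sample lines this prints `2005-11 120` and `2005-12 130`. Keying on the month string keeps the selection independent of the order in which `svn log` lists revisions.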

This checks out the revisions listed in revisions_target.list into the temporary checkout folder checkouts/ and remaps them into the new folder releases/ using the rules in mapping.txt.
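A hedged sketch of what step 5 does for each selected revision: check the revision out, then copy mapped files to their new locations. The helper names, the `r<REV>` checkout subfolder and the dict standing in for mapping.txt are all assumptions; the actual rule syntax of mapping.txt and the internals of create_archive.py may differ.

```python
"""Hypothetical sketch of the checkout-and-remap step (not the actual create_archive.py)."""
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def svn_checkout_command(base_url: str, rev: int, checkouts: Path) -> List[str]:
    # `svn checkout -r REV URL DIR` materialises the tree as it was at revision REV
    return ["svn", "checkout", "-r", str(rev), base_url, str(checkouts / f"r{rev}")]


def remap(checkout: Path, release: Path, rules: Dict[str, str]) -> List[str]:
    """Copy every mapped file from the checkout into the release tree.

    `rules` maps a path relative to the checkout to a path relative to the release;
    this dict stands in for mapping.txt, whose real syntax may differ.
    Returns the source paths that were not found.
    """
    missing: List[str] = []
    for src_rel, dst_rel in rules.items():
        src, dst = checkout / src_rel, release / dst_rel
        if not src.is_file():
            missing.append(src_rel)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return missing


if __name__ == "__main__":
    print(" ".join(svn_checkout_command("<svn-base-url>", 12345, Path("checkouts"))))
    # Demo on a fake checkout so it runs without an SVN server
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkouts" / "r12345"
        (checkout / "old").mkdir(parents=True)
        (checkout / "old" / "a.txt").write_text("example")
        release = Path(tmp) / "releases" / "example"
        print(remap(checkout, release, {"old/a.txt": "new/a.txt", "old/b.txt": "new/b.txt"}))
```

In a real run the command would be executed with `subprocess.run(cmd, check=True)` before calling `remap`; the demo skips that call, so it only prints the command and then `['old/b.txt']` for the missing source.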

Note: some "Error while copying file" messages are expected, because file names and hierarchies changed over time (e.g. gene_association.goa_chicken.gz became goa_chicken.gaf.gz, and the script looks for both). These messages are therefore not a cause for concern, but they remain useful for debugging and logging.
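A hedged sketch of the "look for both names" idea: try each historical file name in turn and copy the first one that exists. Unlike the behaviour described in the note, this variant only logs a message when none of the candidates is found; the function name and log wording are hypothetical.

```python
"""Hypothetical sketch of copying a file that was renamed over time."""
import logging
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def copy_first_existing(candidates: Sequence[Path], dst: Path) -> Optional[Path]:
    """Copy the first existing candidate to dst and return it; return None if none exist."""
    for src in candidates:
        if src.is_file():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            return src
    # Only reached when no historical name matched at this revision
    logging.warning("Error while copying file: none of %s found", [str(c) for c in candidates])
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "goa_chicken.gaf.gz").write_bytes(b"")  # only the newer name exists here
        names = [base / "gene_association.goa_chicken.gz", base / "goa_chicken.gaf.gz"]
        print(copy_first_existing(names, base / "release" / "goa_chicken.gaf.gz"))
```

Listing the older name first means older revisions keep their original file while newer revisions fall through to the renamed one.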

CVS reconstruction steps / usage

  1. Have a GO CVS up and running
  2. Select one revision per month
  3. Remap the GO CVS data to a newer folder hierarchy: python create_archive_from_cvs.py -m -s -c -o
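CVS has no repository-wide revision numbers, so one way to take one snapshot per month is to check out by date with `cvs checkout -D`. The sketch below only builds the commands; the CVSROOT, module name and date-based directory naming are hypothetical placeholders, and create_archive_from_cvs.py may work differently.

```python
"""Hypothetical sketch: one CVS snapshot per month, checked out by date."""
from datetime import date
from typing import Iterator, List


def month_starts(start: date, end: date) -> Iterator[date]:
    """Yield the first day of every month from start's month through end (inclusive)."""
    current = date(start.year, start.month, 1)
    while current <= end:
        yield current
        # Advance to the first day of the next month
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)


def cvs_checkout_command(cvsroot: str, module: str, day: date) -> List[str]:
    """Build a `cvs checkout -D` command for the repository state as of `day`.

    The checkout directory is named after the date (e.g. 2005-12-01), mirroring
    the release folder names on S3; this naming is an assumption.
    """
    return ["cvs", "-d", cvsroot, "checkout", "-D", day.isoformat(), "-d", day.isoformat(), module]


if __name__ == "__main__":
    for day in month_starts(date(2005, 11, 15), date(2006, 1, 1)):
        print(" ".join(cvs_checkout_command("<cvsroot>", "<module>", day)))
```

Checking out by date rather than by tag avoids depending on tags that may not exist for every month.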

Examples of files currently generated

Release generated from CVS: https://geneontology-test.s3.amazonaws.com/2005-12-01/index.html
Release generated from SVN: https://geneontology-test.s3.amazonaws.com/2012-12-01/index.html

Note: browsing of the S3 bucket is inspired by aws-js-s3-explorer, but was remodeled to fit a canonical URL model and to add the desired header and description (see the actual browser code).
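The canonical URL model can be illustrated with plain string handling: every folder prefix maps to `<bucket>/<prefix>index.html`, matching the example links above, and a folder page lists the sub-folders and files directly under that prefix. On S3 this grouping is what a ListObjectsV2 request with `Delimiter='/'` returns as `CommonPrefixes` and `Contents`. This is a sketch over an in-memory key list, not the actual browser code; the function names are hypothetical.

```python
"""Hypothetical sketch of folder-style listing and canonical index URLs for an S3 bucket."""
from typing import Iterable, List, Tuple


def as_folder(prefix: str) -> str:
    """Normalise a prefix so non-empty prefixes end with '/'."""
    return prefix if not prefix or prefix.endswith("/") else prefix + "/"


def list_children(keys: Iterable[str], prefix: str) -> Tuple[List[str], List[str]]:
    """Mimic an S3 listing with Delimiter='/': return (sub-folders, files) directly under prefix."""
    prefix = as_folder(prefix)
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix) or key == prefix:
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        if sep:
            folders.add(prefix + head + "/")  # deeper key: report only its first folder level
        else:
            files.append(key)
    return sorted(folders), sorted(files)


def canonical_url(bucket_url: str, prefix: str) -> str:
    """Map a folder prefix to its index page, e.g. <bucket>/2005-12-01/index.html."""
    return f"{bucket_url.rstrip('/')}/{as_folder(prefix)}index.html"
```

Keeping one `index.html` per prefix gives every release folder a stable, shareable address independent of how the listing is rendered.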

Requirements