Closed lpalbou closed 3 years ago
@lpalbou Lovely.
As mentioned in a thread, there is a bit of "mapping" that we (unfortunately) do for filenames in the pipeline:
This was added as a late sop to legacy SVN users (now almost all gone) as we migrated away from SVN. This should be disappearing in fairly short order after the legacy work is done. It likely makes sense to have the canonical/historical version of this exist in your code/metadata, rather than awkwardly embedded in the pipeline file.
I updated to a V2 that should take care of the file mapping: https://github.com/geneontology/archive-reconstruction/commit/fb5df41468c5469238c885b69ac4b6f6f56df879 .
The example releases generated for this V2 are here: https://geneontology-tmp.s3.amazonaws.com/index.html#releases-2/ (should finish upload in a few hours). They use current annotation filenames.
Notes:
Are you saying you will make the names of the files match what we currently have ? For example
Or are we trying to keep the legacy names ?
On that topic, are the names of the various folders fixed ? For example, ideally, I don't think we should have two 'annotation' folders, regardless of the fact that they are in different parent folders.
Hi @lpalbou Another question, I see that your nice interface has the structure '/releases/2016-08-01/[annotations/ontology/products]' are we keeping this, regardless of the fact that there were no 'releases' before the current pipeline ?
@lpalbou Looks like in your browser, only the source files are present in the /products folder.
@kltm I suspect you're not going to like those suggestions, but it feels like this would be a good opportunity to clarify all our files.
@cmungall Thanks, Pascale
@pgaudet we are not keeping the legacy names; we are remapping them to the current names.
The S3 link you provided was from the initial attempt (I am going to update the ticket and the main README with the new URL). Please check instead: https://geneontology-tmp.s3.amazonaws.com/index.html#releases-full/2016-08-01/annotations/
On that topic, are the names of the various folders fixed ?
Nothing is fixed. I am just remapping to what we currently have for consistency, but as you know I am not thrilled with the current folder hierarchy either.
Another question, I see that your nice interface has the structure '/releases/2016-08-01/[annotations/ontology/products]' are we keeping this, regardless of the fact that there were no 'releases' before the current pipeline ?
I would say yes, so that users can refer to that specific version of GO. But if you prefer, we could also have '/archive/2016-08-01/[annotations/ontology/products]'. That is probably more correct, but it would slightly complicate reuse by bioinformaticians.
Can we change the folder names ?? and ideally the contents
For the archive, it's easy, we just have to edit the mapping file: https://github.com/geneontology/archive-reconstruction/blob/master/mapping.txt
To clarify, are you proposing that remapping only for the archive or for both the archive and our current releases ? I am guessing the latter. I don't like the current folder hierarchy either, so I think it would be great to make it more intuitive; at the same time, we have about 2 years of Zenodo archives in that format, so we would have to discuss if and how we want to deal with that.
The remapping of the SVN repo is handled by a simple mapping file. It allows:
ontology/subsets/ ontology/subsets/
gene-associations/*.gz to annotations/*.gaf.gz
gene-associations/gene_association.aspgd.gz annotations/aspgd.gaf.gz
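To make the three kinds of rule above concrete, here is a minimal Python sketch of how such a mapping could be applied to an SVN path. It is not the code in archive-reconstruction: the `RULES` list, the `remap` function, the (source, target) pair representation, and the "first matching rule wins" ordering are all assumptions made for illustration, and the real mapping.txt syntax may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical rules mirroring the three examples above, as (source, target)
# pairs. Assumption: more specific rules are listed first and the first
# matching rule wins.
RULES: List[Tuple[str, str]] = [
    ("gene-associations/gene_association.aspgd.gz", "annotations/aspgd.gaf.gz"),
    ("gene-associations/*.gz", "annotations/*.gaf.gz"),
    ("ontology/subsets/", "ontology/subsets/"),
]


def remap(path: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the remapped path, or None if no rule matches."""
    for source, target in rules:
        if source.endswith("/"):
            # Folder rule: swap the folder prefix, keep the rest of the path.
            if path.startswith(source):
                return target + path[len(source):]
        elif "*" in source:
            # Wildcard rule: '*' stands for one file name stem (no '/').
            prefix, suffix = source.split("*", 1)
            long_enough = len(path) >= len(prefix) + len(suffix)
            if long_enough and path.startswith(prefix) and path.endswith(suffix):
                stem = path[len(prefix):len(path) - len(suffix)]
                if "/" not in stem:
                    return target.replace("*", stem, 1)
        elif path == source:
            # Exact rule: one specific file renamed to one specific file.
            return target
    return None


if __name__ == "__main__":
    for p in (
        "gene-associations/gene_association.aspgd.gz",
        "gene-associations/gene_association.sgd.gz",
        "ontology/subsets/goslim_generic.obo",
    ):
        print(p, "->", remap(p))
```

With this ordering, the aspgd file is caught by its specific rule before the wildcard rule can rename it to `annotations/gene_association.aspgd.gaf.gz`; whether the real tool resolves overlapping rules the same way is not stated in this thread.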
@kltm @pgaudet I have created a default mapping.txt, which produced this archive: https://geneontology-tmp.s3.amazonaws.com/index.html#releases/
By further editing the mapping file, we can remap to specific filenames that would be more consistent with current.geneontology.org (mostly the GAFs).