geneontology / archive-reconstruction

Codes to move various legacy files to the current release.geneontology.org
BSD 3-Clause "New" or "Revised" License

gp2protein, gp2rna #3

Closed lpalbou closed 3 years ago

lpalbou commented 4 years ago

We have gp2protein and gp2rna files (for most revisions, I think), which provide mappings to UniProt IDs. I personally think they would be useful to keep, as those accession numbers can still be looked up in UniProt. Example:

FB:FBgn0000008  UniProtKB:Q9W283;UniProtKB:A0A0B4LG21
FB:FBgn0000014  UniProtKB:P29555
FB:FBgn0000015  UniProtKB:P09087
FB:FBgn0000017  UniProtKB:P00522
FB:FBgn0000018  UniProtKB:Q9VKM4;UniProtKB:Q8MR99;UniProtKB:Q95W10
FB:FBgn0000022  UniProtKB:P10083
FB:FBgn0000024  UniProtKB:P07140
FB:FBgn0000028  UniProtKB:P24350
FB:FBgn0000032  UniProtKB:Q8I0P9;UniProtKB:Q9VAD0
FB:FBgn0000036  UniProtKB:P09478
FB:FBgn0000037  UniProtKB:P16395
FB:FBgn0000038  UniProtKB:P04755
FB:FBgn0000039  UniProtKB:P17644
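
For illustration, a minimal sketch of how lines like the ones above could be read into a lookup table. It assumes each line holds a gene product ID, whitespace (a tab in the real files, presumably), and a `;`-separated list of accessions, and that header or comment lines start with `!`; neither assumption is confirmed in this thread. The function name `parse_gp2protein` is hypothetical.

```python
from collections import defaultdict


def parse_gp2protein(lines):
    """Map each gene product ID to the list of accessions on its line.

    Assumed layout (not confirmed in this thread):
    <gene product ID><whitespace><accession>[;<accession>...]
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '!' header/comment lines (assumed convention).
        if not line or line.startswith("!"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            # No accession column on this line; nothing to map.
            continue
        gene_product, targets = parts
        for target in targets.split(";"):
            target = target.strip()
            if target:
                mapping[gene_product].append(target)
    return dict(mapping)


# Two lines taken from the example above.
sample = [
    "FB:FBgn0000008  UniProtKB:Q9W283;UniProtKB:A0A0B4LG21",
    "FB:FBgn0000014  UniProtKB:P29555",
]
mapping = parse_gp2protein(sample)
```

With this layout, one gene product can map to several accessions, which is why the values are lists rather than single strings.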

If we keep them, which destination folder do we want them in: annotations/gp2protein (probably), or just gp2protein/? Reminder of the current folder hierarchy:

[Image: screenshot of the current folder hierarchy, 2020-07-29]

@pgaudet @cmungall

pgaudet commented 4 years ago

@vanaukenk What do you think, do we need those files? Given that the specs were not great and not all groups provided them, I would be tempted to drop them. But GOA has had them since 2013.

Any strong feelings one way or the other?

Thanks, Pascale

vanaukenk commented 4 years ago

I'm inclined to say we don't need to archive these files.
I can't immediately think of a use case for them and it seems that, if really needed, they could be reconstructed from UniProt records. Perhaps we should just confirm this on tomorrow's managers call, though.

lpalbou commented 4 years ago

I would tend to keep these files, as they could be very useful for a bioinformatician trying to remap the entities of the time and compare enrichments at different time points. Sure, we can discuss it tomorrow.
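
As a rough sketch of that use case, the snippet below compares two snapshots of an ID mapping and reports which accessions were added or removed per gene product. It assumes each snapshot is already loaded as a dict from gene product ID to a list of accessions. The function name `diff_mappings` and all IDs in the sample data are made-up placeholders, not real records.

```python
def diff_mappings(old, new):
    """Report per-gene-product accession changes between two snapshots."""
    changes = {}
    for gene_product in sorted(set(old) | set(new)):
        before = set(old.get(gene_product, ()))
        after = set(new.get(gene_product, ()))
        if before != after:
            changes[gene_product] = {
                "removed": sorted(before - after),
                "added": sorted(after - before),
            }
    return changes


# Placeholder snapshots (invented IDs, for illustration only).
old_release = {
    "DB:gene1": ["UniProtKB:ACC_A"],
    "DB:gene2": ["UniProtKB:ACC_B", "UniProtKB:ACC_C"],
}
new_release = {
    "DB:gene1": ["UniProtKB:ACC_A"],
    "DB:gene2": ["UniProtKB:ACC_C"],
    "DB:gene3": ["UniProtKB:ACC_D"],
}
changes = diff_mappings(old_release, new_release)
```

Gene products whose accession set is unchanged (here `DB:gene1`) are left out of the result, so only entries that would need remapping are listed.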

pgaudet commented 4 years ago

Discussion with @thomaspd and @lpalbou

Suggestion: keep them (since we have them) under a new folder in /metadata -> gp2protein

@kltm Is that OK with you?

Thanks, Pascale

pgaudet commented 4 years ago

Or should we call the folder 'id_mappings', since that seems to be what they are, and put it in the /annotation folder?

pgaudet commented 4 years ago

@kltm and I propose /annotation/gp2protein

OK @lpalbou @thomaspd

Thanks, Pascale

lpalbou commented 4 years ago

This is done for both CVS and SVN:

pgaudet commented 4 years ago

Are the SVN and CVS examples leading to similar pages? I couldn't see a difference.

It looks great!

lpalbou commented 4 years ago

They lead to different pages, but that's the whole point: a release created from CVS (2002-2011) or SVN (2011-2018) should look the same 😃

pgaudet commented 4 years ago

Will everything be integrated into a single page? I suppose we don't want to keep the distinction of whether the original data was on SVN or CVS?

lpalbou commented 4 years ago

I am not following. You have two different pages because they point to two different releases. The SVN example was to show how the migration worked for the 2013-05-01 release (built from SVN), while the CVS example was to show how the migration worked for the 2005-06-01 release (built from CVS). Both links were provided to show that, whichever release a user looks at throughout that period, and independently of how I rebuilt the archive, it will be shown in a consistent way.

pgaudet commented 4 years ago

OK, thanks for the clarification!