glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

New input file for alliance_genome #1468

Closed katewarner closed 1 day ago

katewarner commented 1 week ago

We have a new input file for all *_protein_disease_alliance_genome.csv datasets: downloads/alliance_genome/current/DISEASE-ALLIANCE_COMBINED.tsv

This means that instead of individual species files, there is now a single file containing all species (DISEASE-ALLIANCE_COMBINED.tsv). Use the Taxon column to determine the species.

Taxon   SpeciesName     DBobjectType    DBObjectID      DBObjectSymbol  AssociationType DOID    DOtermName      WithOrtholog    InferredFromID  InferredFromSymbol      ExperimentalCondition   ModifierEvidenceCode     EvidenceCodeName        Reference       Date    Source
NCBITaxon:10090 Mus musculus    gene    MGI:95564       Fmr1    is_implicated_in        DOID:14261      fragile X syndrome              MGI:5292357     Fmr1<sup>tm1.2Cidz</sup>/Y  [background:] involves: 129P2/OlaHsd * 129/Sv * C57BL/6 * FVB/N                      ECO:0000033     author statement supported by traceable reference       PMID:19103683   20210614        MGI
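A minimal sketch of how the combined file could be split by the Taxon column. This is not the actual GlyGen script: the function name `split_by_taxon` and the `TAXON_TO_SPECIES` mapping are assumptions made for illustration, and the handling of `#` comment lines before the header is a guess about the file layout that should be checked against the real download.

```python
import csv
from collections import defaultdict

# Assumed mapping from Taxon values to the species prefixes used in the
# *_protein_disease_alliance_genome.csv file names. Verify against the file.
TAXON_TO_SPECIES = {
    "NCBITaxon:9606": "human",
    "NCBITaxon:10090": "mouse",
    "NCBITaxon:10116": "rat",
    "NCBITaxon:7227": "fruitfly",
    "NCBITaxon:559292": "yeast",
}


def split_by_taxon(lines):
    """Group rows of the combined TSV by species, keyed on the Taxon column.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Returns a dict mapping species name to a list of row dicts.
    """
    # Drop blank lines and '#' comment lines (assumed to precede the header).
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    # QUOTE_NONE: treat quote characters inside fields as literal text.
    reader = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows_by_species = defaultdict(list)
    for row in reader:
        species = TAXON_TO_SPECIES.get(row["Taxon"])
        if species is not None:  # skip species with no target dataset
            rows_by_species[species].append(row)
    return dict(rows_by_species)
```

Passing an open handle on DISEASE-ALLIANCE_COMBINED.tsv to `split_by_taxon` would give one list of rows per species, which can then feed the existing per-species dataset logic. Rows whose Taxon is not in the mapping are silently skipped here; logging them may be preferable in practice.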

Please update your script and process the datasets: *_protein_disease_alliance_genome.csv

rykahsay commented 5 days ago

Please check the accuracy of the created datasets:

$ ls -ltr unreviewed/*allian*
-rw-r--r--. 1 rykahsay glygen 2656964 Jun 26 15:00 unreviewed/mouse_protein_disease_alliance_genome.csv
-rw-r--r--. 1 rykahsay glygen 2811974 Jun 26 15:00 unreviewed/rat_protein_disease_alliance_genome.csv
-rw-r--r--. 1 rykahsay glygen 1753123 Jun 26 15:00 unreviewed/fruitfly_protein_disease_alliance_genome.csv
-rw-r--r--. 1 rykahsay glygen  646313 Jun 26 15:01 unreviewed/yeast_protein_disease_alliance_genome.csv
-rw-r--r--. 1 rykahsay glygen 2918094 Jun 26 15:01 unreviewed/human_protein_disease_alliance_genome.csv

katewarner commented 2 days ago

I checked the *_protein_disease_alliance_genome.csv datasets and they all look accurate. There was an increase across the datasets, especially in the rat dataset, but Karina thinks it's due to the proteome update.
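For anyone repeating this check, a rough sketch of how the increase could be quantified, assuming "increase" refers to the number of rows in each CSV. The helper names `count_data_rows` and `relative_increase` are hypothetical and not part of any GlyGen tooling.

```python
import csv


def count_data_rows(lines):
    """Count CSV records in an iterable of text lines, excluding the header."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row if present
    return sum(1 for _ in reader)


def relative_increase(old_count, new_count):
    """Fractional change from old_count to new_count, or None if old_count is 0."""
    if old_count == 0:
        return None
    return (new_count - old_count) / old_count
```

Running `count_data_rows` on the previous and the new version of each dataset and comparing the results with `relative_increase` would show which species changed the most. The previous versions would need to come from wherever the last release is stored, which this thread does not specify.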

rykahsay commented 1 day ago

... and what do I need to do now? If nothing, you need to close the ticket.