Bahler-Lab / yogy

A web-based resource for orthologous proteins of eukaryotic organisms.

Fission_yeast_annotation #2

Open sinanshi opened 8 years ago

dbitton commented 8 years ago

go to http://www.pombase.org/downloads/data-mappings

First file: tab-delimited file of systematic name, primary name (empty if not assigned), followed by all synonyms
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/allNames.tsv

Tab-delimited file of systematic identifier mapped to UniProt Accession Number
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/PomBase2UniProt.tsv

Systematic name, primary name (empty if not assigned), synonyms (empty if not assigned), followed by gene product description
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/sysID2product.tsv

Create the following table

First read allNames.tsv, guided by systematic ID rather than names (5416...), then fill in the table accordingly: description (sysID2product.tsv),,,,, PubmedID (primary names, allNames.tsv column 2), GeneDB (description, sysID2product.tsv), UniProt (PomBase2UniProt.tsv, column)
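A minimal sketch of the join described above, assuming each file has the systematic ID in its first column, that the product description is the fourth column of sysID2product.tsv, and that the UniProt accession is the second column of PomBase2UniProt.tsv. The function names and the output keys (`systematic_id`, `primary_name`, `description`, `uniprot`) are made up for illustration and are not the final table layout.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text, skipping rows with an empty first field."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row and row[0]]


def build_annotation(all_names_text, uniprot_text, product_text):
    """Build one output row per systematic ID found in the allNames text."""
    # Assumed layout: systematic ID, primary name (may be empty), synonyms...
    primary = {}
    for row in read_tsv(all_names_text):
        primary[row[0]] = row[1] if len(row) > 1 else ""

    # Assumed layout: systematic ID, UniProt accession
    uniprot = {row[0]: row[1] for row in read_tsv(uniprot_text) if len(row) > 1}

    # Assumed layout: systematic ID, primary name, synonyms, product description
    product = {row[0]: row[3] for row in read_tsv(product_text) if len(row) > 3}

    return [
        {
            "systematic_id": sid,
            "primary_name": name,
            "description": product.get(sid, ""),
            "uniprot": uniprot.get(sid, ""),
        }
        for sid, name in primary.items()
    ]
```

Because the output is driven by the systematic IDs of the first file, it has one row per distinct ID in that file; IDs missing from the other two files get empty fields rather than being dropped.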

dbitton commented 8 years ago

pompep:
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/FASTA/pep.fa.gz

sinanshi commented 8 years ago

Note, the number of rows should always be the same.

sinanshi commented 8 years ago

SGD_features.tab (http://downloads.yeastgenome.org/curation/chromosomal_feature/SGD_features.tab): 5416 non-empty rows

allNames.tsv (ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/allNames.tsv): 4508 non-empty rows

Intersection between these two files: 2021
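A rough sketch of how such counts could be reproduced. The thread does not say which column was compared, so the column index is left as a parameter. The helper names are hypothetical, and the sketch counts distinct non-empty keys, which can differ from a raw row count if keys repeat.

```python
def nonempty_keys(text, column=0):
    """Collect the distinct non-empty values of one tab-delimited column."""
    keys = set()
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) > column and fields[column].strip():
            keys.add(fields[column].strip())
    return keys


def shared_keys(text_a, column_a, text_b, column_b):
    """Return the keys present in the chosen column of both files."""
    return nonempty_keys(text_a, column_a) & nonempty_keys(text_b, column_b)
```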

sinanshi commented 8 years ago

Do you mean systematic_id by primary name?

sinanshi commented 8 years ago

modification:

Guided by UniProt ID, use the primary name; where the primary name doesn't exist, fill in the systematic ID.
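The fallback rule can be sketched as follows. The input tuple order (UniProt ID, systematic ID, primary name) and both function names are assumptions for illustration only.

```python
def display_name(primary_name, systematic_id):
    """Use the primary name when present, otherwise the systematic ID."""
    primary_name = primary_name.strip()
    return primary_name if primary_name else systematic_id


def names_by_uniprot(rows):
    """Map each UniProt ID to a display name.

    rows: iterable of (uniprot_id, systematic_id, primary_name) tuples.
    Rows without a UniProt ID are skipped.
    """
    return {
        uniprot_id: display_name(primary_name, systematic_id)
        for uniprot_id, systematic_id, primary_name in rows
        if uniprot_id
    }
```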

sinanshi commented 8 years ago

5139 rows

sinanshi commented 8 years ago

Neither the primary name nor the systematic name in allNames.tsv is unique (2 duplicates). Remove all the non-unique entries.
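One way to drop every entry involved in a duplication (rather than keeping a first occurrence) is sketched below. It assumes rows are (systematic ID, primary name) pairs and that empty primary names do not count as duplicates of each other; `drop_non_unique` is a hypothetical helper name.

```python
from collections import Counter


def drop_non_unique(rows):
    """Keep only rows whose systematic ID and primary name each occur once.

    rows: list of (systematic_id, primary_name) tuples.
    Empty primary names are ignored when counting duplicates (assumption).
    """
    sys_counts = Counter(sys_id for sys_id, _ in rows)
    name_counts = Counter(name for _, name in rows if name)
    return [
        (sys_id, name)
        for sys_id, name in rows
        if sys_counts[sys_id] == 1 and (not name or name_counts[name] == 1)
    ]
```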

sinanshi commented 8 years ago

work done, please check later.

dbitton commented 8 years ago

Great. Is everything updated?

dbitton commented 8 years ago

I am aware of the two duplicates

sinanshi commented 8 years ago

No, only the Fission_yeast_annotation.txt. There are two duplicates in both primary and systematic names.