Open sinanshi opened 8 years ago
pompep wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/FASTA/pep.fa.gz
Note, the number of rows should always be the same.
SGD_features.tab from http://downloads.yeastgenome.org/curation/chromosomal_feature/SGD_features.tab
Number of non-empty rows: 5416
ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/allNames.tsv
Number of non-empty rows: 4508
Intersection between these two files -- 2021
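A minimal sketch of how counts like these could be reproduced. The thread does not say which column was used for the comparison, so matching on lower-cased gene names is an assumption, as are the column indexes and the helper name `non_empty_values`. The sample rows are made up and stand in for the real downloads.

```python
import csv
import io


def non_empty_values(handle, column):
    """Collect the distinct non-empty values found in one column of a TSV stream."""
    values = set()
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) > column and row[column].strip():
            # Lower-case so names differing only in case compare equal (assumption).
            values.add(row[column].strip().lower())
    return values


# In-memory stand-ins for the two downloads (made-up rows, not real data).
sgd_sample = io.StringIO("S1\tORF\tACT1\nS2\tORF\t\nS3\tORF\tRAD51\n")
pombe_sample = io.StringIO("SPX1\tact1\nSPX2\tcdc2\nSPX3\t\n")

sgd_names = non_empty_values(sgd_sample, column=2)      # assumed name column
pombe_names = non_empty_values(pombe_sample, column=1)  # primary name column
shared = sgd_names & pombe_names

print(len(sgd_names), len(pombe_names), len(shared))
```

Because the values are collected into sets, this counts distinct names, which can be lower than the raw row count if a file contains duplicates.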
PubMedID (systematic id)
sysID2product.tsv systematic name, primary name (empty if not assigned), synonyms (empty if not assigned), followed by gene product description
Do you mean systematic_id by primary name?
Guided by the UniProt ID, use the primary name; where the primary name doesn't exist, fill in the systematic ID.
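The fallback rule could look roughly like this. The function name `display_name` and the example identifiers are hypothetical, not taken from the files.

```python
def display_name(systematic_id, primary_name):
    """Return the primary name, or the systematic ID when no primary name is assigned."""
    primary_name = (primary_name or "").strip()
    return primary_name if primary_name else systematic_id


# Hypothetical identifiers, for illustration only.
print(display_name("SPX1", "abc1"))  # primary name is present, so it is used
print(display_name("SPX2", ""))      # empty primary name, so the systematic ID is used
```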
5139 rows
Neither the primary name nor the systematic name in allNames.tsv is unique (2 duplicates). Remove all the non-unique entries.
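One possible way to drop every entry whose systematic or primary name occurs more than once. This is a sketch: the function name `drop_non_unique` and the sample rows are made up, and treating empty primary names as "not duplicated" is an assumption.

```python
from collections import Counter


def drop_non_unique(rows):
    """Keep only rows whose systematic ID and (non-empty) primary name each occur once.

    rows: list of (systematic_id, primary_name) tuples.
    """
    sys_counts = Counter(sys_id for sys_id, _ in rows)
    # Empty primary names mean "not assigned", so they are not counted as duplicates.
    name_counts = Counter(name for _, name in rows if name)
    return [
        (sys_id, name)
        for sys_id, name in rows
        if sys_counts[sys_id] == 1 and (not name or name_counts[name] == 1)
    ]


# Made-up rows: SPX2 is repeated, and "dup1" is shared by two different IDs.
rows = [
    ("SPX1", "abc1"),
    ("SPX2", ""),
    ("SPX2", "xyz1"),
    ("SPX3", "dup1"),
    ("SPX4", "dup1"),
    ("SPX5", ""),
]
unique_rows = drop_non_unique(rows)
print(unique_rows)
```

All copies of a duplicated entry are removed, not just the later ones, which matches "remove all the non-unique entries".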
work done, please check later.
Great, is everything updated?
I am aware of the two duplicates
No, only the Fission_yeast_annotation.txt
There are two duplicates in both primary and systematic names.
go to http://www.pombase.org/downloads/data-mappings
First file: tab-delimited file of systematic name, primary name (empty if not assigned), followed by all synonyms
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/allNames.tsv
Tab-delimited file of systematic identifier mapped to UniProt Accession Number
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/PomBase2UniProt.tsv
systematic name, primary name (empty if not assigned), synonyms (empty if not assigned), followed by gene product description
wget ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/sysID2product.tsv
Create the following table
First read allNames.tsv, keyed by systematic ID rather than by name (5416...), then fill in the table accordingly: description (sysID2product.tsv),,,,, PubMedID (primary names, allNames.tsv column 2), GeneDB (description, sysID2product.tsv), UniProt (PomBase2UniProt.tsv, column)
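A rough sketch of that join. It is keyed on the systematic ID in column 1 of each file, following the column descriptions given for the three downloads. The output field names, the helper names `read_rows` and `build_table`, and the sample rows are assumptions for illustration; the target table layout is not spelled out in this thread.

```python
import csv
import io


def read_rows(handle):
    """Read a tab-delimited stream into a list of rows, skipping blank lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


def build_table(all_names, products, uniprot):
    """Join the three mappings on systematic ID (first column of each file)."""
    # sysID2product.tsv: the product description is the fourth column.
    description = {r[0]: r[3] for r in products if len(r) > 3}
    # PomBase2UniProt.tsv: the accession is assumed to be the second column.
    accession = {r[0]: r[1] for r in uniprot if len(r) > 1}
    table = []
    for row in all_names:
        sys_id = row[0]
        primary = row[1].strip() if len(row) > 1 else ""
        table.append({
            "systematic_id": sys_id,
            "name": primary or sys_id,  # fall back to the systematic ID
            "description": description.get(sys_id, ""),
            "uniprot": accession.get(sys_id, ""),
        })
    return table


# Made-up rows standing in for the three downloads.
names = read_rows(io.StringIO("SPX1\tabc1\tfoo\nSPX2\t\t\n"))
prods = read_rows(io.StringIO("SPX1\tabc1\tfoo\tkinase\nSPX2\t\t\thypothetical protein\n"))
unip = read_rows(io.StringIO("SPX1\tP00001\n"))
table = build_table(names, prods, unip)
for entry in table:
    print(entry)
```

Driving the loop from allNames.tsv keeps one output row per systematic ID, so the row count stays the same regardless of which IDs are missing from the other two files.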