ubhuiyan closed 6 months ago
Many of the existing protein datasets (in all species) say "The dataset is derived from 2019-09 UniProt release." in their Usability Domain. I'm assuming this is outdated, but can you confirm? If so, should we remove this sentence since we won't be updating this section with every release?
Same issue as above for RefSeq...Usability Domains say 2019.
Update according to the following template:
"The {dataset name} dataset contains {organism scientific name} [taxid:{taxid}] UniProtKB canonical accessions...."
And delete the sentence: "The dataset is derived from 2019-09 UniProtKB release."
GLY_000003✔ GLY_000007✔ GLY_000012✔ GLY_000013✔ GLY_000031✔ GLY_000032✔ GLY_000033✔ GLY_000035✔ GLY_000036✔ GLY_000053✔ GLY_000054✔ GLY_000081✔ GLY_000082✔ GLY_000087✔ GLY_000088✔ GLY_000090✔ GLY_000091✔ GLY_000093✔ GLY_000094✔ GLY_000095✔ GLY_000096✔ GLY_000097✔ GLY_000098✔ GLY_000099✔ GLY_000100✔ GLY_000101✔ GLY_000102✔ GLY_000103✔ GLY_000104✔ GLY_000105✔ GLY_000106✔ GLY_000107✔ GLY_000108✔ GLY_000109✔ GLY_000110✔ GLY_000112✔ GLY_000113✔ GLY_000114✔ GLY_000115✔ GLY_000116✔ GLY_000117✔ GLY_000118✔ GLY_000119✔ GLY_000120✔ GLY_000121✔ GLY_000122✔ GLY_000123✔ GLY_000124✔ GLY_000125✔ GLY_000126✔ GLY_000127✔ GLY_000128✔ GLY_000129✔ GLY_000131✔ GLY_000132✔ GLY_000135✔ GLY_000136✔ GLY_000222✔ GLY_000223✔ GLY_000228✔ GLY_000229✔ GLY_000232✔ GLY_000233✔ GLY_000234✔ GLY_000236✔ GLY_000241✔ GLY_000242✔ GLY_000243✔ GLY_000244✔ GLY_000245✔ GLY_000250✔ GLY_000252✔ GLY_000253✔ GLY_000254✔ GLY_000255✔ GLY_000257✔ GLY_000259✔ GLY_000260✔ GLY_000261✔ GLY_000262✔ GLY_000263✔ GLY_000266✔ GLY_000267✔ GLY_000270✔ GLY_000273✔ GLY_000274✔ GLY_000276✔ GLY_000278✔ GLY_000310✔ GLY_000313✔ GLY_000314✔ GLY_000315✔ GLY_000319✔ GLY_000320✔ GLY_000321✔ GLY_000329✔ GLY_000335✔ GLY_000348✔ GLY_000349✔ GLY_000350✔ GLY_000351✔ GLY_000352✔ GLY_000353✔ GLY_000354✔ GLY_000356✔ GLY_000357✔ GLY_000358✔ GLY_000359✔ GLY_000360✔ GLY_000361✔ GLY_000362✔ GLY_000368✔ GLY_000369✔ GLY_000370✔ GLY_000371 ✔ GLY_000372✔ GLY_000373✔ GLY_000374✔ GLY_000375✔ GLY_000376✔ GLY_000377✔ GLY_000378✔ GLY_000379✔ GLY_000380✔ GLY_000381✔ GLY_000382✔ GLY_000383✔ GLY_000384✔ GLY_000385✔ GLY_000386✔ GLY_000390✔ GLY_000391✔ GLY_000395✔ GLY_000396✔ GLY_000397✔ GLY_000398✔ GLY_000399✔ GLY_000400✔ GLY_000401✔ GLY_000457✔ GLY_000458✔ GLY_000464✔ GLY_000466✔ GLY_000468✔ GLY_000469✔ GLY_000523✔ GLY_000524✔ GLY_000530✔ GLY_000597✔ GLY_000598✔ GLY_000599✔ GLY_000640✔ GLY_000646✔ GLY_000742✔ GLY_000751✔ GLY_000759✔ GLY_000829✔ GLY_000830✔ GLY_000831✔ GLY_000835✔ GLY_000838✔ GLY_000840✔ GLY_000844✔ GLY_000846✔ GLY_000848✔ GLY_000856✔ 
GLY_000857✔ GLY_000858✔ GLY_000860✔ GLY_000862✔ GLY_000863✔ GLY_000864✔ GLY_000865✔ GLY_000867✔ GLY_000869✔ GLY_000870✔ GLY_000871✔ GLY_000872✔ GLY_000873✔ GLY_000874✔ GLY_000875✔ GLY_000876✔ GLY_000877✔ GLY_000878✔ GLY_000879✔ GLY_000880✔ GLY_000884✔ GLY_000894✔ GLY_000906✔ GLY_000907✔ GLY_000908✔ GLY_000909✔ GLY_000910✔ GLY_000911✔ GLY_000912✔ GLY_000914✔ GLY_000916✔ GLY_000917✔ GLY_000919✔ GLY_000937✔ GLY_000941✔ GLY_000942✔ GLY_000944✔ GLY_000945✔ GLY_000947✔ GLY_000948✔ GLY_000949✔ GLY_000951✔ GLY_000952✔
Update according to the following template:
"The {dataset name} dataset contains {organism scientific name} [taxid:{taxid}] UniProtKB canonical accessions...."
And delete the sentence: "The dataset is derived from NCBI RefSeq Release 96, September 9, 2019." Also, if you see this sentence, delete it: "The RefSeq accessions are The dataset is derived from NCBI RefSeq Release 96, September 9, 2019"
GLY_000021✔ GLY_000022✔ GLY_000133✔ GLY_000134✔ GLY_000235✔ GLY_000249✔ GLY_000256✔ GLY_000264✔ GLY_000275✔ GLY_000387✔ GLY_000388✔ GLY_000389✔ GLY_000392✔ GLY_000393✔ GLY_000394✔ GLY_000405✔ GLY_000406✔ GLY_000407✔ GLY_000437✔ GLY_000554✔ GLY_000613✔ GLY_000614✔ GLY_000615✔ GLY_000645✔ GLY_000758✔ GLY_000839✔ GLY_000845✔ GLY_000852✔ GLY_000904✔ GLY_000913✔ GLY_000918✔ GLY_000932✔ GLY_000946✔
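For anyone repeating this cleanup, here is a minimal sketch of removing the outdated release sentences from a Usability Domain string. It assumes the text is available as a plain Python string; the function name `strip_outdated_sentences` and the list name `OUTDATED_SENTENCES` are hypothetical and not part of any existing script. The sentences themselves are the ones quoted in this thread.

```python
import re

# Sentences quoted in this thread. The longer RefSeq variant is listed first so
# that removing the shorter one does not leave a dangling
# "The RefSeq accessions are" fragment behind.
OUTDATED_SENTENCES = [
    "The RefSeq accessions are The dataset is derived from NCBI RefSeq Release 96, September 9, 2019",
    "The dataset is derived from NCBI RefSeq Release 96, September 9, 2019",
    "The dataset is derived from 2019-09 UniProtKB release",
    "The dataset is derived from 2019-09 UniProt release",
]


def strip_outdated_sentences(text: str) -> str:
    """Remove the outdated release sentences (hypothetical helper)."""
    for sentence in OUTDATED_SENTENCES:
        # Match the literal sentence with an optional trailing period.
        text = re.sub(re.escape(sentence) + r"\.?", "", text)
    # Collapse the whitespace left behind by the removed sentences.
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    sample = (
        "The example dataset contains Homo sapiens [taxid:9606] records. "
        "The dataset is derived from 2019-09 UniProtKB release."
    )
    print(strip_outdated_sentences(sample))
```

The list is matched literally rather than with a broad pattern, so unrelated sentences that merely mention a release are left alone.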
@kmartinez834 For HCV1a and HCV1b, what scientific name should I use?
I know these are not scientific names, but would these work?
HCV1a - Hepatitis C virus (genotype 1a, isolate H)
HCV1b - Hepatitis C virus (genotype 1b, isolate Japanese)
Yes, please use the "long_name" from the file generated/misc/species_info.csv:
tax_id,short_name,long_name,common_name,nt_file,is_reference,sort_order
9606,human,Homo sapiens,Human,uniprot-proteome-homo-sapiens.nt,yes,1
10090,mouse,Mus musculus,Mouse,uniprot-proteome-mus-musculus.nt,yes,2
10116,rat,Rattus norvegicus,Rat,uniprot-proteome-rattus-norvegicus.nt,yes,3
63746,hcv1a,Hepatitis C virus (isolate H),HCV-H,uniprot-proteome-hepatitis-c-virus-1a.nt,yes,4
11116,hcv1b,Hepatitis C virus (isolate Japanese),HCV-Japanese,uniprot-proteome-hepatitis-c-virus-1b.nt,yes,5
694009,sarscov1,Severe acute respiratory syndrome-related coronavirus,HCoV-SARS,uniprot-proteome-sars-coronavirus.nt,yes,6
2697049,sarscov2,Severe acute respiratory syndrome coronavirus 2,SARS-CoV-2,uniprot-proteome-sars-cov-2.nt,yes,7
7227,fruitfly,Drosophila melanogaster,Fruit fly,uniprot-proteome-drosophila-melanogaster.nt,yes,8
559292,yeast,Saccharomyces cerevisiae S288C,Yeast,uniprot-proteome-saccharomyces-cerevisiae.nt,yes,9
44689,dicty,Dictyostelium discoideum,Cellular slime molds,uniprot-proteome-dictyostelium-discoideum.nt,yes,10
9823,pig,Sus scrofa,Pig,uniprot-proteome-sus_scrofa.nt,yes,11
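The "long_name" lookup can be sketched as follows. This is only an illustration, assuming the CSV columns shown above; the helper names `load_species` and `usability_sentence` and the dataset name `example_dataset` are hypothetical, and the CSV is inlined here as a short excerpt rather than read from generated/misc/species_info.csv. The template text, including its trailing "....", is copied as written in this thread.

```python
import csv
import io

# Template from this thread; the trailing "...." is left as written.
TEMPLATE = (
    "The {dataset_name} dataset contains {long_name} "
    "[taxid:{tax_id}] UniProtKB canonical accessions...."
)

# Excerpt of species_info.csv, inlined so the sketch is self-contained.
SPECIES_CSV = """\
tax_id,short_name,long_name,common_name,nt_file,is_reference,sort_order
9606,human,Homo sapiens,Human,uniprot-proteome-homo-sapiens.nt,yes,1
63746,hcv1a,Hepatitis C virus (isolate H),HCV-H,uniprot-proteome-hepatitis-c-virus-1a.nt,yes,4
11116,hcv1b,Hepatitis C virus (isolate Japanese),HCV-Japanese,uniprot-proteome-hepatitis-c-virus-1b.nt,yes,5
"""


def load_species(csv_text: str) -> dict:
    """Index the species rows by short_name (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["short_name"]: row for row in reader}


def usability_sentence(dataset_name: str, short_name: str, species: dict) -> str:
    """Fill the template with long_name and tax_id for one species."""
    row = species[short_name]
    return TEMPLATE.format(
        dataset_name=dataset_name,
        long_name=row["long_name"],
        tax_id=row["tax_id"],
    )


if __name__ == "__main__":
    species = load_species(SPECIES_CSV)
    # "example_dataset" is a placeholder, not a real dataset name.
    print(usability_sentence("example_dataset", "hcv1a", species))
```

Keying on `short_name` means the HCV1a and HCV1b sentences pick up "Hepatitis C virus (isolate H)" and "Hepatitis C virus (isolate Japanese)" directly from the file instead of being typed by hand.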
Task completed. @kmartinez834, please close the ticket.
The usability domain for the pig data contains either outdated information or none at all and needs an update.