nataled closed this issue 4 years ago
I think this will work. We show people "Homo sapiens [9606]" and do the mapping behind the scenes.
Homo sapiens [9606] PR:000029067
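For illustration, the behind-the-scenes mapping could look roughly like the sketch below. It assumes the displayed label always ends in a bracketed numeric taxon ID; the names `TAXON_TO_PRO`, `LABEL_RE`, and `label_to_pro` are hypothetical, and only the 9606 entry comes from this thread.

```python
import re

# Hypothetical lookup from NCBI taxon ID to the PRO term meaning
# "all proteins in this organism". Only the 9606 entry is from this thread.
TAXON_TO_PRO = {"9606": "PR:000029067"}

# Anchor on the trailing "[digits]" so that any brackets inside the
# taxon name itself are left alone.
LABEL_RE = re.compile(r"^(?P<name>.+?)\s*\[(?P<taxid>\d+)\]\s*$")


def label_to_pro(label, mapping=TAXON_TO_PRO):
    """Return the PRO ID for a label such as 'Homo sapiens [9606]', or None."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"label has no trailing [taxon ID]: {label!r}")
    return mapping.get(match.group("taxid"))


pro_id = label_to_pro("Homo sapiens [9606]")
```

A label whose taxon ID is not in the lookup returns `None` rather than raising, so the caller can decide how to handle organisms without such a term.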
On Fri, Jul 17, 2020 at 3:29 PM Darren A. Natale notifications@github.com wrote:
We need to have a list of PRO terms that represent "all proteins in this organism". Such a list actually already exists and is provided with each release, but at the moment is incomplete. I will work on creating the list at release time (will be easy to do), but I wanted to find out what format to use. The existing file (taxbased_protein.dat) is just a single-column list of PRO terms, like:
PR:000000001
PR:000018263
PR:000029053
PR:000029031
PR:000029043
PR:000029065
PR:000029060
PR:000036194
A few questions:
1. @jz26 I recall this file was needed by you, or maybe it was @hongzhanhuang, for web site purposes. I'm not sure if that's still the case. Do you recall? Well, more to the point, is it still needed?
I don't recall. Do you have more clues?
@jz26 I believe it might have been used to suppress showing a giant list of children for the listed PRO terms.
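If that was indeed the purpose, the logic on the web site side might have resembled the following sketch. This is a guess at the idea, not the actual site code; `load_term_list` and `visible_children` are hypothetical names, and the sample lines stand in for the contents of taxbased_protein.dat.

```python
def load_term_list(lines):
    """Collect PRO IDs from a single-column listing, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def visible_children(term_id, children, organism_level_terms):
    """Hide children for organism-level terms; otherwise return them all."""
    if term_id in organism_level_terms:
        return []
    return list(children)


# Stand-in for reading the file, e.g. open("taxbased_protein.dat").
sample_lines = ["PR:000000001\n", "PR:000018263\n", "\n", "PR:000029053\n"]
organism_level_terms = load_term_list(sample_lines)
```

Keeping the IDs in a set makes the membership check cheap even if the list grows once it is complete.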
Based on offline conversations, it has been confirmed that the existing file listing the terms of interest can be adapted to another use. I will let you know when it is ready.
@chumingc the format you suggested will not be easily parseable in all cases, as some taxon names have square brackets within. Instead, I'll produce the following:
taxon_name
You can find an example file at /home/dnatale/data/ontologies/for_release/taxbased_protein.dat on Hershey. Let me know if that's suitable. I can easily revise.
In the future, the file will be found at /data/pir/projects/pro/releaseNUM where NUM is the release number (next one being 61).
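As a rough sketch of how such a file could be consumed: the exact column layout is not shown above, so the code below assumes three tab-separated columns (taxon ID, PRO ID, taxon name). That layout, the function name `read_taxon_table`, and the second sample row are assumptions for illustration only. Keeping the name in its own column means brackets inside it never need to be parsed.

```python
import csv
import io


def read_taxon_table(handle):
    """Yield (taxon_id, pro_id, taxon_name) from an assumed three-column TSV."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield tuple(field.strip() for field in row)


# In-memory stand-in for the release file; the second row is made up to
# show a taxon name that contains square brackets.
sample = io.StringIO(
    "NCBITaxon:9606\tPR:000029067\tHomo sapiens\n"
    "NCBITaxon:0000\tPR:000000000\t[Example] bracketed name\n"
)

# Build the display label "name [taxon number]" and map it to the PRO ID.
display_to_pro = {}
for taxon_id, pro_id, name in read_taxon_table(sample):
    number = taxon_id.split(":", 1)[1]
    display_to_pro[f"{name} [{number}]"] = pro_id
```

With this arrangement the display label is generated from the columns rather than parsed back into them, which sidesteps the bracket problem entirely.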
We need to have a list of PRO terms that represent "all proteins in this organism". Such a list actually already exists and is provided with each release, but at the moment is incomplete. I will work on creating the list at release time (will be easy to do), but I wanted to find out what format to use. The existing file (taxbased_protein.dat) is just a single-column list of PRO terms, like:
PR:000000001
PR:000018263
PR:000029053
PR:000029031
PR:000029043
PR:000029065
PR:000029060
PR:000036194
A few questions:
1) @jz26 I recall this file was needed by you, or maybe it was @hongzhanhuang, for web site purposes. I'm not sure if that's still the case. Do you recall? Well, more to the point, is it still needed?
2) @chumingc I didn't quite catch what you said regarding what information you needed. Was it a mapping between the taxon ID and the PRO ID for such terms? And I think the name of...? Would this work:
NCBITaxon:9606    PR:000029067    WHICHEVER NAME YOU NEEDED
3) On the chance that taxbased_protein.dat is still needed, would it be best to modify that file according to the needs above, or to create a new file?