Closed GoogleCodeExporter closed 9 years ago
Could you please ask Ruth to look into this? It looks like many, most, or all
of these come from converting data from the PDB to BIND. Some of them are
legitimate, e.g. 32630 is "synthetic construct", so it could be a human protein
created synthetically. However, the biosource name should match the taxonomy
ID. Thanks, Gary
Original comment by gary.bad...@gmail.com
on 18 Nov 2014 at 5:31
Are all BIND files translated or only a subset or just the human file
(taxid9606_PSIMI25.xml)?
I found additional wrong organisms: 0, 1260, 1280, 4896, 6431, 8355, 9598,
9615, 9913, 9986, 10090, 10116, 10407, 11676, 12475
In the human BIND file there are 653 records where either the biosource is
Homo sapiens and the taxid is not 9606, or the biosource is not Homo sapiens
and the taxid is 9606. Of those, 551 originate from PDB. In the BIND record the
biosource is specified as Homo sapiens, taxid 9606. When we updated the
identifiers, the taxids were updated to reflect the protein/gene identifier
associated with the interactor instead of using what was specified in the BIND
record. Unfortunately, the taxon name was not updated to reflect this change.
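A minimal sketch of how such a name/taxid mismatch check could be run over the PSI-MI 2.5 file, in Python. It assumes each interactor organism is an `organism` element with an `ncbiTaxId` attribute and a `names` child holding `shortLabel`/`fullName`, which is how I understand the PSI-MI 2.5 schema; the function names are hypothetical. It reports organism elements rather than BIND records, so it will not necessarily reproduce the 653 figure.

```python
import xml.etree.ElementTree as ET

HUMAN_TAXID = "9606"
HUMAN_NAMES = {"homo sapiens", "human"}


def local_name(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def organism_names(organism):
    """Collect lower-cased shortLabel/fullName texts from the organism's own <names>."""
    found = set()
    for child in organism:
        if local_name(child.tag) != "names":
            continue
        for label in child:
            if local_name(label.tag) in ("shortLabel", "fullName") and label.text:
                found.add(label.text.strip().lower())
    return found


def find_mismatches(source):
    """Yield (taxid, names) where the name and the taxid disagree about being human.

    `source` is a file name or a binary file object. Only <organism> elements
    are inspected; <hostOrganism> elements are skipped.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "organism":
            continue
        taxid = elem.get("ncbiTaxId")
        names = organism_names(elem)
        has_human_name = bool(names & HUMAN_NAMES)
        if has_human_name != (taxid == HUMAN_TAXID):
            yield taxid, sorted(names)
        elem.clear()  # free memory; the file is large
```

Calling `list(find_mismatches("taxid9606_PSIMI25.xml"))` on a local copy of the file would list every organism element whose name says human while the taxid does not, or the other way round.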
Original comment by rr.weinb...@gmail.com
on 19 Nov 2014 at 4:32
Here, as the first message says, we're talking only about this file:
http://download.baderlab.org/BINDTranslation/release1_0/PSIMI25_XML/taxid9606_PSIMI25.xml
(aye, good to know that the rest of BIND data have the same issue).
Ruth, would you please generate a new fixed file anytime soon, if possible? ;)
Original comment by rod...@gmail.com
on 19 Nov 2014 at 5:14
Ok, as a quick "fix", I updated the psimi-converter to use the human organism
(BioSource object, taxonomy 9606, "Homo sapiens") for all entities whose
organism name was "Homo sapiens" (or "human") but whose taxonomy ID wasn't 9606.
I think it might work, because: a) the BIND data was claimed to be human data;
b) if an experimental form of a protein wasn't human, the experiment was meant
to infer/prove a human PPI (also, the converter currently ignores the
<experimentalInteractorList> element anyway); c) the protein/gene identifiers
in some cases actually belong either to human or to multiple organisms.
But, e.g., POLR2A with "genbank identifier" (NCBI GI) 12781 (though it should
be "gi:12781") and "CAA43449" (a GenPept ID, though it's called "ensembl" there
for some reason) had an organism with taxID:9770 and name "Homo sapiens" in the
PSI-MI, and it is in fact not human...
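For reference, the rule behind this quick "fix" can be written as the following Python sketch. It is only an illustration of the rule as described in this comment, not the actual psimi-converter change, and the function name is hypothetical. The POLR2A case shows the problem: the rule rewrites 9770 to 9606 even though the identifiers are not human.

```python
HUMAN_TAXID = 9606
HUMAN_NAMES = {"homo sapiens", "human"}


def force_human_taxid(name, taxid):
    """Return the taxid the quick fix would assign for an organism name/taxid pair."""
    if name.strip().lower() in HUMAN_NAMES and taxid != HUMAN_TAXID:
        # Trust the organism name over the taxid: the assumption that
        # breaks down when the underlying identifiers are not human.
        return HUMAN_TAXID
    return taxid


# The POLR2A example from this comment: named "Homo sapiens", taxid 9770.
polr2a_taxid = force_human_taxid("Homo sapiens", 9770)
```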
Original comment by rod...@gmail.com
on 12 Feb 2015 at 8:58
Original comment by rod...@gmail.com
on 13 Feb 2015 at 3:33
Well, won't fix. I reverted the previous fix attempt, where taxonomy IDs were replaced with 9606 if the name was "Homo sapiens", because the participants' protein/gene identifiers were in fact not human. This must be investigated and fixed in the original BIND data.
Original issue reported on code.google.com by
rod...@gmail.com
on 17 Nov 2014 at 9:18