JULIELab / gepi

GePI (GEne - Protein Interactions) is a web portal for quick and convenient access to gene - protein interaction mentions automatically extracted from the biomedical literature, i.e. PubMed and PubMed Central (Open Access Subset).
GNU General Public License v3.0

Search bug with different entrez IDs ?!? #67

Open SchSascha opened 7 years ago

SchSascha commented 7 years ago

Situation: A search with m entrez IDs and a separate search with n entrez IDs, where the m IDs are a proper subset of the n IDs, yield different results, even though the same atids are reported.

SchSascha commented 7 years ago

I tested the following with AhR (entrez id: 196, neo4j top homology atid: atid88758). Our neo4j has the following entrez ids attached to atid88758:

475251
11622
554265
714254
25690
463278
280714
196
246224
443467

Observation 1: Searching with 196 only or the list above yields exactly the same output.

SchSascha commented 7 years ago

Using Uniprot to search "ahr" as gene name and using Uniprot's mapping service to derive respective entrez ids yields the following list of ids:

100008995
100049228
100136384
100136407
100136431
100136911
100136913
100136915
100230166
100335047
100410839
100433770
100682538
101708722
101820002
101840826
101916811
101976457
102354023
102378764
102432278
102576203
103123809
104051548
104053525
105917025
105937229
105937232
105991401
106591494
106737230
11622
172788
196
246224
25690
280714
30517
31349770
3641733
373907
396654
398659
41988
443467
463278
475251
554265
778536
8588184
948802
9804806

Observation 2: A GePI search with "196" and a GePI search with the above input both yield results, and each result list contains hits that are missing from the other. Of note, these unique hits DO NOT necessarily correspond to entrez ids unique to one of the input lists (or to entrez ids known to GePI by its AhR top homology atid 3.)). Example (§ is the delimiter):

"AhR target genes§196§AHR§Arnt§11863§ARNT§§PMC2948512§Activation of AhR target genes by leflunomide requires Arnt. (A) Induction of the AhR target genes CYP1A1, UGT1A1, and NQO1 was performed as in figure 2."

Observation 3: The only top homology atid associated with the above entrez ids is atid88758. Query:

match(n:ID_MAP_NCBI_GENES)<-[:IS_BROADER_THAN*2]-(m) where n.originalId IN ["100008995", "100049228", "100136384", "100136407", "100136431", "100136911", "100136913", "100136915", "100230166", "100335047", "100410839", "100433770", "100682538", "101708722", "101820002", "101840826", "101916811", "101976457", "102354023", "102378764", "102432278", "102576203", "103123809", "104051548", "104053525", "105917025", "105937229", "105937232", "105991401", "106591494", "106737230", "11622", "172788", "196", "246224", "25690", "280714", "30517", "31349770", "3641733", "373907", "396654", "398659", "41988", "443467", "463278", "475251", "554265", "778536", "8588184", "948802", "9804806"] return m
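For repeated checks like the one in Observation 3, the id list can be passed as a query parameter instead of being pasted inline. The following is only a sketch: the helper name `build_top_homology_query` and the parameter name `ids` are made up for illustration, and it is an assumption that the resulting pair would be handed to a Cypher client (for example `session.run(query, params)` in the official neo4j Python driver). The ids are kept as strings because the query above compares `originalId` against quoted values.

```python
# Same pattern as the query in Observation 3, but with the id list
# supplied as the Cypher parameter $ids (parameter name is an assumption).
TOP_HOMOLOGY_QUERY = (
    "MATCH (n:ID_MAP_NCBI_GENES)<-[:IS_BROADER_THAN*2]-(m) "
    "WHERE n.originalId IN $ids "
    "RETURN m"
)


def build_top_homology_query(entrez_ids):
    """Return (query, params) for a list of entrez ids.

    Ids are normalised to strings, validated as numeric,
    and de-duplicated while keeping their input order.
    """
    seen = set()
    ids = []
    for raw in entrez_ids:
        value = str(raw).strip()
        if not value.isdigit():
            raise ValueError(f"not a numeric entrez id: {raw!r}")
        if value not in seen:
            seen.add(value)
            ids.append(value)
    return TOP_HOMOLOGY_QUERY, {"ids": ids}


query, params = build_top_homology_query(["196", 11622, " 196 "])
print(query)
print(params)
```

Validating the ids before the query is sent also catches copy-and-paste debris in long input lists early.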

SchSascha commented 7 years ago

Which entrez ids of the uniprot derived list are in our neo4j?

443467
196
172788
25690
41988
11622
475251
463278
554265
280714
398659
100335047
246224
30517

Of these, the following are in neo4j but not known to atid88758: "172788", "41988", "398659", "100335047", "30517". Some information:

Observation 4: 172788 and 41988 share one homologene node, which itself is not connected to a node labeled top-homology. All other ids have no "IS_BROADER_THAN" relationship, so for them there is no homologene node, no genegroup and, consequently, no top-homology node.

Observation 5: Results unique to 2.) are sensible in most cases.
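The five ids not known to atid88758 can be reproduced as a plain set difference between the two id lists posted in this thread. This is a minimal sketch; the function and variable names are made up for illustration and the ids are copied from the comments above.

```python
# Entrez ids attached to atid88758 in neo4j (first list in this thread).
ATID88758_IDS = {
    "475251", "11622", "554265", "714254", "25690",
    "463278", "280714", "196", "246224", "443467",
}

# uniprot-derived entrez ids that were found in neo4j.
UNIPROT_IDS_IN_NEO4J = {
    "443467", "196", "172788", "25690", "41988", "11622", "475251",
    "463278", "554265", "280714", "398659", "100335047", "246224", "30517",
}


def split_by_membership(candidate_ids, group_ids):
    """Split candidate_ids into those inside and those outside group_ids."""
    linked = candidate_ids & group_ids
    unlinked = candidate_ids - group_ids
    return linked, unlinked


linked, unlinked = split_by_membership(UNIPROT_IDS_IN_NEO4J, ATID88758_IDS)

# In neo4j, but not attached to atid88758.
print(sorted(unlinked, key=int))
# ['30517', '41988', '172788', '398659', '100335047']

# Attached to atid88758, but absent from the uniprot-derived ids found in neo4j.
print(sorted(ATID88758_IDS - UNIPROT_IDS_IN_NEO4J, key=int))
# ['714254']
```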

SchSascha commented 7 years ago

Observation 6: The results for the uniprot-expanded input list miss 7 hits that GePI returns when searching via the AhR atid only. These are:

Ahr§196§AHR§Apaf1§317§APAF1§§PMC4574265§In mammalian cells, the expression of Apaf1 was shown to be inhibited by overexpression of Ahr through a complex formed by AhR and E2f1 which binds to the Apaf1 promoter at a region containing E2f1 binding site, but no AhR binding sites [94].
AhR§196§AHR§Notch1§767866§NOTCH1§§PMC4999520§For instance, AhR was shown to induce components of the Notch pathway such as Notch1, Notch2 and Hes in innate lymphoid cells, thus placing AhR upstream of Notch39, whereas a study in T cells47 proposed synergy between Notch and AhR for interleukin-22 induction.
AhR§196§AHR§Sox9b§60642§tid527335§§PMC4574265§In zebrafish, down-regulation of the Sox9 isoform (Sox9b) does not occur immediately after induction of AhR pathway which might suggest an indirect mechanism for AhR regulation of Sox9b [87].
AHR1B§554265§AHR§CYP1A§140634§Cyp1a1§§PMC3252317§Our immunohistochemical results with the ahr2hu3335 line suggest that mild vascular expression of CYP1A is induced via AHR1B, and can be effectively knocked down to background with morpholino injection.
AhR/ARNT heterodimer§196§AHR§ERα§2099§ESR1§§PMC4679188§Successively, the AhR repressor (AhRR) heterodimerizes with ARNT to terminate the activation of the AhR signaling pathway.77 BPA treatment in utero (0.02-20 000 µg/kg/d) upregulated the expression of AhRR, impairing the AhR expression and function in embryos.78 Notably, BPA did not dramatically alter genes in the AhR signaling pathway in adults.79 Remarkably, it has been showed that the agonist-activated AhR/ARNT heterodimer directly associates with ERα and ERβ.
AhR/ARNT heterodimer§196§AHR§ERβ§2099§ESR1§§PMC4679188§Successively, the AhR repressor (AhRR) heterodimerizes with ARNT to terminate the activation of the AhR signaling pathway.77 BPA treatment in utero (0.02-20 000 µg/kg/d) upregulated the expression of AhRR, impairing the AhR expression and function in embryos.78 Notably, BPA did not dramatically alter genes in the AhR signaling pathway in adults.79 Remarkably, it has been showed that the agonist-activated AhR/ARNT heterodimer directly associates with ERα and ERβ.
AhR target genes§196§AHR§Arnt§11863§ARNT§§PMC2948512§Activation of AhR target genes by leflunomide requires Arnt. (A) Induction of the AhR target genes CYP1A1, UGT1A1, and NQO1 was performed as in figure 2.

As the respective AhR id "196" is part of both input lists, the question remains why these hits are not listed when searching with the uniprot-extended input.
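To compare two result lists systematically, the §-delimited lines can be parsed and diffed as sets. This is a sketch only: the field names are guesses based on the hits quoted in this thread (nine fields per line, the seventh always empty), not GePI's documented export format, and `diff_results` is a made-up helper name.

```python
from typing import NamedTuple

DELIMITER = "§"


class Hit(NamedTuple):
    # Field names are assumptions derived from the quoted example lines.
    arg1_text: str
    arg1_entrez_id: str
    arg1_symbol: str
    arg2_text: str
    arg2_entrez_id: str
    arg2_symbol: str
    doc_id: str
    sentence: str


def parse_hit(line):
    """Parse one §-delimited result line into a Hit."""
    # maxsplit=8 keeps any further delimiter inside the sentence field.
    fields = line.rstrip("\n").split(DELIMITER, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    a1_text, a1_id, a1_sym, a2_text, a2_id, a2_sym, _unused, doc_id, sentence = fields
    return Hit(a1_text, a1_id, a1_sym, a2_text, a2_id, a2_sym, doc_id, sentence)


def diff_results(lines_a, lines_b):
    """Return (hits only in a, hits only in b) for two result lists."""
    hits_a = {parse_hit(line) for line in lines_a if line.strip()}
    hits_b = {parse_hit(line) for line in lines_b if line.strip()}
    return hits_a - hits_b, hits_b - hits_a


# One hit quoted above; the second list is left empty for illustration.
ARNT_LINE = (
    "AhR target genes§196§AHR§Arnt§11863§ARNT§§PMC2948512§"
    "Activation of AhR target genes by leflunomide requires Arnt. "
    "(A) Induction of the AhR target genes CYP1A1, UGT1A1, and NQO1 "
    "was performed as in figure 2."
)
only_a, only_b = diff_results([ARNT_LINE], [])
for hit in sorted(only_a):
    print(hit.arg1_entrez_id, hit.arg2_entrez_id, hit.doc_id)
```

Grouping the hits unique to each list by `arg1_entrez_id` would then show directly whether the missing hits belong to ids shared by both inputs, as is the case for "196" here.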

SchSascha commented 6 years ago

@khituras This is what we talked about. You mentioned that there might always be discrepancies between the "manual uniprot ID extension approach" and our pipeline. Honestly, as long as we do as well as possible without such manual work, that is fine with me (users can expand the input themselves anyway). Hence, I wonder whether we can close this issue and regard the behavior as a feature rather than a bug.