geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

Produce acceptable taxon pair list for IntAct (remove i.e. IntAct IPI annotation to P03772) #1339

Open jimhu-tamu opened 8 years ago

jimhu-tamu commented 8 years ago

http://www.ebi.ac.uk/QuickGO/GProtein?ac=P03772

The imported IntAct annotation shows an interaction between a bacteriophage protein (the annotation object) and a mouse protein (UniProtKB:Q64702, the with/from). This is not biologically relevant to the function of the phage protein and seems like something that might be catchable by taxon constraints.

bmeldal commented 6 years ago

The interaction is bona fide: bacteriophage lambda phosphatase dephosphorylates mouse Sak in an in vitro expt to test for Sak autophosphorylation activity. Whether you want this type of interaction in GO is your decision. PMID:17174311, Fig 1A http://onlinelibrary.wiley.com/doi/10.1016/j.febslet.2006.11.080/full https://www.ebi.ac.uk/intact/interaction/EBI-7181975

pgaudet commented 6 years ago

Hi @bmeldal I don't think this is correct according to the GO guidelines. You can say that bacteriophage lambda phosphatase has a phosphatase activity based on this assay; however, what you are capturing here is the assay, not the physiologically relevant substrate. This interaction should not be in GO.

Thanks, Pascale

bmeldal commented 6 years ago

Pascale, we curate whatever people do in the lab, whether it makes physiological sense or not. Do you know if you filter our GAF or do we have to change what we export? I've never been involved with the IntAct GAF; that was put in place long before I joined and hasn't changed for a long time.

@sandraorchard will know more

sandraorchard commented 6 years ago

There is a lot of mixed-species data generated in the PPI field and it would be very difficult to design filters that differentiate 'real' host-pathogen responses from lab-induced ones. Bacteriophage-non-bacterial host I agree we could filter out, and we can look at ways of doing this in the export pipeline.

pgaudet commented 6 years ago

Right - you could probably get a combination of phage/viral species + host species and sort most of them out this way, is this what you mean ?

Thanks, Pascale

bmeldal commented 6 years ago

I don't think it's an easy fix. It's perfectly possible that a bona fide host-pathogen interaction has been shown in vitro - then there's not much we can deduce from the experimental conditions about whether it's physiologically 'real' or not. I wonder if we need to add the "no uniprot export" tag to such interactions? @pporrasebi Pablo, what do you think?

pporrasebi commented 6 years ago

I agree with Sandra. In order to filter out what GO regards as non-physiological (weird experimental things such as human-yeast or mouse-bacteriophage interactions) one can use a pre-defined list of host-pathogen taxID pairs and create the appropriate filters for those. If GO can provide a list of the type of inter-species interactions they would find acceptable, we can certainly figure out a way to put the filters in the export pipeline.

bmeldal commented 6 years ago

@pporrasebi please advise on action - if you raise a Jira ticket for IntAct and link it to here I can close this ticket.

pporrasebi commented 6 years ago

Actions to be taken:
1. GO to produce a list of taxID pairs they would consider acceptable
2. IntAct to implement the list as a filter in the GO IPI export

Anything else?
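
The two actions above could be sketched roughly as follows. This is a minimal illustration, not IntAct's export code: the `IpiAnnotation` record, the `ACCEPTABLE_PAIRS` contents and the `is_exportable` helper are all hypothetical, and the taxon IDs are only examples that should be checked against NCBI Taxonomy. Treating same-species interactions as always exportable is also an assumption.

```python
from typing import NamedTuple


class IpiAnnotation(NamedTuple):
    """Hypothetical, simplified view of one IPI annotation row."""
    object_id: str      # annotated protein
    object_taxon: int   # NCBI taxon ID of the annotated protein
    with_id: str        # with/from protein
    with_taxon: int     # NCBI taxon ID of the with/from protein


# Hypothetical allowlist that GO would supply (action 1).
# frozenset makes each pair order-independent.
ACCEPTABLE_PAIRS = {
    frozenset({10710, 562}),    # assumed: phage lambda / E. coli
    frozenset({11676, 9606}),   # assumed: HIV-1 / human
}


def is_exportable(annotation: IpiAnnotation) -> bool:
    """Filter applied at export time (action 2)."""
    # Assumption: interactions within one species are never filtered.
    if annotation.object_taxon == annotation.with_taxon:
        return True
    pair = frozenset({annotation.object_taxon, annotation.with_taxon})
    return pair in ACCEPTABLE_PAIRS


# The annotation that started this ticket: phage protein with mouse protein.
reported = IpiAnnotation("P03772", 10710, "Q64702", 10090)
print(is_exportable(reported))
```

With an allowlist like this, any inter-species pair that GO has not explicitly accepted is dropped, so the reported phage-mouse annotation would not be exported.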

bmeldal commented 6 years ago

@pgaudet who would have to produce the list for us to tighten our export?

pgaudet commented 6 years ago

Hi @bmeldal The clean solution would be to parse the host information in the UniProt entry - for example for http://www.uniprot.org/uniprot/P27958#names_and_taxonomy they have this: [image]

Can this be done ?

Thanks, Pascale

sandraorchard commented 6 years ago

Hi Pascale
We have the taxonomy info, and the hierarchy. The issue is what 'pairing' is OK and what is not. For example:
mammal-virus OK
mammal-bacteria OK
bacteria-bacteriophage OK
bacteriophage-mammal filter out

sandraorchard commented 6 years ago

I can write a fuller list tomorrow to start the discussion, if that helps.
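
The group-level pairings listed above could be encoded along these lines. This is only a sketch under stated assumptions: the `TAXON_GROUP` lookup is a hypothetical stand-in for walking the taxonomy hierarchy, the taxon IDs are illustrative, and `pairing_decision` is an invented helper. Pairings not covered by the four examples return `None` (undecided), since no policy has been stated for them.

```python
from typing import Optional

# Hypothetical taxon -> group lookup; in practice this would be derived
# from the taxonomy hierarchy rather than hard-coded.
TAXON_GROUP = {
    9606: "mammal",          # human
    10090: "mammal",         # mouse
    562: "bacteria",         # E. coli
    10710: "bacteriophage",  # assumed: phage lambda
    11676: "virus",          # assumed: HIV-1
}

# The four example pairings from the comment above.
# True = keep in the export, False = filter out.
GROUP_RULES = {
    frozenset({"mammal", "virus"}): True,
    frozenset({"mammal", "bacteria"}): True,
    frozenset({"bacteria", "bacteriophage"}): True,
    frozenset({"bacteriophage", "mammal"}): False,
}


def pairing_decision(taxon_a: int, taxon_b: int) -> Optional[bool]:
    """Return True/False for a listed pairing, None if no rule applies."""
    group_a = TAXON_GROUP.get(taxon_a)
    group_b = TAXON_GROUP.get(taxon_b)
    if group_a is None or group_b is None:
        return None  # unknown taxon: undecided
    return GROUP_RULES.get(frozenset({group_a, group_b}))


print(pairing_decision(10710, 10090))  # phage with mouse
print(pairing_decision(562, 10710))    # E. coli with phage
```

Note that this only works if phages are assigned to their own group rather than to the generic virus group; otherwise the phage-mouse pair would match the mammal-virus rule.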

pporrasebi commented 6 years ago

I think Pascale refers to the host information contained on the viral protein entry in UniProt. There we could potentially get a more accurate mapping of host-pathogen pairings. Whether it is worth integrating a parser for this kind of information in our export pipeline is another question.

From my point of view, Sandra's approach would probably give more or less the same results; I'm not sure we need the specificity Pascale's approach brings in.

sandraorchard commented 6 years ago

Yes, sorry, didn't read it properly. I was thinking more generic filters would also allow us to get rid of all those Saccharomyces-human interactions (we'd have to set that one lower than just fungi, as we'd otherwise lose genuine host-pathogen interactions) and assorted fly/worm-irrelevant species which I suspect are also in there. Should also be extended to the CC line export.

pgaudet commented 6 years ago

Hi @sandraorchard @pporrasebi The problem with the higher-level mapping that Sandra suggests is that, for example, HIV proteins could interact with mouse proteins in vitro, but never in vivo since mouse is not a host. Depending on the types of experiments you capture, there may be a high level of false positives with that approach.

Pascale

pgaudet commented 5 years ago

Hi @sandraorchard @bmeldal @pporrasebi Any further thoughts on this ?

pporrasebi commented 5 years ago

Hi @pgaudet. We need the host-pathogen pairings from GO or UniProt. Once we get this, we can show you how to create the filters needed to select which interspecies interactions to discard and decide together where the filters need to be implemented.

sandraorchard commented 5 years ago

For viral proteins, there is a line (OH) in the flat file describing the host, when known.

ID   POLG_SVSAP   Reviewed;   2280 AA.
AC   Q672I1;
OS   Sapporo virus (isolate GI/Human/Germany/pJG-Sap01)
OS   (Hu/Dresden/pJG-Sap01/DE).
OC   Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA stage;
OC   Caliciviridae; Sapovirus.
OX   NCBI_TaxID=291175;
OH   NCBI_TaxID=9606; Homo sapiens (Human).

I don't think this exists for anything other than viruses but I can find out
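
A minimal sketch of pulling the organism (OX) and host (OH) taxon IDs out of an entry like the one quoted above, using only the standard library. The `parse_hosts` function and the `ENTRY` constant are hypothetical names for illustration; the sketch assumes one `NCBI_TaxID=` value per OX/OH line and does not handle every variant of the flat file format.

```python
import re

# Matches the numeric taxon ID in OX and OH lines.
TAXID_RE = re.compile(r"NCBI_TaxID=(\d+)")

# The entry fragment quoted in this thread.
ENTRY = """\
ID   POLG_SVSAP   Reviewed;   2280 AA.
AC   Q672I1;
OS   Sapporo virus (isolate GI/Human/Germany/pJG-Sap01)
OS   (Hu/Dresden/pJG-Sap01/DE).
OC   Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA stage;
OC   Caliciviridae; Sapovirus.
OX   NCBI_TaxID=291175;
OH   NCBI_TaxID=9606; Homo sapiens (Human).
"""


def parse_hosts(entry_text):
    """Return (organism_taxid, [host_taxids]) for one flat file entry."""
    organism = None
    hosts = []
    for line in entry_text.splitlines():
        match = TAXID_RE.search(line)
        if match is None:
            continue
        code = line[:2]  # two-letter line code at the start of each line
        if code == "OX":
            organism = int(match.group(1))
        elif code == "OH":
            hosts.append(int(match.group(1)))
    return organism, hosts


organism, hosts = parse_hosts(ENTRY)
print(organism, hosts)
```

The resulting (organism, host) pairs could then feed a taxon-pair filter in the export, at least for the viral entries that carry an OH line.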

pgaudet commented 5 years ago

Sounds good. I'll keep this open. Thanks ! Pascale

jimhu-tamu commented 5 years ago

Sounds like we might want to add this to the things the multi-organism group is discussing. What do you think @pgaudet?

pgaudet commented 5 years ago

Sure ! I've added this to next week's agenda.