geneontology / neo

noctua entity ontology

Include Human Protein complexes in NEO #76

Closed: pgaudet closed this 2 years ago

pgaudet commented 2 years ago

The file is here:

http://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human_complex.gpa.gz

@kltm please let me know if you need more information.

Thanks, Pascale

kltm commented 2 years ago

Noting that this is a ~6k line compressed GPAD 1.1.

kltm commented 2 years ago

goa_human_complex seems to be included already: https://github.com/geneontology/neo/blob/10210c1e07218f74fa02b86237e672001bffc7de/Makefile#L9 has "source": "ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/goa_human_complex.gpi.gz".

@pgaudet Is there something that you should be seeing that's missing?

pgaudet commented 2 years ago

Thanks, it looks like an autocomplete issue.

If I enter CPX-5993, I don't get anything relevant. If I enter CPX-5993 hsap, it works.

Is this an otherwise known issue?

@sylvainpoux

Thanks, Pascale

kltm commented 2 years ago

@pgaudet Hm. The entry seems to work as expected on the quick search on http://noctua-amigo.berkeleybop.org (and the entry is clearly there), so it is not a NEO issue, but rather a client/autocomplete issue.

I think the issue may be that the label is getting drowned out in the ontology search as it's just a local ID (technically https://github.com/geneontology/amigo/issues/120). Adding the "Hsap" probably boosts up the label enough to get the correct hit. The general search doesn't suffer from this as it also indexes the local ids. Entering the full ID in the ontology search would generally bypass this (e.g. ComplexPortal:CPX-5993).

sylvainpoux commented 2 years ago

Hi,

Yes, ComplexPortal complexes can be loaded. Although the autocomplete system is not very convenient, it works. Many thanks.

I have another request.

I made my model (http://noctua.geneontology.org/editor/graph/gomodel:61f34dd300000000), but then realized that it does not pass the Reasoner step. I suspect this is due to the ComplexPortal identifiers.

Do you have any explanation?

Thanks

Sylvain

vanaukenk commented 2 years ago

@kltm How are the goa_human_complex entries from the gpi file above typed in neo and minerva? If they aren't recognized as an instance of GO:0032991 might that explain the ShEx error that @sylvainpoux is seeing?

kltm commented 2 years ago

@vanaukenk @sylvainpoux For questions about the typing (which occurs in minerva, not NEO), I would probably rope in @balhoff .

balhoff commented 2 years ago

There is a NEO connection in that if the GPI containing the terms is not included in NEO, Minerva won't know how to connect them up to the root types.

vanaukenk commented 2 years ago

In the incoming GPI 1.2 file, these ComplexPortal entries are typed with the text 'protein_complex'. I don't know how that information gets passed along during the NEO build and the subsequent inner workings of Minerva.

balhoff commented 2 years ago

I think I see the issue, which is another manifestation of #66 and ultimately #17.

The term in the Noctua model has IRI https://www.ebi.ac.uk/complexportal/complex/CPX-5993

In NEO it has IRI http://purl.obolibrary.org/obo/ComplexPortal_CPX-5993

I'll try adding this namespace similarly to the other fix. @vanaukenk in NEO the term is under information biomacromolecule. Should that work (assuming the identifier issue is fixed)?
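To make the mismatch above concrete, here is a minimal Python sketch of rewriting the ComplexPortal web IRI into the OBO-style IRI found in NEO. It is only an illustration, not the actual fix: the two prefixes are taken from the IRIs quoted above, while the function name `to_neo_iri` and the constant names are hypothetical.

```python
# Prefixes derived from the two IRIs quoted in this thread.
# The constant and function names are hypothetical, for illustration only.
COMPLEXPORTAL_WEB_PREFIX = "https://www.ebi.ac.uk/complexportal/complex/"
NEO_OBO_PREFIX = "http://purl.obolibrary.org/obo/ComplexPortal_"


def to_neo_iri(iri: str) -> str:
    """Rewrite a ComplexPortal web IRI to the OBO-style IRI used in NEO.

    IRIs that do not start with the ComplexPortal web prefix are
    returned unchanged.
    """
    if iri.startswith(COMPLEXPORTAL_WEB_PREFIX):
        local_id = iri[len(COMPLEXPORTAL_WEB_PREFIX):]
        return NEO_OBO_PREFIX + local_id
    return iri


if __name__ == "__main__":
    print(to_neo_iri("https://www.ebi.ac.uk/complexportal/complex/CPX-5993"))
```

With both forms normalized to the same IRI, a lookup for the model's term would have a chance of finding the NEO entry and its parent types.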

vanaukenk commented 2 years ago

@balhoff - technically, these entries should be typed as 'protein-containing complex' (GO:0032991) in NEO.

balhoff commented 2 years ago

@vanaukenk @kltm goa_human_complex.gpi.gz is using object type protein_complex, but the code that maps to GO:0032991 is looking for complex: https://github.com/geneontology/neo/blob/aeb90f98e2517a43fd5440729a1c51944b26497b/gpi2obo.pl#L122

Who is right? Currently GO:0032991 doesn't appear in NEO at all.

vanaukenk commented 2 years ago

Oh, goodness. I'm referring to the ShEx and our GPI2.0 specs.

In ShEx we define ProteinContainingComplex as GO:0032991

balhoff commented 2 years ago

GPI 1.2 spec says it should be Type_Symbol but doesn't define that. Since nothing appears to be matching complex right now, I'm inclined to change the code to match protein_complex.
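As a rough illustration of the mapping in question, here is a minimal Python sketch that resolves a GPI object type string to a root class. It is not the `gpi2obo.pl` code: the table and the function name `root_type_for` are hypothetical, and accepting both spellings is an assumption made for the example rather than the change proposed here.

```python
from typing import Optional

PROTEIN_CONTAINING_COMPLEX = "GO:0032991"

# Hypothetical lookup table. Accepting both spellings is an assumption
# for this sketch: "protein_complex" is what the GPI files in this thread
# use, and "complex" is what the existing code was looking for.
OBJECT_TYPE_TO_ROOT = {
    "protein_complex": PROTEIN_CONTAINING_COMPLEX,
    "complex": PROTEIN_CONTAINING_COMPLEX,
}


def root_type_for(object_type: str) -> Optional[str]:
    """Return the root class for a GPI object type, or None if unmapped."""
    return OBJECT_TYPE_TO_ROOT.get(object_type.strip())


if __name__ == "__main__":
    print(root_type_for("protein_complex"))
```

An unmapped type returns `None`, which makes the "nothing is matching" situation visible instead of silently leaving entries untyped.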

vanaukenk commented 2 years ago

SGD also includes ComplexPortal ids in their files. I'm double-checking to see what they use for type.

vanaukenk commented 2 years ago

I see protein_complex as well, for them.

balhoff commented 2 years ago

Thanks, I made a PR: https://github.com/geneontology/neo/pull/81

kltm commented 2 years ago

Merged PR; will need to wait to eyeball products once we clear https://github.com/geneontology/neo/issues/80

kltm commented 2 years ago

Continuing build attempts by temporarily shelving #77 and #80

kltm commented 2 years ago

@balhoff @vanaukenk Noting that we've had a NEO cycle with the new code in place.

kltm commented 2 years ago

After talking to @pgaudet, this is closed.