biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
119 stars 30 forks source link

Moving biolink scigraph to ontobio, owlsim improvements #588

Closed kshefchek closed 3 years ago

kshefchek commented 3 years ago

Originally my goal was to work on https://github.com/biolink/biolink-api/issues/380, I'm expanding this PR to completely decouple any logic dealing with backend services from the biolink-api codebase, specifically:

EDIT: going to come back to tasks 3 and 4 in another PR

dustine32 commented 3 years ago

@kshefchek Crazy. These test errors are due to an upstream mapping change by @nsuvarnaiari in evidenceontology/evidenceontology/gaf-eco-mapping.txt (see this diff) that changed the old default mapping IEA-> ECO:0000501 to IEA -> ECO:0007669. The file is consumed here: https://github.com/biolink/ontobio/blob/d703b9110b6660049c26dc4edb1278ce656ee216/ontobio/ecomap.py#L29 Unfortunately, the currently failing tests deal with IEAs and ECO:0000501 is hard-coded: https://github.com/biolink/ontobio/blob/d703b9110b6660049c26dc4edb1278ce656ee216/ontobio/io/qc.py#L241 https://github.com/biolink/ontobio/blob/d703b9110b6660049c26dc4edb1278ce656ee216/ontobio/io/qc.py#L269 https://github.com/biolink/ontobio/blob/d703b9110b6660049c26dc4edb1278ce656ee216/ontobio/io/qc.py#L447

Not certain of the best way to fix this, so @cmungall @kltm @nsuvarnaiari, I'm guessing this upstream ECO mapping change will affect IEAs in the release pipeline. Should this be addressed upstream (revert the gaf-eco-mapping.txt change) or do we adjust our ontobio code to expect IEA -> ECO:0007669?

nsuvarnaiari commented 3 years ago

Hi @dustine32

Yes, we are now mapping IEA to ECO:0007669 in our gaf-eco-mapping.txt. I cannot tell what is the right thing to do at your end, but adjusting ontobio code to expect IEA ->ECO:0007669 seems the correct way.

Thanks, Suvvi

mgiglio99 commented 3 years ago

Hello, Sorry this change has caused problems for your system. The recent change of mapping of GO evidence code IEA to ECO:0007669 came about due to concerns raised by GO regarding the scope of what was included under ECO:0000501. Please see ECO issue https://github.com/evidenceontology/evidenceontology/issues/251 for details on that.

Regarding the lines of code above, we suggest: for line 447, you change it to iea = "ECO:0007669", for line 269, we suggest adding ECO:0007669 to the set of non-experimental terms, but leaving ECO:0000501 there for now, since it is likely many people are still using it for line 241, we are not sure if just changing to ECO:0007669 is sufficient since we don't know what's in self.do_not_manually_annotate and just saying that a term is not ECO:0007669 is not enough to say that it's manually curated.

Happy to discuss more as needed to get this sorted out. Michelle

dustine32 commented 3 years ago

Thank you @mgiglio99 and @nsuvarnaiari for the explanation! Right, we'll adjust our code to expect the new IEA to ECO:0007669 mapping, i.e. ensure ontobio code always pulls the IEA->ECO code mapping from your gaf-eco-mapping.txt (via the PURL).

@kshefchek I can make these changes in a separate PR (this should fix the tests too), merge to master, double-check this PR's tests run fine and then I'll approve. Thanks for your patience!

kshefchek commented 3 years ago

thanks @nsuvarnaiari and @mgiglio99! :) @dustine32 that plan sounds good to me, I'll wait for https://github.com/biolink/ontobio/pull/590 to be merged and then pull in those changes.