biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Obsolete GO term with no replacement breaks with/from validation #602

Closed. dustine32 closed this issue 2 years ago

dustine32 commented 2 years ago

@sierra-moxon I noticed this error in _unroll_withfrom_and_replair_obsoletes when processing a line containing GO:0004871 in the with/from column:

MGI:MGI:2448177         RO:0001025      GO:0005887      MGI:MGI:5467169|PMID:22859307   ECO:0000305     GO:0004871              2013-10-29      MGI             creation-date=2013-10-29|modification-date=2013-10-29|contributor-id=https://orcid.org/0000-0001-7476-6306

The error:

Traceback (most recent call last):
  File "/Users/ebertdu/go/ontobio/env/bin/ontobio-parse-assocs.py", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/ebertdu/go/ontobio/bin/ontobio-parse-assocs.py", line 244, in <module>
    main()
  File "/Users/ebertdu/go/ontobio/bin/ontobio-parse-assocs.py", line 175, in main
    func(ont, args.file, outfh, p, args)
  File "/Users/ebertdu/go/ontobio/bin/ontobio-parse-assocs.py", line 201, in validate_assocs
    for assoc in associations:
  File "/Users/ebertdu/go/ontobio/ontobio/io/assocparser.py", line 521, in association_generator
    parsed_result = self.parse_line(line)
  File "/Users/ebertdu/go/ontobio/ontobio/io/gpadparser.py", line 161, in parse_line
    assoc.evidence.with_support_from = self._unroll_withfrom_and_replair_obsoletes(split_line, 'gpad')
  File "/Users/ebertdu/go/ontobio/ontobio/io/assocparser.py", line 694, in _unroll_withfrom_and_replair_obsoletes
    return association.ConjunctiveSet.str_to_conjunctions(regrouped_fixed_elements)
  File "/Users/ebertdu/go/ontobio/ontobio/model/association.py", line 330, in str_to_conjunctions
    for conj in filter(None, entity.split("|")):
AttributeError: 'NoneType' object has no attribute 'split'

It looks like regrouped_fixed_elements is None when it is passed into str_to_conjunctions: https://github.com/biolink/ontobio/blob/7ba4e23e55beecf420eb03bdd55e1f0819f5c847/ontobio/io/assocparser.py#L694

I traced this to the obsolete term GO:0004871, which has no replacement term assigned, so _validate_ontology_class_id returns None.
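The failure mode can be shown in isolation. This is only a minimal sketch, not ontobio's code: str_to_conjunctions_sketch is a hypothetical stand-in that copies the one failing line from the traceback and assumes the usual with/from convention ("|" separates alternatives, "," joins a conjunction).

```python
from typing import List, Optional


def str_to_conjunctions_sketch(entity: Optional[str]) -> List[List[str]]:
    """Hypothetical stand-in: '|' separates alternatives, ',' joins a conjunction."""
    conjunctions = []
    # Mirrors the failing line in the traceback: entity.split("|").
    # If entity is None, the attribute lookup on .split raises AttributeError.
    for conj in filter(None, entity.split("|")):
        conjunctions.append(conj.split(","))
    return conjunctions


def reproduce() -> str:
    """Return the error message raised when the repaired value is None."""
    try:
        # None stands for the result of repairing an obsolete term
        # that has no replacement.
        str_to_conjunctions_sketch(None)
    except AttributeError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(reproduce())
```

Any caller that passes the repaired with/from value straight through hits the same AttributeError as soon as a repair yields None.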

Command to reproduce with the GPAD line above (saved as tests/resources/obsolete_no_replacement.gpad). Note that -r must be set with an ontology file; otherwise the check will always pass:

ontobio-parse-assocs.py --file obsolete_no_replacement.gpad --format GPAD -l all -r go.json validate

My short-term fix for this, which may actually be THE fix, is to simply add a check for None:

if regrouped_fixed_elements:
    return association.ConjunctiveSet.str_to_conjunctions(regrouped_fixed_elements)
else:
    return []
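As a rough, self-contained illustration of how this guard behaves (a sketch only: split_with_from and safe_with_from are hypothetical names, and the real method lives on the parser class), note that the truthiness test also sends an empty string down the empty-list branch, not just None:

```python
from typing import List, Optional


def split_with_from(value: str) -> List[List[str]]:
    """Hypothetical stand-in for the parsing step: '|' = OR, ',' = AND."""
    return [conj.split(",") for conj in filter(None, value.split("|"))]


def safe_with_from(regrouped_fixed_elements: Optional[str]) -> List[List[str]]:
    # Same shape as the proposed fix: only parse when there is something to parse.
    if regrouped_fixed_elements:
        return split_with_from(regrouped_fixed_elements)
    else:
        # Covers both None (obsolete term, no replacement) and "" (empty column).
        return []


if __name__ == "__main__":
    print(safe_with_from(None))
    print(safe_with_from("GO:0005887"))
```

Returning an empty list keeps the return type consistent, so downstream code that iterates over the with/from conjunctions needs no special case.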

@sierra-moxon I can open a PR with this fix in a bit so you can take a look and make any changes. Just wondering if you think this makes sense.

dustine32 commented 2 years ago

Had another instance of this occur during the 2022-02-20 snapshot with this line in fb.gaf:

FB      FBgn0003334     Scm     located_in      GO:0005634      FB:FBrf0179383|PMID:15280237    IC      GO:0016458      C       Sex comb on midleg      CG9495|SCM|Sex Comb on Midleg|Sex Comb on the Midleg|Sex combs on midleg|Sex combs on midlegs|Su(z)302|l(3)85Ef|scm|sex comb on midleg  protein taxon:7227      20050203        UniProt

The with/from value GO:0016458 is obsolete without a replaced_by term. It does have two "consider" terms, but I don't think ontobio treats these like replaced_by.

Testing with the #603 code change gets around this error: the line's with/from value is printed as blank, along with a warning:

* WARNING - Obsolete class with no replacement: Violates GORULE:0000020 (GO:0016458)
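The behavior described here (a blank with/from value plus a warning) can be sketched as follows. Everything in this block is an assumption made for illustration: the EX: identifiers, the metadata layout, and the repair_term helper are hypothetical, and the sketch simply assumes that "consider" terms are not used as replacements, as guessed above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical obsolescence metadata; the IDs are placeholders, not real GO terms.
OBSOLETE_META: Dict[str, Dict[str, List[str]]] = {
    "EX:0000001": {"replaced_by": ["EX:0000010"], "consider": []},
    "EX:0000002": {"replaced_by": [], "consider": ["EX:0000020", "EX:0000021"]},
}


def repair_term(
    term_id: str, meta: Dict[str, Dict[str, List[str]]]
) -> Tuple[Optional[str], List[str]]:
    """Return (repaired term or None, warnings). 'consider' terms are ignored."""
    warnings: List[str] = []
    info = meta.get(term_id)
    if info is None:
        # Not obsolete in this toy metadata: keep the term unchanged.
        return term_id, warnings
    if len(info["replaced_by"]) == 1:
        # Exactly one replacement: the repair is unambiguous.
        return info["replaced_by"][0], warnings
    # No replacement available, so the term cannot be repaired automatically.
    warnings.append(f"Obsolete class with no replacement: {term_id}")
    return None, warnings


def render_with_from(term_id: str) -> str:
    """Render the with/from column, leaving it blank when repair fails."""
    repaired, _ = repair_term(term_id, OBSOLETE_META)
    return repaired or ""


if __name__ == "__main__":
    print(repair_term("EX:0000002", OBSOLETE_META))
    print(repr(render_with_from("EX:0000002")))
```

Leaving the value blank and warning keeps the annotation line in the output while making the unrepaired term visible for manual review.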

@kltm @sierra-moxon I'm thinking we just go ahead and merge #603, then release and update go-site reqs.txt?

kltm commented 2 years ago

I believe this is now fixed.