biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Unknown GAF qualifier/relation breaks parser #591

Closed dustine32 closed 2 years ago

dustine32 commented 2 years ago

The GafParser.to_association() function is failing while attempting to parse this line:

UniProtKB       P0AFD6  nuoI    Contributes_to  GO:0003954      PMID:3122832    IDA             F                       gene    taxon:83333     20080722        EcoliWiki

Stack trace:

  File "/Users/ebertdu/go/go-site/pipeline/env/lib/python3.6/site-packages/ontobio/io/assocparser.py", line 521, in association_generator
    parsed_result = self.parse_line(line)
  File "/Users/ebertdu/go/go-site/pipeline/env/lib/python3.6/site-packages/ontobio/io/gafparser.py", line 181, in parse_line
    parsed = to_association(list(vals), report=self.report, qualifier_parser=self.qualifier_parser(), bio_entities=self.bio_entities)
  File "/Users/ebertdu/go/go-site/pipeline/env/lib/python3.6/site-packages/ontobio/io/gafparser.py", line 397, in to_association
    qualifiers = [association.Curie.from_str(curie_util.contract_uri(relations.lookup_label(q), strict=False)[0]) for q in qualifiers]
  File "/Users/ebertdu/go/go-site/pipeline/env/lib/python3.6/site-packages/ontobio/io/gafparser.py", line 397, in <listcomp>
    qualifiers = [association.Curie.from_str(curie_util.contract_uri(relations.lookup_label(q), strict=False)[0]) for q in qualifiers]
  File "/Users/ebertdu/go/go-site/pipeline/env/lib/python3.6/site-packages/prefixcommons/curie_util.py", line 113, in contract_uri
    if (uri.startswith(v)):
AttributeError: 'NoneType' object has no attribute 'startswith'

Failing line: https://github.com/biolink/ontobio/blob/81ed5482e16f5e4a296cf123655ebf434142b1bf/ontobio/io/gafparser.py#L397

For the Contributes_to case above, the next few lines are already set up to catch and report the unknown qualifier, but the code dies before it reaches them: https://github.com/biolink/ontobio/blob/81ed5482e16f5e4a296cf123655ebf434142b1bf/ontobio/io/gafparser.py#L404-L407 I believe simply moving that check above the list comprehension line should fix this for us.
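To illustrate the idea, here is a minimal standalone sketch of checking the label lookup before contracting the URI. It does not use ontobio itself: `RELATION_LABELS`, `PREFIXES`, `lookup_label`, `contract_uri`, and `resolve_qualifiers` are hypothetical stand-ins written for this example, and the case-sensitive lookup is an assumption, not a description of how ontobio's relation table behaves.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the relation label table (illustrative entries only).
RELATION_LABELS: Dict[str, str] = {
    "enables": "http://purl.obolibrary.org/obo/RO_0002327",
    "contributes_to": "http://purl.obolibrary.org/obo/RO_0002326",
}

# Hypothetical prefix map used to contract URIs into CURIEs.
PREFIXES: Dict[str, str] = {"RO": "http://purl.obolibrary.org/obo/RO_"}


def lookup_label(label: str) -> Optional[str]:
    """Return the URI for a relation label, or None if the label is unknown."""
    return RELATION_LABELS.get(label)


def contract_uri(uri: str) -> List[str]:
    """Return every CURIE form of `uri` under the known prefixes."""
    return [
        f"{prefix}:{uri[len(base):]}"
        for prefix, base in PREFIXES.items()
        if uri.startswith(base)
    ]


def resolve_qualifiers(qualifiers: List[str]) -> Tuple[List[str], List[str]]:
    """Split qualifiers into resolved CURIEs and unknown labels.

    The None check happens before contraction, so an unknown label is
    collected for reporting instead of raising AttributeError.
    """
    curies: List[str] = []
    unknown: List[str] = []
    for q in qualifiers:
        uri = lookup_label(q)
        if uri is None:
            unknown.append(q)
            continue
        contracted = contract_uri(uri)
        if not contracted:
            # URI matched no known prefix; treat it as unresolvable too.
            unknown.append(q)
            continue
        curies.append(contracted[0])
    return curies, unknown


curies, unknown = resolve_qualifiers(["enables", "Contributes_to"])
```

With this ordering, the caller can report every entry in `unknown` as an invalid qualifier and skip the line, rather than having the whole parse abort on the first unrecognized label.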

kltm commented 2 years ago

@dustine32 As we no longer have this testable upstream and you have tests and the code is live, I'm just going to call this closed for now--please reopen if I'm mistaken.