bionlplab / bioc

Data structures and code to read/write BioC XML and JSON.
MIT License

Brat parser: AssertionError: Illegal format: M #17

Closed sg-wbi closed 2 years ago

sg-wbi commented 2 years ago

First of all, thank you for this incredibly useful library.

I am trying to parse a brat file of this resource but I get the following error:

[ins] In [26]: a2 = brat.load_ann("devel/PMC-3333881-05-MATERIALS_AND_METHODS.a1")
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [26], in <module>
----> 1 a2 = brat.load_ann("devel/PMC-3333881-05-MATERIALS_AND_METHODS.a1")

File ~/.venv/bigbio/lib/python3.9/site-packages/bioc/brat/decoder.py:214, in load_ann(fp, docid)
    212     doc.add_annotation(loads_brat_note(line))
    213 if line[0] == 'A' or line[0] == 'M':
--> 214     doc.add_annotation(loads_brat_attribute(line))
    215 if line[0] == '*':
    216     doc.add_annotation(loads_brat_equiv(line))

File ~/.venv/bigbio/lib/python3.9/site-packages/bioc/brat/decoder.py:16, in loads_brat_attribute(s)
     12 """
     13 ID [tab] TYPE REFID [FLAG1 FLAG2 ...]
     14 """
     15 toks = s.split('\t')
---> 16 assert len(toks) == 2, 'Illegal format: %s' % s
     18 att = BratAttribute()
     19 att.id = toks[0]

AssertionError: Illegal format: M

yfpeng commented 2 years ago

Are you sure the file is "PMC-3333881-05-MATERIALS_AND_METHODS.a1"?

sg-wbi commented 2 years ago

My bad: brat.load_ann expects a TextIO (an open file object), not a str, and I passed the file path directly. Maybe you could consider improving the error message for this case.
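
For anyone hitting the same error, here is a minimal sketch of why a path string fails and what a clearer check might look like. It does not use bioc itself: `ensure_text_stream` is a hypothetical helper, not part of the library, and the sample annotation line is made up for illustration. The assumption, based on the traceback above, is that the parser iterates over its argument line by line.

```python
import io
from typing import TextIO


def ensure_text_stream(fp: object) -> TextIO:
    """Hypothetical guard: reject a path passed where an open text file is expected."""
    if isinstance(fp, (str, bytes)):
        raise TypeError(
            f"expected an open text file (TextIO), got {type(fp).__name__}: {fp!r}; "
            "open the file first, e.g. `with open(path) as fp: ...`"
        )
    return fp


# Iterating over a str yields single characters, not lines, so a line-oriented
# parser sees "d", "e", ..., "P", "M" instead of annotation lines. The lone "M"
# is what ends up in the "Illegal format: M" message.
path = "devel/PMC-3333881-05-MATERIALS_AND_METHODS.a1"
first_chars = list(path)[:8]

# Iterating over a text stream yields whole lines, which is what a parser needs.
# The annotation line below is an invented example, not taken from the dataset.
stream = io.StringIO("T1\tProtein 0 4\tTP53\n")
lines = list(ensure_text_stream(stream))
```

With the library itself, the fix described in this thread amounts to opening the file and passing the handle, i.e. calling `brat.load_ann(fp)` inside `with open(path) as fp:` rather than passing `path`.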