bionlplab / bioc

Data structures and code to read/write BioC XML and Json.
MIT License

Brat export of documents without entities #21

Closed: chlor closed this issue 1 year ago

chlor commented 1 year ago

Hi,

I used the Brat export function on a protected corpus that I was given as a BioC XML file, but I get this error: `AttributeError: 'BioCDocument' object has no attribute 'entities'`. Is it possible to export BioC files that do not define 'entities'?

I created the entities myself:

    for passage in doc.passages:
        i = i + 1
        annotations = []

        for ann in passage.annotations:
            off = ann.locations
            key = len(annotations)
            start = off[0].offset
            end = off[0].offset + off[0].length
            ann = 'T' + str(key) + '\t' + ann.infons['type'] + ' ' + str(start) + ' ' + str(end) + '\t' + passage.text[off[0].offset:(off[0].offset + off[0].length)]
            annotations.append(ann)

Do you have an idea?

Best regards, Christina

serenalotreck commented 1 year ago

@chlor I am also running into this issue.

I opened #22 about bioc2brat not having a reverse conversion. Afterwards, in the "encoding a brat object" section of the documentation, I noticed that you can serialize a "doc" object to a brat string. I was wondering what kind of doc object that refers to, and whether I could use it to do the reverse conversion even though it isn't included in the brat2bioc module.

I pulled a BioCDocument instance out of my collection, and ran the following code:

from bioc import brat
brat.dumps_ann(doc)

However, this gave me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/home/lotrecks/.local/lib/python3.7/site-packages/bioc/brat/encoder.py", line 71, in dumps_ann
    dump_ann(doc, output)
  File "/mnt/home/lotrecks/.local/lib/python3.7/site-packages/bioc/brat/encoder.py", line 55, in dump_ann
    for ent in doc.entities:
AttributeError: 'BioCDocument' object has no attribute 'entities'

Searching the repo code, it looks like there are two types of doc objects: BioCDocument and BratDocument.

From what I can tell, these two document classes are not interchangeable. I assume the documentation for encoding a brat document refers to a BratDocument, so dumping a BioCDocument to a brat file isn't possible.

@chlor Were you able to get this working by manually adding entities?

yfpeng commented 1 year ago

Since it is a bioc package, we only provide functions to convert to the bioc format. We don't provide a reverse conversion.
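
Since the package does not provide the BioC-to-brat direction, the manual approach from the first comment seems to be the practical route. Below is a minimal sketch of that idea, not part of the bioc package. The `Location`, `Annotation`, `Passage`, and `Document` classes and the `to_brat_lines` helper are hypothetical stand-ins that mirror only the attributes used in this thread (`passages`, `annotations`, `locations`, `offset`, `length`, `infons`, `text`). The sketch also assumes that annotation offsets are document-level and that each annotation has a single contiguous span.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins that mirror only the attributes used in this thread.
@dataclass
class Location:
    offset: int
    length: int


@dataclass
class Annotation:
    infons: Dict[str, str]
    locations: List[Location]


@dataclass
class Passage:
    offset: int
    text: str
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Document:
    passages: List[Passage] = field(default_factory=list)


def to_brat_lines(doc) -> List[str]:
    """Build brat text-bound lines: T<id> TAB <type> <start> <end> TAB <text>."""
    lines = []
    for passage in doc.passages:
        for ann in passage.annotations:
            loc = ann.locations[0]  # assumption: one contiguous span per annotation
            start = loc.offset
            end = loc.offset + loc.length
            # Assumption: offsets are document-level, so shift by the passage
            # offset before slicing the passage text.
            rel = start - passage.offset
            covered = passage.text[rel:rel + loc.length]
            label = ann.infons.get("type", "Entity")  # fallback label is an assumption
            # IDs run across the whole document and start at T1.
            lines.append(f"T{len(lines) + 1}\t{label} {start} {end}\t{covered}")
    return lines


doc = Document(passages=[
    Passage(offset=0, text="Aspirin treats pain.",
            annotations=[Annotation({"type": "Chemical"}, [Location(0, 7)])]),
    Passage(offset=21, text="BRCA1 is a gene.",
            annotations=[Annotation({"type": "Gene"}, [Location(21, 5)])]),
])
ann_text = "\n".join(to_brat_lines(doc))
print(ann_text)
```

The resulting string could be written to an `.ann` file next to a `.txt` file. The offsets only line up if that `.txt` file places each passage's text at its document-level offset, which is another assumption to verify against your own corpus.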