ArneBinder / pytorch-ie-hydra-template-1

PyTorch-IE Hydra Template

add brat serializer #154

Closed Bhuvanesh-Verma closed 8 months ago

Bhuvanesh-Verma commented 8 months ago

This adds a BratSerializer, which writes model predictions to annotation files (.ann) in Brat format. It requires a layers parameter that specifies which annotation layers to serialize. For now, it supports layers containing LabeledSpan, LabeledMultiSpan, and BinaryRelation annotations. If a gold_label_prefix is provided, the gold annotations are serialized as well, with the given prefix. Otherwise, only the predicted annotations are serialized. A document_processor can be provided to process documents before serialization.
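To make the target format concrete, here is a minimal, standalone sketch of how span and relation annotations map to Brat standoff lines. It does not use the code from this PR: Span, Relation, and to_brat_lines are hypothetical names for illustration only, and the way the optional prefix is joined to the label (with a hyphen) is an assumption. ID numbering and line ordering in the actual serializer may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:  # hypothetical stand-in for a labeled span
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:  # hypothetical stand-in for a binary relation
    head: Span
    tail: Span
    label: str


def to_brat_lines(
    text: str,
    spans: List[Span],
    relations: List[Relation],
    label_prefix: Optional[str] = None,
) -> List[str]:
    """Render spans as T-lines and relations as R-lines (Brat standoff)."""

    def full_label(label: str) -> str:
        # Assumption: the prefix is joined to the label with a hyphen.
        return f"{label_prefix}-{label}" if label_prefix else label

    span_ids = {}
    lines = []
    for i, span in enumerate(spans):
        span_ids[span] = f"T{i}"
        covered = text[span.start : span.end]
        # Text-bound annotation: ID <TAB> label start end <TAB> covered text
        lines.append(f"T{i}\t{full_label(span.label)} {span.start} {span.end}\t{covered}")
    for i, rel in enumerate(relations):
        # Relation annotation: ID <TAB> label Arg1:<head ID> Arg2:<tail ID>
        head_id, tail_id = span_ids[rel.head], span_ids[rel.tail]
        lines.append(f"R{i}\t{full_label(rel.label)} Arg1:{head_id} Arg2:{tail_id}")
    return lines


text = "Harry lives in Berlin. He works at DFKI."
harry = Span(start=0, end=5, label="PERSON")
berlin = Span(start=15, end=21, label="LOCATION")
lines = to_brat_lines(text, [harry, berlin], [Relation(harry, berlin, "lives_in")])
print("\n".join(lines))
```

Relations refer to their arguments by span ID, so the spans have to be assigned IDs before any relation line can be written.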

Usage

from src.serializer import BratSerializer
from pie_modules.annotations import BinaryRelation, LabeledSpan
from pie_modules.documents import TextDocumentWithLabeledSpansAndBinaryRelations

# create an example document
document = TextDocumentWithLabeledSpansAndBinaryRelations(
    text="Harry lives in Berlin. He works at DFKI.", id="tmp_1"
)
# create annotations
harry = LabeledSpan(start=0, end=5, label="PERSON")  # Harry
berlin = LabeledSpan(start=15, end=21, label="LOCATION")  # Berlin
# add annotations to the document
document.labeled_spans.predictions.extend([harry, berlin])
document.binary_relations.predictions.append(
    BinaryRelation(head=harry, tail=berlin, label="lives_in")
)

serializer = BratSerializer(path="/tmp", layers=["labeled_spans", "binary_relations"])
metadata = serializer(documents=[document])

"""
Saved at os.path.join(metadata['path'], f"{document.id}.ann") with the following content:

T0      LOCATION 15 21  Berlin
T1      PERSON 0 5      Harry
R0      lives_in Arg1:T1 Arg2:T0
"""
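To sanity-check a written file, the .ann content can be parsed back with the standard library alone. The following is a minimal sketch, not part of this PR: parse_ann is a hypothetical helper, it assumes tab-separated fields as in the Brat standoff format, and it only handles contiguous spans (T lines) and binary relations (R lines). Discontinuous spans, as produced from LabeledMultiSpan, are not covered.

```python
from typing import Dict, Tuple

SpanEntry = Tuple[str, int, int, str]  # (label, start, end, covered text)
RelationEntry = Tuple[str, str, str]  # (label, head span ID, tail span ID)


def parse_ann(content: str) -> Tuple[Dict[str, SpanEntry], Dict[str, RelationEntry]]:
    """Parse T-lines and R-lines of a Brat .ann file into dictionaries."""
    spans: Dict[str, SpanEntry] = {}
    relations: Dict[str, RelationEntry] = {}
    for line in content.splitlines():
        if not line.strip():
            continue  # skip empty lines
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            # body: "<label> <start> <end>\t<covered text>"
            type_and_offsets, covered = body.split("\t", 1)
            label, start, end = type_and_offsets.split(" ")
            spans[ann_id] = (label, int(start), int(end), covered)
        elif ann_id.startswith("R"):
            # body: "<label> Arg1:<head ID> Arg2:<tail ID>"
            label, arg1, arg2 = body.split(" ")
            relations[ann_id] = (label, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
    return spans, relations


# The content shown above, with explicit tab separators.
content = (
    "T0\tLOCATION 15 21\tBerlin\n"
    "T1\tPERSON 0 5\tHarry\n"
    "R0\tlives_in Arg1:T1 Arg2:T0\n"
)
spans, relations = parse_ann(content)
print(spans)
print(relations)
```

In practice, content would be read from the path given in the usage example, i.e. os.path.join(metadata['path'], f"{document.id}.ann").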

Note: This PR updates pie-modules to v0.10.6, which includes a fix for LabeledMultiSpan.