HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Is it possible to convert texts containing relations annotated in Label Studio into spaCy's binary file format? #4144

Closed: Sofyan-fcomte closed this issue 1 year ago

Sofyan-fcomte commented 1 year ago

Is there a converter that can turn Label Studio JSON exports into spaCy's binary file format? I've been working with Label Studio for a while and didn't expect to hit this roadblock. I urgently need some sort of converter for a relation extraction project.

For context, I'm talking about spaCy's custom relation extraction component: https://youtu.be/8HL-Ap5_Axo

Label Studio's documentation shows that NER annotations can be converted to spaCy's binary format if they are exported as CONLL2003 files, but the CONLL2003 format doesn't capture the annotated relations.

Is there a solution or workaround for this problem?

AbubakarSaad commented 1 year ago

Hello Sofyan-fcomte,

Unfortunately, there isn't a direct converter that turns Label Studio JSON exports containing relation annotations into spaCy's binary format (a `DocBin` saved as a `.spacy` file). However, you can write a custom script that converts the Label Studio JSON annotations into the training format your spaCy relation extraction component expects.
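For reference, a Label Studio JSON export is a list of tasks: each task holds the source text under `data` and the annotation regions under `annotations[...]["result"]`. Entity regions have `"type": "labels"` with character offsets in `value`, while relations have `"type": "relation"` and point at region `id`s through `from_id` and `to_id`. The sketch below separates the two using only the standard library. It assumes the text field in your labeling config is named `text` and only reads the first annotation of each task; `load_export`, `parse_label_studio_task` and the example task are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def load_export(path: str) -> list[dict[str, Any]]:
    """Hypothetical helper: load a Label Studio JSON export (a list of tasks)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_label_studio_task(task: dict[str, Any]) -> tuple[str, list[dict], list[dict]]:
    """Hypothetical helper: split one task into text, entity spans and relations."""
    # Assumption: the <Text> object reads from a data field called "text".
    text = task["data"]["text"]
    # Assumption: only the first annotation of each task is used.
    results = task["annotations"][0]["result"]

    entities, relations = [], []
    for item in results:
        if item["type"] == "labels":
            value = item["value"]
            entities.append({
                "id": item["id"],
                "start": value["start"],  # character offsets into text
                "end": value["end"],
                "label": value["labels"][0],
            })
        elif item["type"] == "relation":
            labels = item.get("labels") or []  # a relation may carry no label
            relations.append({
                "head": item["from_id"],
                "child": item["to_id"],
                "direction": item.get("direction", "right"),
                "label": labels[0] if labels else None,
            })
    return text, entities, relations


# Hypothetical task in the shape of a Label Studio JSON export.
example_task = {
    "data": {"text": "Marie Curie worked in Paris."},
    "annotations": [{"result": [
        {"id": "e1", "type": "labels", "from_name": "label", "to_name": "text",
         "value": {"start": 0, "end": 11, "text": "Marie Curie", "labels": ["PERSON"]}},
        {"id": "e2", "type": "labels", "from_name": "label", "to_name": "text",
         "value": {"start": 22, "end": 27, "text": "Paris", "labels": ["LOC"]}},
        {"type": "relation", "from_id": "e1", "to_id": "e2",
         "direction": "right", "labels": ["works_in"]},
    ]}],
}

text, entities, relations = parse_label_studio_task(example_task)
print(entities)
print(relations)
```

Keeping the region `id`s is what lets you join relations back to their entity spans later.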

Here's a high-level outline of the steps you can follow:

  1. Export your annotations from Label Studio in JSON format.
  2. Write a custom script to parse the Label Studio JSON annotations and extract the relevant information for relation extraction.
  3. Convert the extracted information into the format your spaCy relation extraction component expects, for example `Doc` objects carrying the entity spans and the custom relation attribute used in the tutorial project from the video you linked, serialized with `DocBin`.
  4. Train your spaCy relation extraction model using the converted data.
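Steps 2 and 3 could look roughly like the following sketch. It builds one spaCy `Doc` per task with `nlp.make_doc`, maps each Label Studio region to a token span with `Doc.char_span`, and saves the result with `DocBin(store_user_data=True)` so custom attributes are kept. The relation layout is an assumption modelled on Explosion's `rel_component` tutorial project (the one in the video): gold relations live in a custom `doc._.rel` dict keyed by `(head_start_token, child_start_token)` pairs, with a 0.0/1.0 score per relation label. Check it against that project's own data-parsing script before training. `task_to_doc`, `convert`, the `"works_in"` label and the example task are hypothetical.

```python
from __future__ import annotations

import spacy
from spacy.language import Language
from spacy.tokens import Doc, DocBin

# Assumption: gold relations are stored in a custom Doc attribute named "rel",
# as in the rel_component tutorial project. Adapt this to your own component.
if not Doc.has_extension("rel"):
    Doc.set_extension("rel", default={})


def task_to_doc(nlp: Language, task: dict, rel_labels: list[str]) -> Doc:
    """Hypothetical helper: one Label Studio task -> one annotated spaCy Doc."""
    text = task["data"]["text"]  # assumption: the data field is called "text"
    results = task["annotations"][0]["result"]  # first annotation only
    doc = nlp.make_doc(text)

    # Map each Label Studio region id to the start token of its entity span.
    id_to_start = {}
    spans = []
    for item in results:
        if item["type"] != "labels":
            continue
        value = item["value"]
        # alignment_mode="expand" widens character offsets to token boundaries.
        span = doc.char_span(value["start"], value["end"],
                             label=value["labels"][0], alignment_mode="expand")
        if span is not None:
            spans.append(span)
            id_to_start[item["id"]] = span.start
    doc.ents = spans  # raises an error if entity spans overlap

    # Every ordered pair of distinct entities starts at 0.0 for each label...
    starts = sorted(set(id_to_start.values()))
    rels = {(a, b): {label: 0.0 for label in rel_labels}
            for a in starts for b in starts if a != b}

    # ...and each annotated relation is set to 1.0.
    for item in results:
        if item["type"] != "relation":
            continue
        head = id_to_start.get(item["from_id"])
        child = id_to_start.get(item["to_id"])
        if head is None or child is None or head == child:
            continue
        # Assumption: "right" means from_id -> to_id, "left" the reverse, "bi" both.
        direction = item.get("direction", "right")
        if direction == "left":
            pairs = [(child, head)]
        elif direction == "bi":
            pairs = [(head, child), (child, head)]
        else:
            pairs = [(head, child)]
        for label in item.get("labels") or []:
            if label in rel_labels:
                for pair in pairs:
                    rels[pair][label] = 1.0

    doc._.rel = rels
    return doc


def convert(tasks: list[dict], rel_labels: list[str], out_path: str) -> None:
    """Hypothetical helper: write all tasks to a .spacy file."""
    nlp = spacy.blank("en")  # assumption: English tokenization
    docs = [task_to_doc(nlp, task, rel_labels) for task in tasks]
    # store_user_data=True keeps the custom doc._.rel values in the file.
    DocBin(docs=docs, store_user_data=True).to_disk(out_path)


# Hypothetical task in the shape of a Label Studio JSON export.
example_task = {
    "data": {"text": "Marie Curie worked in Paris."},
    "annotations": [{"result": [
        {"id": "e1", "type": "labels",
         "value": {"start": 0, "end": 11, "labels": ["PERSON"]}},
        {"id": "e2", "type": "labels",
         "value": {"start": 22, "end": 27, "labels": ["LOC"]}},
        {"type": "relation", "from_id": "e1", "to_id": "e2",
         "direction": "right", "labels": ["works_in"]},
    ]}],
}

doc = task_to_doc(spacy.blank("en"), example_task, ["works_in"])
print([(ent.text, ent.label_) for ent in doc.ents])
print(doc._.rel)
```

With the full export loaded as a list of tasks, `convert(tasks, ["works_in"], "train.spacy")` would write a file your training config can reference. Filling every unannotated pair with 0.0 gives the component explicit negative examples, which is the design the tutorial follows as far as I recall; drop that step if your component only needs positives.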

Keep in mind that this process requires some programming skills, and you might need to adjust it based on your specific project requirements.