PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: Data annotation and preprocessing for Relation Extraction #8457

Closed piarosebelledelapaz closed 4 weeks ago

piarosebelledelapaz commented 3 months ago

Please ask your question

Hello,

I am trying to do relation extraction on a document, and I have a few questions about the annotation format used to fine-tune the model.

1) Is multiple linking (1-to-N relations) possible, and does the model accept it?

2) What do the generated train/dev/test.txt files contain? I preprocessed my data, but the generated .txt files are full of content I cannot interpret, so I would like to understand the actual input format the model expects. I made my annotations according to the Label Studio guide provided by PaddleNLP, but the contents of the training/validation data files are not clear. Here is a sample from the train.txt file I got: [image]

3) Once the model has been fine-tuned, does it also produce detection and recognition results for the document, or only the relation extraction results? I already have fine-tuned PaddleOCR weights for detection and recognition, and I was wondering whether they could be used with PaddleNLP.

If you could clarify these points, that would be very helpful! Thanks in advance.
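
For question 2, one way to make sense of the generated files is to look at their structure instead of their raw text. The sketch below assumes that each line of train.txt is a standalone JSON object (JSON Lines); the issue does not confirm this. The helper names `summarize_record` and `summarize_file` and the sample record are hypothetical and are not part of PaddleNLP. It lists each field with its type and length, and truncates long strings so that large encoded values do not flood the terminal.

```python
import json


def summarize_record(line, max_len=40):
    """Parse one JSON line and return {key: short description of the value}.

    Assumes the line holds a JSON object (a dict at the top level).
    """
    record = json.loads(line)
    summary = {}
    for key, value in record.items():
        if isinstance(value, str):
            # Long strings (for example encoded binary data) are truncated.
            shown = value if len(value) <= max_len else value[:max_len] + "..."
            summary[key] = f"str(len={len(value)}): {shown}"
        elif isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__}(len={len(value)})"
        else:
            summary[key] = repr(value)
    return summary


def summarize_file(path, limit=3):
    """Summarize the first `limit` non-empty lines of a JSON Lines file."""
    summaries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            summaries.append(summarize_record(line))
            if len(summaries) >= limit:
                break
    return summaries


# Hypothetical record, made up for illustration only; real field names may differ.
sample = json.dumps({"text": "Invoice No. 42", "blob": "A" * 500, "items": [1, 2, 3]})
for key, desc in summarize_record(sample).items():
    print(f"{key}: {desc}")
```

Calling `summarize_file("train.txt")` on the real file would show which fields exist and which of them carry the long, unreadable values. If `json.loads` raises an error on the first line, the JSON Lines assumption does not hold for that file.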

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been inactive for 14 days since being marked as stale.