[Question]: Data annotation and pre processing for Relation Extraction

piarosebelledelapaz commented 3 months ago

Please ask your question

Hello,

I am trying to do relation extraction on a document, and I have a few questions about the annotation format used to fine-tune the model.

1) Is multiple linking (1-N relations) possible, and does the model accept it? For example:

vaccine X links to 1st date of vaccination
vaccine X links to 2nd date of vaccination

2) What do the generated train/dev/test.txt files contain? I preprocessed my data, but the generated .txt files are full of content I cannot interpret, so I would like to understand the actual input format the model expects. I made my annotations according to the label-studio guide provided by PaddleNLP, but the contents of the training/validation data files are not clear to me. Here is a sample from the train.txt file I got.

3) Once the model has been fine-tuned, does it also produce text detection and recognition results for the document, or only the relation extraction results? I have fine-tuned PaddleOCR weights for detection and recognition, and I was wondering whether they can be used with PaddleNLP.
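For question 2, here is a minimal sketch for inspecting the generated file. It assumes (unverified) that each non-empty line of train.txt is a standalone JSON object; it does not assume any particular field names. The helper `summarize_jsonl` is a hypothetical name for illustration and is not part of PaddleNLP.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path, preview=1):
    """Count top-level keys across a JSON Lines file and keep a few parsed records.

    Assumption: every non-empty line is one JSON object.
    """
    key_counts = Counter()
    samples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            key_counts.update(record.keys())
            if len(samples) < preview:
                samples.append(record)
    return key_counts, samples
```

Calling `summarize_jsonl("train.txt")` and printing the first sample with `json.dumps(sample, ensure_ascii=False, indent=2)` would show which fields each training example carries and keep any non-ASCII text readable.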

If you could clarify these points, that would be very helpful. Thanks in advance!

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

PaddlePaddle / PaddleNLP

[Question]: Data annotation and pre processing for Relation Extraction #8457
