HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

New line leads to more annotation tasks in "Named Entity Recognition" in "Natural Language Processing" template #4409

Open · morettif opened this issue 1 year ago

morettif commented 1 year ago

Describe the bug
A new line splits a text to be annotated into multiple tasks.

To Reproduce
Drag and drop a file that includes \n characters.

Expected behavior
Do not split the text into multiple tasks, or provide an ID that allows a task to be traced back to the original text from which it was derived.


AbubakarSaad commented 1 year ago

Hello morettif,

How are you importing the files? In the UI there should be an option to load the whole text file.

[Screenshot 2023-06-21 at 12:58:34 AM]
morettif commented 5 months ago

Hi @AbubakarSaad,

Sorry for the terrible delay in replying. The problem I described occurs when I load the file (a .txt, to be precise) as a list of tasks. I chose this upload mode because if I select "Time Series or Whole Text File", the text displayed is /data/upload/248/06c9a425-doc1105.txt, where doc1105 is the name of the file. The problem does not occur when uploading a JSON file created from the dictionary dict = {"data": {"text": content}}, where content is the same text as in the .txt file, including the \n characters.

Thank you in advance for your support.
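
A minimal sketch of the JSON workaround described above, assuming the input file is doc1105.txt as mentioned in the comment (the output name doc1105_task.json is hypothetical):

```python
import json

# Read the raw text, keeping every \n intact.
with open("doc1105.txt", encoding="utf-8") as f:
    content = f.read()

# Wrap the whole text in a single task, using the
# {"data": {"text": ...}} shape quoted in the comment above.
task = {"data": {"text": content}}

# Write a JSON file that can be imported as one task
# instead of being split on newlines.
with open("doc1105_task.json", "w", encoding="utf-8") as f:
    json.dump(task, f, ensure_ascii=False)
```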

juv85 commented 1 month ago

I encountered the same problem. Is any solution or workaround available, please?