Open guilhermenoronha opened 2 years ago
Hi @guilhermenoronha, this is expected behavior. You can remove the valueType="url" option to get the behavior you want.
Hi @KonstantinKorotaev, thanks for your answer. I removed the valueType="url" option, but the old behavior persists. I had already imported and annotated the data as time series. Does removing this tag afterwards have any effect? My annotations after removing valueType are attached. Everything looks messed up now.
Hi @guilhermenoronha, you will need to re-import all tasks with the texts themselves instead of URLs. For example, you can do it this way:
Download the texts with a script and save them to a new JSON file:
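import json
import requests

# export_filename and new_filename are placeholders for your own file paths.
with open(export_filename, mode='r') as f:
    data = json.load(f)
for each in data:
    # Assumes the first key in the task data holds the URL of the text file.
    key = list(each['data'].keys())[0]
    r = requests.get(each['data'][key])
    each['data'][key] = r.text
with open(new_filename, mode='w') as f:
    json.dump(data, f)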
If you have a more complicated config, adapt the instructions inside the loop accordingly.
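For a config with more than one data field, one way to adapt the loop is to name the field explicitly instead of taking the first key. The following is only a sketch: the function name `inline_texts`, the field name `text`, the `fetch` parameter, and the sample path are all hypothetical and not part of Label Studio.

```python
from typing import Callable, List


def inline_texts(tasks: List[dict], key: str, fetch: Callable[[str], str]) -> List[dict]:
    """Return copies of the tasks with the URL stored under `key` replaced by its text."""
    result = []
    for task in tasks:
        data = dict(task["data"])  # shallow copy so the input tasks stay untouched
        if key in data:
            data[key] = fetch(data[key])
        result.append({**task, "data": data})
    return result


# Stand-in for a real download so the example runs offline (hypothetical path).
fake_files = {"/data/upload/1.txt": "Hello world"}
tasks = [{"id": 1, "data": {"text": "/data/upload/1.txt", "meta": "keep me"}}]

converted = inline_texts(tasks, "text", fake_files.__getitem__)
print(converted[0]["data"]["text"])
```

In a real run, `fetch` could be a small function that wraps `requests.get(url).text`, as in the script above. Other fields in the task data are left unchanged.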
Hi @KonstantinKorotaev. Thanks again. It took me some time to reply because the code you provided needed some adjustments for my project. It worked smoothly, thanks! I'm posting the fully functional code here because I believe it may be helpful to someone else.
import json
import requests

with open('your_json_file.json', mode='r', encoding='UTF-8') as f:
    data = json.load(f)
for each in data:
    key = list(each['data'].keys())[0]
    # You may have to change the base URL depending on where Label Studio is hosted.
    url = f"http://localhost:8080{each['data'][key]}"
    # You can find your token in Label Studio by clicking the upper-right panel button, then Account & Settings.
    r = requests.get(url, headers={'Authorization': 'Token put_your_token_here'})
    # My text is in Portuguese, so I had to set the encoding to UTF-8.
    r.encoding = 'UTF-8'
    each['data'][key] = r.text
with open('conll.json', mode='w', encoding='UTF-8') as f:
    json.dump(data, f)
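If some tasks store a relative path and others a full URL, plain string concatenation with the host will break the full ones. A minimal sketch of one way around this, using `urljoin` from the standard library; the helper name `resolve`, the base address, and the sample paths are assumptions for illustration:

```python
from urllib.parse import urljoin

# Assumed base address of the Label Studio instance; adjust to your setup.
BASE_URL = "http://localhost:8080"


def resolve(value: str, base: str = BASE_URL) -> str:
    """Turn a task value into an absolute URL, leaving absolute URLs unchanged."""
    return urljoin(base, value)


relative = resolve("/data/upload/1-sample.txt")       # hypothetical upload path
absolute = resolve("https://example.com/sample.txt")  # already absolute
print(relative)
print(absolute)
```

The result of `resolve(...)` could then be passed to `requests.get` in place of the hand-built URL.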
Although your solution works well and fits my problem, I found it a bit clumsy to execute. For an ordinary user who needs to export several tasks in CONLL format, it doesn't seem like a very natural step to run in a production environment, right? That said, don't you think it would be best to add a feature request for exporting TimeSeries texts in CONLL format?
Sincerely.
Describe the bug
I'm not sure if this is really a bug or the expected behavior of Label Studio. I imported several text files in a row using the time series option with valueType="url" set. The task I'm performing is named entity recognition (NER) at the word level with custom tags. When I export the annotations, the generated CONLL file contains the URLs of the imported files instead of the words with their annotated tags.
PS: I also tried the saveTextResult="yes" option, but it didn't produce the expected result.
To Reproduce
Steps to reproduce the behavior:
The behavior I got
Expected behavior
Environment (please complete the following information):