HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Labeling Interface Fails to Load URL of Audio Tag for Tasks With Data Urls #4736

Open PaulP49 opened 1 year ago

PaulP49 commented 1 year ago

Describe the bug

The Labeling Interface fails to load the URL of the Audio tag for tasks with data URLs.

To Reproduce

  1. Import tasks from an S3 cloud storage with the option "Treat every bucket object as a source file" ticked.

  2. Sync the storage to import the tasks.

  3. The imported tasks have the following format:

    {
      "id": 1,
      "data": {
        "audio": "data:audio/x-wav;base64,...."
      },
      "annotations": [],
      "predictions": []
    }
  4. Set up the following Labeling Interface:

    <View>
      <Audio name="audio" value="$audio"/>
      <Choices name="choice" toName="audio">
        <Choice value="YES"/>
        <Choice value="NO"/>
        <Choice value="SILENCE"/>
        <Choice value="UNKNOWN"/>
      </Choices>
    </View>
  5. Click on any task in the project view to enter the labeling page.

  6. The Labeling Interface does not show correctly in the UI, because an ERR_INVALID_URL error is thrown.

  7. This happens because the query parameter lsref=1 is appended to the URL, but data URLs do not support query parameters.

Expected behavior

The Labeling Interface is shown correctly in the UI for tasks with data URLs.

Potential fix

Do not automatically add the query parameter lsref=1 to the URL in the Audio tag.
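One way to express this fix is a guard that leaves inline URLs untouched. The following is only a minimal sketch: `isInlineUrl` and `appendLsRef` are hypothetical names chosen for illustration, not functions from the Label Studio codebase, and skipping `blob:` URLs as well is an assumption.

```typescript
// Hypothetical helpers for illustration; not Label Studio's actual code.
const INLINE_SCHEMES: string[] = ["data:", "blob:"];

// True when the URL carries or references its content inline,
// so a query string cannot be appended to it.
function isInlineUrl(url: string): boolean {
  const lower = url.trim().toLowerCase();
  return INLINE_SCHEMES.some((scheme) => lower.indexOf(scheme) === 0);
}

// Append lsref=1 to regular URLs only; return inline URLs unchanged.
function appendLsRef(url: string): string {
  if (isInlineUrl(url)) {
    return url;
  }
  // Keep any fragment (#...) at the end of the URL.
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : url.slice(hashIndex);
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + "lsref=1" + fragment;
}
```

With such a guard, a value like `data:audio/x-wav;base64,....` would be passed to the audio element as-is, while `https://` URLs would still receive the parameter.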

Screenshots

Screenshot 2023-09-04 111252


iqbalfarz commented 5 months ago

Hi @makseq, I am getting the same error as described above.

I explained the problem here: https://github.com/HumanSignal/label-studio/issues/6029

makseq commented 5 months ago

We strongly recommend against using "audio": "data:audio/x-wav;base64,....". Provide a URL (https://) or a cloud storage URI (s3://..., gs://..., etc.) there instead. "data:audio/x-wav" values generate huge payloads in api/tasks calls, and Label Studio will go down.
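To get a rough sense of the payload concern, note that base64 encodes every 3 bytes as 4 characters, so an inlined file is about one third larger than the original. The sketch below estimates that size and flags task fields that hold data URLs. The `Task` shape is assumed from the JSON in this issue, and `base64Length` and `findDataUrlFields` are hypothetical helper names.

```typescript
// Assumed task shape, based on the JSON shown in this issue.
interface Task {
  id: number;
  data: Record<string, unknown>;
}

// Length of the padded base64 encoding of byteCount raw bytes.
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Return the keys in task.data whose values are data URLs.
function findDataUrlFields(task: Task): string[] {
  return Object.keys(task.data).filter((key) => {
    const value = task.data[key];
    return typeof value === "string" && value.slice(0, 5).toLowerCase() === "data:";
  });
}
```

A check like this could be run over exported tasks to spot fields that would be better served by an https:// URL or a storage URI.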

parthagar commented 5 months ago

@makseq I agree with that, but it should still work, right? I am facing the same issue. We are using S3, but when the data is downloaded from S3 and LS converts it to base64, the case ends up being the same. If you want, I can work on a resolution. I think I know where the issue is coming from, and I'll try to resolve it locally first.