HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Error in Loading audio file in base64 codec #6029

Closed iqbalfarz closed 2 weeks ago

iqbalfarz commented 3 months ago
Hi @makseq,

I have an issue.

I am storing my audio files in an S3 bucket and syncing them from there.

It works in the non-labeling UI, that is, the view where all columns are shown, including the audio one. There I am able to play the audio file.

But when I click to label the task, I get the error below:

There was an issue loading URL from $audio value

Things to look out for:

URL is valid
URL scheme matches the service scheme, i.e. https and https
The static server has wide-open CORS, [more on that here](https://labelstud.io/guide/storage.html#Troubleshoot-CORS-and-access-problems)
Technical description: HTTP error status: 0
URL: data:audio/mp4;base64,AAAAGGZ0eXBNNEEgAAACA

The task looks like below:

{
  "id": 3916,
  "data": {
    "audio": "data:audio/mp4;base64,AAAAGGZ0eXBNNEEgAAACAGlzb21pc28yAAAACGZyZWUAAqgJbWRhdCERRQAUUAFG//EKWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaWlpaXf/iFLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t

Please help me out, @makseq. It's an urgent task.

Thanks a lot, and I hope to hear from you soon.

Thanks again!

Originally posted by @iqbalfarz in https://github.com/HumanSignal/label-studio/issues/1492#issuecomment-2186261232

iqbalfarz commented 3 months ago

Hi @makseq, please help me out.

Thanks!

ilanit1997 commented 3 months ago

I am having the same issue with a Google Cloud Storage audio file:

`There was an issue loading URL from $audio value

Things to look out for:

URL is valid
URL scheme matches the service scheme, i.e. https and https
The static server has wide-open CORS, [more on that here](https://docs.heartex.com/guide/storage.html#Troubleshoot-CORS-and-access-problems)
Technical description: HTTP error status: 0
URL: data:audio/mpeg;base64,ASBC....
`

where my $audio value looks like: "audio": "gs://BUCKET_NAME/BUCKET_FOLDER/FILENAME.mp3"

makseq commented 3 months ago

@iqbalfarz are you sure you turned on "Use pre-signed URLs" in the storage settings? Also, you should use valueType="url" for your Audio tag in the labeling config.

makseq commented 3 months ago

I believe audio tag doesn't support base64 data in tasks.
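To tell which case a task falls into, it can help to look at the scheme of the value stored under the audio key. The following is only a minimal sketch; `classify_audio_value` and its category names are hypothetical helpers written for this thread, not part of Label Studio.

```python
def classify_audio_value(value: str) -> str:
    """Roughly classify a task's audio value by its URI scheme.

    Hypothetical helper for illustration; not a Label Studio API.
    """
    scheme, sep, _ = value.partition(":")
    if not sep:
        return "other"  # no scheme at all, e.g. a bare file name
    scheme = scheme.lower()
    if scheme == "data":
        return "inline-data"  # base64 payload embedded in the task
    if scheme in ("s3", "gs"):
        return "storage-uri"  # resolved by Label Studio via cloud storage
    if scheme in ("http", "https"):
        return "http-url"
    return "other"


# Values taken from the reports in this thread
print(classify_audio_value("data:audio/mp4;base64,AAAAGGZ0eXBNNEEgAAACA"))
print(classify_audio_value("gs://BUCKET_NAME/BUCKET_FOLDER/FILENAME.mp3"))
```

If the value comes back as inline data, the task carries the whole file as base64, which is the situation described above.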

ilanit1997 commented 3 months ago

I upload a regular mp3 file to my bucket, then provide a short URI as my $audio variable, and for some reason it appears in raw base64 format inside the task. In addition (I don't know whether it is related), my browser crashes with an out-of-memory message when I try to load a labeling task (enterprise version).

What should I do? Is there a different format supported in tasks?

I would appreciate your advice on the matter.

makseq commented 3 months ago

Usually this happens when "Use pre-signed URLs" is off. Please re-check this option in the Cloud Storage settings.

What LS version do you use?

ilanit1997 commented 3 months ago

I deleted the storage and created it again with "Use pre-signed URLs" turned on.

Now I get:

Technical description: HTTP error status: 0
URL: [https://app.heartex.com/tasks/xxxx/presign/?fileuri=xxxx](https://app.heartex.com/tasks/x/presign/?fileuri=Z3=)

When I click the link, it opens the audio file in a new window and plays it, but inside the UI I still get the error above.

I am working with the enterprise UI: "release": "2.13.1.dev2".

makseq commented 3 months ago

Most likely, you haven't configured the CORS settings on the GCS side: https://docs.humansignal.com/guide/persistent_storage#Configure-CORS-for-the-GCS-bucket

parthagar commented 3 months ago

> I believe audio tag doesn't support base64 data in tasks.

@makseq what can we do to support it? Can we help in any way?

ilanit1997 commented 3 months ago

The problem was solved by setting the CORS config to:

    cors_config = [
        {
            "origin": ["*"],
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            "method": ["GET", "POST", "PUT", "DELETE", "HEAD"],
            "maxAgeSeconds": 3600,
        }
    ]

Beforehand, the origin contained a specific URL (app-heartx..) and the method list contained only GET.
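For anyone applying the same fix from Python, here is a rough sketch using the `google-cloud-storage` client, assuming the package is installed and credentials are configured. `build_cors_config` and `apply_cors` are hypothetical helper names, and the bucket name is a placeholder. The thread does not establish whether the wildcard origin or the extra methods made the difference, so the values simply mirror the config reported as working.

```python
def build_cors_config(
    origins=("*",),
    methods=("GET", "POST", "PUT", "DELETE", "HEAD"),
    max_age_seconds=3600,
):
    """Build a CORS policy list in the shape reported as working above."""
    return [
        {
            "origin": list(origins),
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            "method": list(methods),
            "maxAgeSeconds": max_age_seconds,
        }
    ]


def apply_cors(bucket_name, cors_config):
    """Set the CORS policy on a GCS bucket (needs google-cloud-storage)."""
    # Imported here so the helper above works without the package installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    bucket.cors = cors_config
    bucket.patch()  # persist the changed CORS property
    return bucket.cors


if __name__ == "__main__":
    # "BUCKET_NAME" is a placeholder, as in the report above.
    print(apply_cors("BUCKET_NAME", build_cors_config()))
```

A narrower `origins` value can be passed to `build_cors_config` if a wildcard is not acceptable for the bucket.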

makseq commented 3 months ago

@parthagar why do you need this? It's a bad way to work with Label Studio tasks: your Data Manager pages will start to perform badly, the browser will hang a lot, and it will probably lead to page crashes and OOM, because your browser has to handle the whole audio data for the page at once.

Most likely, you have to rethink your pipeline and switch to storing your data in S3/GCS/etc.
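If existing tasks already carry base64 audio, one possible migration path is to decode each payload, upload the bytes to a bucket, and reference the object by URI instead. The sketch below covers only the decode step and the shape of the new task; `decode_data_uri` and `make_task` are hypothetical helpers, the `audio` key is taken from the task shown earlier in this thread, and the upload itself is left to whichever storage client you use.

```python
import base64
from typing import Tuple


def decode_data_uri(data_uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def make_task(storage_uri: str) -> dict:
    """Build a task that points at stored audio instead of embedding it."""
    return {"data": {"audio": storage_uri}}


# Tiny stand-in payload; real audio would be far larger.
uri = "data:audio/mp4;base64," + base64.b64encode(b"fake-audio").decode("ascii")
mime_type, raw = decode_data_uri(uri)
# ... upload `raw` to your bucket here, then reference the object:
task = make_task("s3://BUCKET_NAME/FILENAME.mp4")
```

The task then stays small regardless of the audio length, which avoids the browser memory pressure described above.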

makseq commented 2 weeks ago

Closing as solved.