HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Unable to label more than four regions in Audio #6061

Open DGAzr opened 2 months ago

DGAzr commented 2 months ago

I am running into a strange issue with an audio transcription task. I can label four regions of my file without any problem, but as soon as I try to label a fifth, I get an error with the following stack trace:

Traceback (most recent call last):
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/generics.py", line 242, in post
    return self.create(request, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/mixins.py", line 17, in create
    serializer = self.get_serializer(data=request.data)
                                          ^^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 216, in data
    self._load_data_and_files()
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 279, in _load_data_and_files
    self._data, self._files = self._parse()
                              ^^^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 329, in _parse
    stream = self.stream
             ^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 203, in stream
    self._load_stream()
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 309, in _load_stream
    self._stream = io.BytesIO(self.body)
                              ^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/rest_framework/request.py", line 416, in __getattr__
    return getattr(self._request, attr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/op/label-studio/lib/python3.11/site-packages/django/http/request.py", line 330, in body
    raise RawPostDataException("You cannot access body after reading from request's data stream")
django.http.request.RawPostDataException: You cannot access body after reading from request's data stream
[Screenshot 2024-07-04 at 7:09:28 AM]
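For context on the exception itself: Django's request object exposes the raw payload both as a readable stream and as a cached `body` attribute, and once something has read the stream directly, `body` can no longer be produced. The sketch below is a simplified, hypothetical model of that behavior (the `SimplifiedRequest` class and the local `RawPostDataException` stand-in are illustrative, not Django's or DRF's actual code). It only shows what the error message means; it does not explain why the fifth region triggers it here.

```python
import io


class RawPostDataException(Exception):
    """Hypothetical stand-in for django.http.request.RawPostDataException."""


class SimplifiedRequest:
    """Hypothetical, simplified model of a request whose stream is read once.

    Not Django's implementation: it only mirrors the rule described by the
    error message. `body` works if it is cached before anything else reads
    the stream, and raises if the stream was already consumed.
    """

    def __init__(self, raw: bytes):
        self._stream = io.BytesIO(raw)
        self._read_started = False

    def read(self, *args):
        # Any direct read marks the underlying stream as consumed.
        self._read_started = True
        return self._stream.read(*args)

    @property
    def body(self) -> bytes:
        if not hasattr(self, "_body"):
            if self._read_started:
                raise RawPostDataException(
                    "You cannot access body after reading from request's data stream"
                )
            # Cache the whole payload, then swap in a fresh stream over the
            # cached bytes so later readers still see the full data.
            self._body = self.read()
            self._stream = io.BytesIO(self._body)
        return self._body


if __name__ == "__main__":
    ok = SimplifiedRequest(b'{"text": "hello"}')
    print(ok.body)   # body cached first: works
    print(ok.read())  # stream is still readable from the cached copy

    bad = SimplifiedRequest(b'{"text": "hello"}')
    bad.read(4)  # something consumes part of the stream first
    try:
        bad.body
    except RawPostDataException as exc:
        print("error:", exc)
```

In the traceback above, DRF's `_load_stream` reaches for `self.body` (via `__getattr__` on the underlying Django request), which corresponds to the failing branch in this model.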

I noticed a user in the Slack support channel reporting a nearly identical problem with a video task (four labels work; the fifth throws this error), so the bug may be reproducible in other task types too (see the Slack support thread).

Labeling configuration:

<View>
    <Audio name="audio" value="$audio"/>
    <Labels name="label" toName="audio">
      <Label value="Speaker 1" background="#f500ff"/>
      <Label value="Speaker 2" background="#ff0300"/>
    </Labels>
    <TextArea name="Transcript" toName="audio" perRegion="true" showSubmitButton="true" maxSubmissions="1" editable="true" required="true"/>
</View>

To Reproduce

Steps to reproduce the behavior:

  1. Create a new project
  2. Use an MP3 file and a "perRegion" TextArea (my labeling configuration is above)
  3. Label four regions in the file and enter their transcripts
  4. Label a fifth region and try to enter a transcript
    • The error occurs before the new annotation is even submitted, apparently when the app tries to process the text of the fifth transcript

Expected behavior

Labeling more than four regions on an audio file should work.

Environment (please complete the following information):

jombooth commented 1 month ago

/jira create

Workflow run Jira issue TRIAG-762 is created

jombooth commented 1 month ago

Thanks for reporting, @DGAzr! A strange bug indeed; we'll investigate.