HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Possible annotation buffer overflow #1706

Open pjvalla opened 2 years ago

pjvalla commented 2 years ago

We are doing some very long sequence-to-sequence modeling, and I believe I have found a buffer overflow in Label Studio. Annotating one very long time series caused the annotation type to change and cleared the annotation (in this case, a very long string). If I shorten the annotation for the same task, the UI works as expected.

Running the 1.3.0 container under Ubuntu 18.04

makseq commented 2 years ago

@pjvalla Could you please clarify: are you talking about Time Series labeling, or are you referring to text sequences as time series?

pjvalla commented 2 years ago

The project is an audio transcription effort. There is a time-series plot (from a .csv file) that represents the audio; it is a downsampled, frequency-converted version of the audio. The audio itself is also included for playback. The label is a string (the transcription of the audio). Both the time-series and audio files can be quite large.
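For readers trying to reproduce this, a setup like the one described might be configured roughly as follows. This is only a sketch: the task keys (`csv_url`, `audio_url`), the column names, and the tag names chosen for `name`/`toName` are hypothetical and not taken from the reporter's project. The snippet parses the config with Python's standard XML library and lists which object tag each control tag annotates.

```python
import xml.etree.ElementTree as ET

# Hypothetical labeling config. The $csv_url and $audio_url task keys and the
# "time"/"value" column names are assumptions for illustration only.
LABEL_CONFIG = """
<View>
  <TimeSeries name="ts" valueType="url" value="$csv_url" sep="," timeColumn="time">
    <Channel column="value"/>
  </TimeSeries>
  <Audio name="audio" value="$audio_url"/>
  <TextArea name="transcription" toName="audio" rows="4" editable="true"/>
</View>
"""


def control_targets(config: str) -> dict:
    """Map each tag that has a toName attribute to the object tag it annotates."""
    root = ET.fromstring(config)
    return {
        el.get("name"): el.get("toName")
        for el in root.iter()
        if el.get("toName") is not None
    }


targets = control_targets(LABEL_CONFIG)
print(targets)
```

In this sketch the transcription string is attached to the audio object, so the time series is display-only; whether the reporter's project was wired this way is not stated in the thread.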

makseq commented 2 years ago

@pjvalla Why don't you use the AudioPlus tag? It has a waveform.

pjvalla commented 2 years ago

The time series is a transform of the audio that is not pleasant to listen to but is informative to the annotator.