HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
19.44k stars 2.42k forks

Annotations Lost After Network failure for around 2 mins. #6675

Open abhijith96 opened 3 days ago

abhijith96 commented 3 days ago

Hi, I have around 10,000 images in an Azure Blob Storage container. There are intermittent network connection issues, and sometimes the call to Azure Blob Storage fails. During those failures, some of the annotations are lost.

To Reproduce

Steps to reproduce the behaviour:

1. Set up data import from Azure Blob Storage. Use a container prefix path.
2. Annotate a few files.
3. Simulate a network error (turn off the internet connection so that the GET request to Azure Blob Storage fails).
4. Refresh the page. Up to 50% of the annotations will be missing.

My assumption was that all annotations would be persisted in the SQLite database in my local storage. Losing half of the data that I annotated is causing great pain. During the first network connection error, the task I was annotating was number 2003. After the refresh, only tasks 1 to 500 were annotated. The rest of the tasks did not have any annotations.
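To check what the local database actually holds after the refresh, the SQLite file can be queried directly. The following is a minimal sketch, not an official Label Studio tool. The function name `annotated_task_range` is hypothetical, and the default table name `task_completion` and column `task_id` are assumptions about the Label Studio schema that should be verified against the installed version. It is safest to run it against a copy of the database file.

```python
import sqlite3
from contextlib import closing


def annotated_task_range(db_path, table="task_completion", task_column="task_id"):
    """Return (annotation_count, min_task_id, max_task_id) from a SQLite file.

    The default table and column names are assumptions about the
    Label Studio schema; pass different names if your version differs.
    """
    # SQL identifiers cannot be bound as parameters, so accept only simple names.
    for name in (table, task_column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    query = (
        f"SELECT COUNT(*), MIN({task_column}), MAX({task_column}) FROM {table}"
    )
    # closing() makes sure the connection is released even if the query fails.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query).fetchone()
```

Calling `annotated_task_range("path/to/copy_of_database.sqlite3")` (a placeholder path) returns the number of stored annotation rows and the lowest and highest task IDs that have one. If the highest ID is around 500 rather than 2003, the later annotations never reached the database, which would point to the save requests failing rather than rows being deleted afterwards.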

OS: macOS
Label Studio version: 1.14.0.post0

heidi-humansignal commented 19 hours ago

Hello,

Does this happen when you have target storage set up, or is it only the source storage that causes this issue?

I would also suggest:

Switch to a PostgreSQL Database: Label Studio uses SQLite by default, which might not be ideal for large projects with many tasks. SQLite can have limitations with concurrent access and large datasets, potentially leading to data integrity issues during network interruptions. Switching to a PostgreSQL database can provide better performance and reliability. You can find instructions on how to set up Label Studio with PostgreSQL in our documentation:

https://docs.humansignal.com/guide/storedata.html#PostgreSQL-database
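As a rough illustration of the switch, the sketch below builds the environment variables that Label Studio is understood to read when selecting PostgreSQL. The variable names (`DJANGO_DB`, `POSTGRE_NAME`, `POSTGRE_USER`, `POSTGRE_PASSWORD`, `POSTGRE_HOST`, `POSTGRE_PORT`) are taken from memory of the linked documentation and should be confirmed there. The helper `postgres_env` and all example values are hypothetical.

```python
import os


def postgres_env(name, user, password, host="localhost", port=5432):
    """Build environment variables assumed to point Label Studio at PostgreSQL.

    The variable names are an assumption based on the Label Studio docs;
    check the linked page for the version you run.
    """
    return {
        "DJANGO_DB": "default",  # assumed to select the PostgreSQL backend
        "POSTGRE_NAME": name,
        "POSTGRE_USER": user,
        "POSTGRE_PASSWORD": password,
        "POSTGRE_HOST": host,
        "POSTGRE_PORT": str(port),  # environment values must be strings
    }


# Merge with the current environment before launching the server, e.g.:
#   env = {**os.environ, **postgres_env("labelstudio", "labelstudio", "change-me")}
#   subprocess.run(["label-studio", "start"], env=env)
```

Existing annotations in the SQLite file are not moved by this change alone, so export them before switching.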

Thank you, Abu

Comment by Abubakar Saad