Open SheldonWBM opened 11 months ago
@SheldonWBM we're looking into this internally. I believe the open source version doesn't have a queue-based approach to syncing, so it does indeed do a complete reload of the source. I've created a product ticket to request this improvement.
Hello @hogepodge, I wanted to follow up here to see if there has been any progress on this. We sync a growing base of tasks (on Enterprise and Community versions of Label Studio) from S3, and the sync now takes over an hour and keeps getting longer every day. Is there an ETA for adding some sort of queue-based sync? Do you have any workarounds for this in the meantime? Thank you!
Hi @noahlibby17, I decided to try the sync command today and immediately regretted it. I do have a workaround (which I should have used), which is to import tasks from a JSON file like the following:
[
  {
    "data": {
      "image": "s3://my_path/filename.jpg"
    },
    "annotations": []
  },
  {
    "data": {
      "image": "s3://my_path/filename2.jpg"
    },
    "annotations": []
  }
]
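A file in this shape can be generated rather than written by hand. The sketch below is only an illustration, not part of the original workaround: `build_tasks` is a hypothetical helper, the URIs are the placeholder paths from the example above, and it assumes each task needs nothing beyond an `image` entry and an empty `annotations` list.

```python
import json


def build_tasks(uris):
    """Return one task dict per image URI, in the shape shown above."""
    return [{"data": {"image": uri}, "annotations": []} for uri in uris]


# Placeholder URIs, mirroring the example above.
uris = ["s3://my_path/filename.jpg", "s3://my_path/filename2.jpg"]
tasks = build_tasks(uris)

# Serialize the tasks; this string can be saved to a .json file for import.
payload = json.dumps(tasks, indent=2)
print(payload)
```

In practice the list of URIs would come from a listing of the bucket, restricted to the objects added since the last import.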
The only issue I have encountered, which might make me revert the database to its state from earlier today, is that a later "sync" with S3 cloud storage might create duplicate tasks. If it does, you can either revert the database or delete the duplicate task entries from it.
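One way to check for duplicates before touching the database is to group exported tasks by their image URI. This is a sketch under assumptions: it expects a list of task dicts that each carry an `id` and a `data.image` field, and `find_duplicates` is a hypothetical helper, not a Label Studio function.

```python
from collections import defaultdict


def find_duplicates(tasks):
    """Map each image URI that appears more than once to its task ids."""
    ids_by_uri = defaultdict(list)
    for task in tasks:
        ids_by_uri[task["data"]["image"]].append(task["id"])
    return {uri: ids for uri, ids in ids_by_uri.items() if len(ids) > 1}


# Hypothetical export: tasks 1 and 3 point at the same S3 object.
exported = [
    {"id": 1, "data": {"image": "s3://my_path/filename.jpg"}},
    {"id": 2, "data": {"image": "s3://my_path/filename2.jpg"}},
    {"id": 3, "data": {"image": "s3://my_path/filename.jpg"}},
]
duplicates = find_duplicates(exported)
print(duplicates)
```

The returned ids identify which entries to review; deciding which copy to keep (for example, the one that already has annotations) is left to the user.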
Note: The preferred database to use with Label Studio for large projects is PostgreSQL.
Thank you so much @SheldonWBM!
Is your feature request related to a problem? Please describe.
Syncing seems to be a slow process.

Describe the solution you'd like
When new items are added to the "Source Cloud Storage", the user needs to press "Sync Storage". This re-syncs all items starting at 0, not only the newly added items. There should be an option to import only new tasks from cloud storage, that is, tasks that do not already exist in the project. Perhaps this could use a caching method, or sync only new items based on a timestamp or other metadata. The user could then choose between the traditional sync and the fast sync.

Additional context
I currently have ~22,000 items in the source storage being synced.
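The "fast sync" idea above could be sketched roughly as follows. This is not how Label Studio implements syncing; it is a minimal illustration that assumes the storage listing is available as dicts with hypothetical `key` and `last_modified` fields, that the set of already imported keys is known, and that the time of the last sync has been recorded. `select_new_objects` is a made-up name.

```python
from datetime import datetime, timezone


def select_new_objects(objects, known_keys, last_sync):
    """Return keys modified after last_sync that are not already imported."""
    return [
        obj["key"]
        for obj in objects
        if obj["last_modified"] > last_sync and obj["key"] not in known_keys
    ]


def utc(year, month, day):
    """Small helper to build timezone-aware UTC timestamps."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical listing of the source storage.
objects = [
    {"key": "my_path/filename.jpg", "last_modified": utc(2023, 1, 1)},
    {"key": "my_path/filename2.jpg", "last_modified": utc(2023, 1, 3)},
    {"key": "my_path/filename3.jpg", "last_modified": utc(2023, 1, 4)},
]
known_keys = {"my_path/filename.jpg", "my_path/filename2.jpg"}
last_sync = utc(2023, 1, 2)

new_keys = select_new_objects(objects, known_keys, last_sync)
print(new_keys)
```

The timestamp check narrows the listing cheaply, and the membership check guards against re-importing an object that was modified after the last sync but already has a task. A traditional full sync would still be needed to catch objects that were uploaded before `last_sync` but never imported.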