cvat-ai / cvat


Ensure object server IDs do not change #3934

Open ehofesmann opened 2 years ago

ehofesmann commented 2 years ago

My actions before raising this issue

Following the discussion here: https://github.com/openvinotoolkit/cvat/issues/893#issuecomment-962644749

All object annotations in CVAT are apparently assigned a unique ID when saved to the server. According to the conversation linked above, these server IDs can change: it is sometimes more efficient to delete an object and recreate it (thus assigning a new ID) than to update the existing object in place.

While this ID reassignment does not seem to raise any issues with operations within CVAT, it results in complications for outside workflows that want to connect with CVAT and maintain knowledge of the specific objects that were uploaded.

Expected Behaviour

The expected behavior would be that once a label/object is uploaded to the CVAT server and is assigned a server ID, that server ID will never be changed unless the user manually deletes the label/object.

Current Behaviour

At the moment, the server ID of a label can change as a result of operations other than deletion.

Possible Solution

I am not familiar enough with the codebase to understand all of the operations that can result in the server ID of an existing label/object being modified. However, a solution to this problem should result in a guarantee that once a server ID is assigned, it will never be changed.

@nmanovic suggested that the bulk_update operation in Django could allow for efficient updating of objects without needing to delete them.
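To make the difference concrete, here is a minimal sketch of the two save strategies using Python's standard-library `sqlite3`. It is not CVAT's code and does not use Django; the `shape` table, its columns, and the helper names are all hypothetical and chosen only for illustration. The batched `UPDATE` plays the role that a bulk update would play on the server: rows are changed in place, so their primary keys survive.

```python
import sqlite3


def make_db():
    """Create an in-memory table with two annotations (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # AUTOINCREMENT means an id is never handed out twice, even after deletes.
    conn.execute(
        "CREATE TABLE shape ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT, x REAL)"
    )
    conn.executemany(
        "INSERT INTO shape (label, x) VALUES (?, ?)",
        [("car", 1.0), ("person", 2.0)],
    )
    return conn


def ids(conn):
    """Return the current primary keys in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM shape ORDER BY id")]


def save_by_recreate(conn, rows):
    """Delete everything and insert it again: every row gets a new id."""
    conn.execute("DELETE FROM shape")
    conn.executemany(
        "INSERT INTO shape (label, x) VALUES (?, ?)",
        [(label, x) for _old_id, label, x in rows],
    )


def save_by_update(conn, rows):
    """Update rows in place in one batch: ids are preserved."""
    conn.executemany(
        "UPDATE shape SET label = ?, x = ? WHERE id = ?",
        [(label, x, row_id) for row_id, label, x in rows],
    )


if __name__ == "__main__":
    conn = make_db()
    edited = [(1, "car", 1.5), (2, "person", 2.5)]

    save_by_update(conn, edited)
    print("after in-place update:", ids(conn))

    save_by_recreate(conn, edited)
    print("after delete and recreate:", ids(conn))
```

In this sketch the in-place path leaves the IDs untouched, while the delete-and-recreate path assigns fresh ones even though the annotations themselves are the same objects from the user's point of view.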

Steps to Reproduce (for bugs)

I have not been able to reliably reproduce this change in server ID myself, but the possibility of it occurring raises potential issues in the workflows described below.

Context

I am working on an open-source dataset curation and model analysis tool for computer vision called FiftyOne. This tool allows ML researchers and engineers to construct, visualize, explore, and most importantly, store all metadata related to an image or video dataset. Since CVAT is an awesome tool for annotation, I have been working on an integration between FiftyOne and CVAT which allows users to automatically upload data and labels from a FiftyOne dataset to CVAT for annotation/refinement, and then load the updated labels back into the FiftyOne dataset.

The issue of inconsistent server IDs comes into play when specific objects are being reannotated. Each object has a unique ID in FiftyOne that can be tied to the server ID it is assigned in CVAT. This allows us to track exactly which objects are added/modified/deleted and handle each of those three possibilities separately. The problem is that if the server ID is changed even though the object is not deleted, then when loading labels back into FiftyOne, it would seem that the object was deleted even though it was just modified, resulting in undesired behaviors.
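The failure mode can be shown with a small sketch of ID-based change tracking. This is not FiftyOne's actual implementation; the function name `diff_by_server_id` and the dictionary layout (server ID mapped to an annotation payload) are assumptions made for the example.

```python
def diff_by_server_id(uploaded, downloaded):
    """Classify annotations by comparing two {server_id: payload} mappings.

    `uploaded` is what was sent to the annotation server, `downloaded` is
    what came back after editing. Both layouts are hypothetical.
    """
    uploaded_ids = set(uploaded)
    downloaded_ids = set(downloaded)

    added = sorted(downloaded_ids - uploaded_ids)
    deleted = sorted(uploaded_ids - downloaded_ids)
    modified = sorted(
        sid
        for sid in uploaded_ids & downloaded_ids
        if uploaded[sid] != downloaded[sid]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


if __name__ == "__main__":
    uploaded = {10: {"label": "car", "box": [0, 0, 5, 5]},
                11: {"label": "person", "box": [2, 2, 4, 8]}}

    # Case 1: the server keeps ID 10 when the box is edited.
    stable = {10: {"label": "car", "box": [0, 0, 6, 6]},
              11: {"label": "person", "box": [2, 2, 4, 8]}}
    print(diff_by_server_id(uploaded, stable))

    # Case 2: the same edit, but the server recreated the object as ID 12.
    reassigned = {12: {"label": "car", "box": [0, 0, 6, 6]},
                  11: {"label": "person", "box": [2, 2, 4, 8]}}
    print(diff_by_server_id(uploaded, reassigned))
```

With stable IDs the edit is reported as a modification of object 10. With a reassigned ID the same edit is reported as object 10 deleted and a new object 12 added, so any metadata attached to the original object on the client side is lost.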

This problem is not unique to my use case but would arise in any situation in which someone wants to upload existing objects to CVAT and know exactly which objects were added/modified/deleted.

nmanovic commented 2 years ago

@ehofesmann, thanks for the detailed explanation and context. Let's take another look at this. I'll treat it as a bug for now.