cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Matching CVAT task/job with dumped data #3337

Open nstolyarov opened 3 years ago

nstolyarov commented 3 years ago

My actions before raising this issue

Is there any way to match dumped data (e.g. an ImageNet or Semantic Segmentation dump) with the same data in CVAT's task -> job -> id?

Where can I find recorded information about the image names in a particular task/job? Maybe in the DB or some table?

Context

After some automated operations on annotations (e.g. computing metrics for 1,000+ images), I want to know where I should fix a wrong annotation in CVAT.

Your Environment


- Docker version `docker version` (e.g. Docker 17.0.05): 20.10.05
- Are you using Docker Swarm or Kubernetes? `Nope`
- Operating System and version (e.g. Linux, Windows, MacOS): GNU/Linux Ubuntu 4.15.0-140-generic

zhiltsov-max commented 3 years ago

In most formats, the dumped files have the same names as they had in the source CVAT task. Which formats did you export the task to? You can find image names in the annotation window of the CVAT task.

CVAT only allows navigating to a specific frame index. If you want to get the image index in CVAT, you can do one of the following:

a) get it from the dumped data, if the output format includes such information
b) sort the image paths / names lexicographically and take the index
c) export in CVAT for images / CVAT for video / Datumaro and find the "frame index" there
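Options (b) and (c) above can be sketched in Python. This is only a minimal sketch, not CVAT code: the function names are hypothetical and the data is made up. Option (b) assumes the task's frames follow plain lexicographic order of the file names, and option (c) assumes a "CVAT for images" export of a single task, in which each `<image>` element carries `id` and `name` attributes.

```python
import xml.etree.ElementTree as ET


def frame_index_by_sorting(image_paths):
    """Option (b): map each path to its position in lexicographic order.

    Assumption: the task's frames follow plain lexicographic order of the paths.
    """
    return {path: index for index, path in enumerate(sorted(image_paths))}


def frame_index_from_cvat_xml(xml_text):
    """Option (c): read frame ids from a 'CVAT for images' annotation file."""
    root = ET.fromstring(xml_text)
    return {image.get("name"): int(image.get("id")) for image in root.iter("image")}


# Made-up data for illustration only.
paths = ["b/img_10.jpg", "a/img_2.jpg", "a/img_10.jpg"]
by_sorting = frame_index_by_sorting(paths)

sample_xml = """<annotations>
  <image id="0" name="a/img_10.jpg" width="640" height="480"></image>
  <image id="1" name="a/img_2.jpg" width="640" height="480"></image>
</annotations>"""
by_xml = frame_index_from_cvat_xml(sample_xml)

print(by_sorting)
print(by_xml)
```

Note that lexicographic order puts `img_10.jpg` before `img_2.jpg`, so option (b) only matches CVAT's numbering if the task was created with that ordering.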

Please describe what you want to do more precisely, so that I can help you better.

@bsekachev, adding navigation by image names could be useful.

bsekachev commented 3 years ago

You can see the current image name in CVAT near frame navigation elements.

Screenshot from 2021-06-17 22-42-45

bsekachev commented 3 years ago

Speaking about the database, you can see a mapping there if you have access to it. But be very careful when working directly with the database.

docker exec -it cvat_db /bin/bash
/usr/local/bin/createuser -s postgres # if you get error that postgres role does not exist.
psql cvat --user postgres

For the task with ID 9:

SELECT engine_image.frame, engine_image.path
FROM engine_task
INNER JOIN engine_data ON engine_task.data_id = engine_data.id
INNER JOIN engine_image ON engine_image.data_id = engine_data.id
WHERE engine_task.id = 9;

Screenshot from 2021-06-17 23-03-55

bsekachev commented 3 years ago

From the UI point of view, I would suggest adding a feature to search for a frame number by image name. Would that be a convenient solution for users, in your opinion?

nstolyarov commented 3 years ago

Hi @zhiltsov-max and @bsekachev. Thank you for your answers.

I will try to give a clear example.

Suppose I have a full path to the file like "FULL/PATH/image.jpg". I may even know the name / id of the task that contains it (but maybe not). How can I find the job id and frame id for this image in CVAT?

It would be useful if I had info like the following:

TASK_ID  JOB_ID  FRAME_ID  IMG_PATH
111      26      874       full/path/to/image.jpg
103      13      234       full/path/to/another_image.jpg

Is there a possibility to get it from CVAT?

I need this when I run some operations on annotations (e.g. using Semantic mask 1.1) and then need to fix the annotation of a particular image.

UPDATE

I've tried the following command in cvat_db

SELECT *
FROM engine_task
INNER JOIN engine_data ON engine_task.data_id = engine_data.id
INNER JOIN engine_image ON engine_image.data_id = engine_data.id;

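Am I right that:

- stop frame is the last frame in the task?
- frame is the frame id for this task?
- id is image id for the whole CVAT?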

bsekachev commented 3 years ago

@nstolyarov

> stop frame is the last frame in the task?

Not exactly. stop_frame is the last frame in a job. The number of frames in a task is engine_task.size.

> frame is the frame id for this task?

I would say it is a frame number for this task.

> id is image id for the whole CVAT?

I am not sure I understand you. engine_image.id is a primary key in the database, so, it is unique for the CVAT instance.

Generally speaking, a frame can be included in two jobs (if overlap is enabled). You can see the range of frames for a specific job on the task page: Screenshot from 2021-06-18 11-56-05
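A frame-to-job lookup that tolerates overlap can be sketched in a few lines of Python. The `JobRange` tuple and the example ranges are hypothetical. In practice the ranges would come from the task page or the engine_segment table, with stop_frame treated as inclusive.

```python
from typing import List, NamedTuple


class JobRange(NamedTuple):
    job_id: int
    start_frame: int
    stop_frame: int  # inclusive, as in engine_segment


def jobs_for_frame(frame: int, ranges: List[JobRange]) -> List[int]:
    """Return the ids of every job whose frame range contains the frame."""
    return [r.job_id for r in ranges if r.start_frame <= frame <= r.stop_frame]


# Made-up ranges with an overlap of two frames between neighbouring jobs.
ranges = [JobRange(13, 0, 19), JobRange(14, 18, 37), JobRange(15, 36, 55)]

print(jobs_for_frame(5, ranges))   # a frame that belongs to one job
print(jobs_for_frame(18, ranges))  # a frame inside the overlap
```

Returning a list rather than a single id keeps the overlap case visible: a frame inside the overlap yields two job ids.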

nstolyarov commented 3 years ago

@bsekachev

> Not exactly. stop_frame is the last frame in a job. The number of frames in a task is engine_task.size.

It is strange because in the task with 35 jobs I have segment_size=20, stop_frame=680 and size=681 for every data in the table.

Nevertheless, it seems that this is really what I need.

Thank you very much for your help.

bsekachev commented 3 years ago

> It is strange because in the task with 35 jobs I have segment_size=20, stop_frame=680 and size=681 for every data in the table.

Sounds really strange. This is a piece of the table engine_segment: Screenshot from 2021-06-18 12-49-16

You can see here that the start_frame and stop_frame fields differ between rows with the same task_id.

mikeyEcology commented 3 years ago

I would find it useful if, when exporting annotations, I could get a list of the image names (the file paths) and the image numbers ("2" in the example image I tried to upload). Then, if someone annotating images finds some with issues, I can have her record the number of each such image and exclude it from my dataset. For this example, I'd have a table with a row that has:

bristlecone2.PNG, 2

Is this available? It sounds like this is what @nstolyarov is asking, but I'm not sure.

image

MattWittbrodt commented 2 years ago

@mikeyEcology this is in the engine_image table.

`select path, frame from engine_image` will give you that. Just keep in mind that the # (frame number) is repeated across tasks. For example, Task A will have a frame 2 and Task B will also have a frame 2.

avengersassemble commented 2 months ago

Try this:

CREATE VIEW task_job_frame AS
SELECT DISTINCT
    ep.id AS project_id,
    s.task_id AS task_id,
    j.id AS job_id,
    s.frame_id,
    ei.path
FROM engine_job j
INNER JOIN (
    SELECT id, task_id, generate_series(start_frame, stop_frame) AS frame_id
    FROM engine_segment
) s ON j.segment_id = s.id
INNER JOIN engine_task et ON s.task_id = et.id
-- Images are linked to a task through engine_task.data_id, not through the task id.
INNER JOIN engine_image ei ON ei.frame = s.frame_id AND ei.data_id = et.data_id
-- Note: this inner join drops tasks that do not belong to a project.
INNER JOIN engine_project ep ON et.project_id = ep.id;
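To see what such a task/job/frame table looks like without touching a live database, the toy reproduction below builds a few in-memory tables with Python's `sqlite3` module. Table and column names mirror those mentioned in this thread, but the schema is heavily simplified and the rows are made up. A `BETWEEN` range join stands in for PostgreSQL's `generate_series`, and images are joined through `engine_task.data_id`, as in the query for task 9 earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engine_task (id INTEGER PRIMARY KEY, data_id INTEGER);
CREATE TABLE engine_image (id INTEGER PRIMARY KEY, data_id INTEGER, frame INTEGER, path TEXT);
CREATE TABLE engine_segment (id INTEGER PRIMARY KEY, task_id INTEGER,
                             start_frame INTEGER, stop_frame INTEGER);
CREATE TABLE engine_job (id INTEGER PRIMARY KEY, segment_id INTEGER);

-- Made-up rows: task 9 uses data 4; two jobs overlap on frame 1.
INSERT INTO engine_task VALUES (9, 4);
INSERT INTO engine_image VALUES (1, 4, 0, 'a.jpg'), (2, 4, 1, 'b.jpg'), (3, 4, 2, 'c.jpg');
INSERT INTO engine_segment VALUES (20, 9, 0, 1), (21, 9, 1, 2);
INSERT INTO engine_job VALUES (30, 20), (31, 21);
""")

rows = conn.execute("""
SELECT s.task_id, j.id AS job_id, ei.frame, ei.path
FROM engine_job j
INNER JOIN engine_segment s ON j.segment_id = s.id
INNER JOIN engine_task et ON et.id = s.task_id
INNER JOIN engine_image ei
    ON ei.data_id = et.data_id
   AND ei.frame BETWEEN s.start_frame AND s.stop_frame
ORDER BY j.id, ei.frame
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Because the two made-up segments overlap on frame 1, `b.jpg` appears once for each job, which matches the earlier remark that a frame can belong to two jobs.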

Jain-Archit commented 1 week ago

@avengersassemble @MattWittbrodt Is there any way to get the above information using cvat-cli or the API? I have a single task which is divided into multiple non-overlapping jobs. Annotators have not done certain jobs, and in the jobs they have finished there are some corrupt images (with no annotations). I want to distinguish the corrupt images from the ones that have not been annotated yet. I do not have access to cvat-db and am looking for a solution using the API/CLI. Can anyone help?

zhiltsov-max commented 1 week ago

Hi, please check whether the get_meta() and get_frames_info() methods of Task and Job in the high-level SDK are useful. Example 1, example 2, more complex example 3 with the lower-level API.
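As a rough sketch of how these SDK methods could produce the task/job/frame/name table requested earlier in the thread, the Python below separates the pure table-building logic from the server calls. The function names `build_frame_table` and `fetch_frame_table` are hypothetical, and the SDK part has not been run against a server. It assumes that `get_frames_info()` returns frames in order with a `.name` attribute, that each job exposes inclusive `start_frame` and `stop_frame` values, and that the task uses every frame (no start offset or frame step).

```python
def build_frame_table(task_id, frame_names, job_ranges):
    """Return (task_id, job_id, frame, name) rows.

    frame_names: file names ordered by frame number, starting at frame 0.
    job_ranges: iterable of (job_id, start_frame, stop_frame), stop inclusive.
    """
    rows = []
    for job_id, start, stop in job_ranges:
        for frame in range(start, stop + 1):
            rows.append((task_id, job_id, frame, frame_names[frame]))
    return rows


def fetch_frame_table(host, user, password, task_id):
    """Untested sketch of fetching the inputs through the high-level SDK."""
    from cvat_sdk import make_client  # requires the cvat-sdk package

    with make_client(host, credentials=(user, password)) as client:
        task = client.tasks.retrieve(task_id)
        # Assumption: frames come back in order, each with a .name attribute.
        names = [frame.name for frame in task.get_frames_info()]
        # Assumption: each job exposes start_frame and stop_frame (inclusive).
        ranges = [(job.id, job.start_frame, job.stop_frame) for job in task.get_jobs()]
    return build_frame_table(task_id, names, ranges)


# Made-up inputs, so the table logic can be checked without a server.
table = build_frame_table(111, ["a.jpg", "b.jpg", "c.jpg"], [(26, 0, 1), (27, 2, 2)])
for row in table:
    print(row)
```

With such a table, frames from unfinished jobs can be set aside by job id, leaving only the unannotated frames of finished jobs to inspect.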