cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

CVAT 4K video annotation #1507

Open aschernov opened 4 years ago

aschernov commented 4 years ago

My actions before raising this issue

Noticed that when we need to annotate high-resolution videos (4K and above), it's better to set up a CVAT task with the "use zip chunks" option disabled: [image]

Expected Behaviour

Current Behaviour

Tasks that do not use zip chunks are faster and more convenient to annotate. They let us use the D/F shortcut keys to quickly step back through important moments in a video and make corrections. In this case we have almost no performance problems.

When we annotate a 4K video with zip chunks, we get a recommendation for the zip chunk size: [image] If we follow the recommendation and choose 4, for example, we get a small delay every 4th frame, which hurts performance badly.
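To make the arithmetic concrete, here is a rough sketch of how often sequential playback moves into a new chunk for a given chunk size. The function names and the 900-frame clip are illustrative assumptions, not CVAT code, and whether each crossing actually stalls depends on how the player prefetches.

```python
def chunk_index(frame: int, chunk_size: int) -> int:
    """Index of the chunk that holds a 0-based frame number."""
    return frame // chunk_size


def boundary_crossings(total_frames: int, chunk_size: int) -> int:
    """How many times sequential playback moves into a new chunk."""
    if total_frames <= 0:
        return 0
    # The last frame sits in chunk k, so playback crossed k boundaries.
    return chunk_index(total_frames - 1, chunk_size)


# Hypothetical clip: 30 s at 30 fps = 900 frames.
frames = 30 * 30
for size in (4, 36):
    print(f"chunk size {size}: {boundary_crossings(frames, size)} crossings")
```

A smaller chunk size means more crossings per clip, so any per-chunk load delay is paid more often.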

Possible Solution

Find a way to optimize the "use zip chunks" function; for now, disable this option for videos by default.

Horstage commented 2 years ago

I noticed that video compression is quite strong when not using chunks. Here is a screenshot from a 30 s 4K (mkv) video of a football game, zoomed way in on two players.

Without using chunks:

not_using_chunks

Using chunks:

using_chunks (JPEG Quality 100%)

I have also seen encoding artifacts in videos without chunks.

I agree that using chunks is inconvenient because of the loading time; however, I need all the detail a 4K video provides, even for simple boxes.

Please tell me whether it is worth opening a new issue for this.

azhavoro commented 2 years ago

Hi, could you share the video?

Horstage commented 2 years ago

You can download the video here. Thanks for looking into it!

ashokbalaraman commented 2 years ago

Any update on this issue? I am running into the same problem. Without zip chunks, I split each video into 5K frames per job. When the FPS changed from 24 to 30, it started buffering every 500 frames. What is the solution for uploading a 20-minute 2K video that can play without buffering?

mrKallah commented 2 years ago

I have been experiencing the same issue as above. Would it be possible to keep chunks n-1, n and n+1 loaded at all times? I think this would make playback smoother. As I understand it, the player currently loads the next chunk only when you reach the end of the current one, which causes a full stop at the end of each chunk. Keeping more than one chunk loaded at any time could give less stuttering playback, both forward and backward.

For my use case, such an implementation would be vastly more useful than the current behaviour. I deal with datasets where you might have to do 1-3 separate annotations, each maybe 1000 frames long, in a video of 150,000 frames. Having to stop all the time makes this a lot slower, as I can't just fast forward. I am currently working at resolutions reduced to 1280x720x3 and can't really go lower, because at that point the objects I am labeling would no longer be visible.
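The n-1 / n / n+1 idea can be sketched as a small sliding-window cache. This is only an illustration of the bookkeeping, not CVAT's player: `ChunkWindow` and `load_chunk` are hypothetical names, and the loads here are synchronous, whereas a real player would presumably fetch the neighbouring chunks in the background.

```python
from typing import Callable, Dict, List


class ChunkWindow:
    """Keeps chunks n-1, n and n+1 in memory around the current frame."""

    def __init__(self, load_chunk: Callable[[int], bytes],
                 chunk_size: int, chunk_count: int) -> None:
        self._load_chunk = load_chunk  # hypothetical loader, e.g. an HTTP fetch
        self._chunk_size = chunk_size
        self._chunk_count = chunk_count
        self._cache: Dict[int, bytes] = {}

    def seek(self, frame: int) -> bytes:
        """Move to a frame and return the chunk that contains it."""
        current = frame // self._chunk_size
        if not 0 <= current < self._chunk_count:
            raise IndexError(f"frame {frame} is outside the video")
        wanted = {i for i in (current - 1, current, current + 1)
                  if 0 <= i < self._chunk_count}
        # Evict chunks that fell out of the window to bound memory use.
        for index in list(self._cache):
            if index not in wanted:
                del self._cache[index]
        # Load the current chunk first, then its neighbours.
        for index in sorted(wanted, key=lambda i: abs(i - current)):
            if index not in self._cache:
                self._cache[index] = self._load_chunk(index)
        return self._cache[current]

    def cached(self) -> List[int]:
        """Indices of the chunks currently held in memory."""
        return sorted(self._cache)


# Usage with a fake loader that records which chunks were requested.
requested: List[int] = []

def fake_loader(index: int) -> bytes:
    requested.append(index)
    return f"chunk-{index}".encode()

window = ChunkWindow(fake_loader, chunk_size=4, chunk_count=10)
window.seek(0)   # loads chunk 0 and its neighbour
window.seek(4)   # chunk 1 is already cached; only the new neighbour is loaded
print(requested, window.cached())
```

Because the window holds at most three chunks, memory use stays bounded while stepping one chunk forward or backward never hits an unloaded chunk.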

zawlin commented 1 year ago

I ran into this issue while trying CVAT for the first time. The buffering is causing significant productivity problems. After investigating a bit, I increased the chunk size until I started to get the SBOX_MEMORY_EXCEEDED error. I think playback needs a local storage mechanism such as IndexedDB instead of relying purely on RAM, which cannot be the right approach.
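As a rough illustration of that idea, the sketch below persists every chunk to local storage and keeps only a small, bounded number in RAM. It is a Python stand-in: in a browser the persistent layer would be something like IndexedDB, while here a temporary directory plays that role. `DiskBackedChunkStore` and its parameters are hypothetical and not part of CVAT.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import List


class DiskBackedChunkStore:
    """Holds at most `max_in_memory` chunks in RAM; all chunks stay on disk."""

    def __init__(self, directory: Path, max_in_memory: int = 2) -> None:
        self._directory = directory
        self._max_in_memory = max_in_memory
        self._memory: "OrderedDict[int, bytes]" = OrderedDict()

    def _path(self, index: int) -> Path:
        return self._directory / f"chunk_{index}.bin"

    def put(self, index: int, data: bytes) -> None:
        """Persist a chunk, then keep it hot in memory."""
        self._path(index).write_bytes(data)
        self._remember(index, data)

    def get(self, index: int) -> bytes:
        """Return a chunk from RAM if present, otherwise read it from disk."""
        if index in self._memory:
            self._memory.move_to_end(index)
            return self._memory[index]
        data = self._path(index).read_bytes()  # FileNotFoundError if never stored
        self._remember(index, data)
        return data

    def _remember(self, index: int, data: bytes) -> None:
        self._memory[index] = data
        self._memory.move_to_end(index)
        while len(self._memory) > self._max_in_memory:
            self._memory.popitem(last=False)  # drop the least recently used

    def in_memory(self) -> List[int]:
        """Chunk indices held in RAM, least recently used first."""
        return list(self._memory)


with tempfile.TemporaryDirectory() as tmp:
    store = DiskBackedChunkStore(Path(tmp), max_in_memory=2)
    for i in range(5):
        store.put(i, bytes([i]) * 1024)
    print(store.in_memory())   # only the most recent chunks remain in RAM
    print(len(store.get(0)))   # chunk 0 is read back from disk
```

With this layout, raising the chunk size or chunk count grows disk usage rather than RAM, which is the trade-off the comment above argues for.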