cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Opening Job with 200 images takes 1.5 minutes #5719

Open blafasel42 opened 1 year ago

blafasel42 commented 1 year ago

My actions before raising this issue

Steps to Reproduce (for bugs)

  1. Create an ORG project with 6 tasks, 9500 images in total
  2. Tasks have jobs with 200 images each
  3. Open one of the jobs

Expected Behaviour

UI responsive after a couple of seconds

Current Behaviour

The UI is grayed out for 1.5 minutes. During this time, the server had little CPU load, little HDD throughput, and 2Mbps network traffic. A request was made to: https://cvat.100days.cloud/api/jobs/454/data?org=100DAYS&quality=compressed&type=chunk&number=0 It had content-length: 132892673 and type: application/zip, and took 1.4 min to deliver.
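
For reference, the parameters of the slow request can be read off its query string. A minimal Python sketch using only the standard library; the helper name `describe_chunk_request` is made up for illustration and is not part of CVAT.

```python
from urllib.parse import urlsplit, parse_qs


def describe_chunk_request(url: str) -> dict:
    """Split a data URL into its path and its query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key occurs once here,
    # so keep only the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"path": parts.path, **params}


url = (
    "https://cvat.100days.cloud/api/jobs/454/data"
    "?org=100DAYS&quality=compressed&type=chunk&number=0"
)
info = describe_chunk_request(url)
print(info)
```

The request asks for chunk number 0 of job 454 at compressed quality, which is the response that carried the 132892673 bytes.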

Context

Trying to open a job in a project.

Your Environment

dschoerk commented 1 year ago

As convenient as the web interface might be, I personally do not trust a web interface to handle gigabytes of data in an effective way. CVAT has a decent CLI to do such tasks. https://opencv.github.io/cvat/docs/api_sdk/cli/

blafasel42 commented 1 year ago

Thanks for the answer, @dschoerk. Our use case, however, is to have people check and correct the bounding boxes, so a UI would come in quite handy...

nmanovic commented 1 year ago

@blafasel42, which version of CVAT do you use? We have optimized our requests. We are also going to change the REST API, which should contribute to the speed-up as well: https://github.com/opencv/cvat/pull/5662

blafasel42 commented 1 year ago

@nmanovic:

versions:

Server version: 2.2

Core version: 6.0.2

Canvas version: 2.15.3

UI version: 1.41.5

zhiltsov-max commented 1 year ago

@blafasel42, how were the images imported into CVAT? If they come from a cloud storage, the data is not copied, and caching is enabled, they might be downloaded on the first access. Did this happen only on the first access or on every access?

Also, what is the typical frame size in the task and chunk size (if enabled)?

blafasel42 commented 1 year ago

@zhiltsov-max We uploaded a ZIP file; storage is on the server machine. We have 200 images per job. Was that what you meant? There are a couple of thousand in a task.

zhiltsov-max commented 1 year ago

Okay, what are the task parameters here: [image]

was that what you meant?

I'd like to know the typical image size in the task, e.g. 4K, 10K, 800x600 etc. Larger images may decrease performance, depending on the configuration options and compression used.

blafasel42 commented 1 year ago

Image quality is 100%, Overlap Size is 0, Segment Size is 200. The problem occurs when opening one of the jobs. What I see in the network tab suggests that the complete job, including all images, is loaded when I open it, because afterwards, when I click "next" or "previous" image, there is no more network traffic. Is that right?

zhiltsov-max commented 1 year ago

In general, CVAT loads frames in chunks of N (e.g. 32) images, so moving between adjacent frames should be fast, but larger jumps can take some time while a chunk loads. With big, high-quality images and big chunks, this time can also be considerable, depending on the network capabilities.
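
As a rough illustration of the chunking described above (a sketch, not CVAT's actual code): if frames are grouped into fixed-size chunks, the chunk holding a given frame is the integer quotient of the frame index and the chunk size. The function name is hypothetical, and the default of 32 is only the example figure mentioned in this comment.

```python
def chunk_for_frame(frame_index: int, chunk_size: int = 32) -> int:
    """Return the zero-based chunk number that would contain frame_index."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return frame_index // chunk_size


# Frames 0..31 share chunk 0; stepping to frame 32 needs chunk 1.
for frame in (0, 31, 32, 199):
    print(frame, chunk_for_frame(frame))
```

With a chunk size of 200, every frame of a 200-image job would fall into chunk 0, which would match the observation that no further traffic appears after the first load.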

From your initial description:

it has content-length: 132892673

It's ~127 MB per chunk.
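
The conversion behind that figure, as a quick check. The only input is the content-length header quoted above; the split into binary and decimal units is added here for clarity.

```python
CONTENT_LENGTH = 132_892_673  # bytes, from the response header quoted above

mib = CONTENT_LENGTH / 1024**2  # binary megabytes (MiB)
mb = CONTENT_LENGTH / 1000**2   # decimal megabytes (MB)

print(f"{mib:.1f} MiB, {mb:.1f} MB")
```

The "~127 MB" above corresponds to the binary reading (about 126.7 MiB); in decimal units the same response is about 132.9 MB.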

The problem occurs when opening one of the jobs.

Does the problem appear in just one job, or does it happen in others as well?

Please provide this info if possible; it is crucial for reproducing the problem:

image quality, use chunks, chunk size, use cache?

blafasel42 commented 1 year ago

It happens with all jobs. Image size is usually 1980x1080. I did not fill in "Chunk Size". Does that mean it is 200? I would understand the problem if all 200 images are loaded in one go. The network should not be a problem: we have 10GBit on the server side and 1GBit on the client side.

bsekachev commented 1 year ago

2Mbps network traffic

Maybe you mean Megabytes per second (MB/s)?

In that case, 127 MB / 2 MB/s = ~63.5 sec is just the time needed to get one chunk from the server. After that, the client needs time to decode it.
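
The estimate can be checked against the numbers reported at the top of the thread. A small sketch; it reads "1.4 min" as 84 seconds and tries both readings of "2Mbps". The constant and function names are illustrative only.

```python
CONTENT_LENGTH = 132_892_673  # bytes, from the reported content-length
OBSERVED_SECONDS = 1.4 * 60   # "took 1.4 min to deliver"


def transfer_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time to move size_bytes at a constant rate, ignoring latency."""
    return size_bytes / bytes_per_second


# Reading 1: "2Mbps" meant 2 megabytes per second (binary units here).
as_megabytes = transfer_seconds(CONTENT_LENGTH, 2 * 1024**2)
# Reading 2: "2Mbps" meant 2 megabits per second.
as_megabits = transfer_seconds(CONTENT_LENGTH, 2_000_000 / 8)
# Average rate implied by the observed delivery time, in MiB/s.
observed_rate = CONTENT_LENGTH / OBSERVED_SECONDS / 1024**2

print(f"at 2 MiB/s:  {as_megabytes:.0f} s")
print(f"at 2 Mbit/s: {as_megabits:.0f} s")
print(f"observed:    {observed_rate:.2f} MiB/s")
```

At 2 Mbit/s the download alone would take roughly nine minutes, far longer than the observed 1.4 min, so the megabytes-per-second reading is the more plausible one. The observed average works out to about 1.5 MiB/s.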

Try not to use 100% image quality. In most cases it is excessive. Try comparing JPEG images at 100/90/80/70 quality and their sizes. You can also decrease the chunk size to 4 or 8. That will let the job open faster.
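
To get a feel for the effect of a smaller chunk size, here is a back-of-the-envelope sketch. It assumes, without confirmation in the thread, that the 132892673-byte response contained all 200 frames of the job and that frames are roughly equal in size. The 2 MB/s throughput is the figure discussed above, and all names are illustrative.

```python
CONTENT_LENGTH = 132_892_673  # bytes in the observed response
FRAMES_IN_RESPONSE = 200      # assumption: the whole job arrived as one chunk
THROUGHPUT = 2 * 1000**2      # bytes per second, assumed 2 MB/s

# Average compressed size of one frame under the assumptions above.
bytes_per_frame = CONTENT_LENGTH / FRAMES_IN_RESPONSE


def first_chunk_seconds(chunk_size: int) -> float:
    """Estimated download time of the first chunk, ignoring decode time."""
    return chunk_size * bytes_per_frame / THROUGHPUT


for size in (4, 8, 32, 200):
    print(f"chunk size {size:>3}: ~{first_chunk_seconds(size):.1f} s")
```

Under these assumptions a chunk of 8 frames would download in under three seconds, compared with more than a minute for all 200 frames at once. Lowering the image quality would shrink `bytes_per_frame` further, but by how much depends on the images.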