cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License
CVAT fails health check using >90% disk #5449

Open harrystuart opened 1 year ago

harrystuart commented 1 year ago

My actions before raising this issue

Clone latest develop, then run:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml build
docker-compose up -d

Expected Behaviour

Startup and run application.

Current Behaviour

The computer runs very slowly because CVAT uses significant resources. When trying to log in, the following occurs:

GET http://10.0.253.33:9001/api/server/health/?format=json&org=
{"Cache backend: default": "working", "DatabaseBackend": "working", "DiskUsage": "warning: 86c1cd71866c 92.4% disk usage exceeds 90%", "MemoryUsage": "working", "MigrationsHealthCheck": "working"}

Server logs:

2022-12-12 04:54:32,063 DEBG 'runserver' stderr output:
[Mon Dec 12 04:54:32.063061 2022] [wsgi:error] [pid 178:tid 139765320791808] ERROR:health-check:warning: 35a75041c750 92.4% disk usage exceeds 90%
[Mon Dec 12 04:54:32.063076 2022] [wsgi:error] [pid 178:tid 139765320791808] Traceback (most recent call last):
[Mon Dec 12 04:54:32.063085 2022] [wsgi:error] [pid 178:tid 139765320791808]   File "/opt/venv/lib/python3.8/site-packages/health_check/backends.py", line 30, in run_check

2022-12-12 04:54:32,063 DEBG 'runserver' stderr output:
[Mon Dec 12 04:54:32.063125 2022] [wsgi:error] [pid 178:tid 139765320791808]     self.check_status()

2022-12-12 04:54:32,063 DEBG 'runserver' stderr output:
[Mon Dec 12 04:54:32.063420 2022] [wsgi:error] [pid 178:tid 139765320791808]   File "/opt/venv/lib/python3.8/site-packages/health_check/contrib/psutil/backends.py", line 21, in check_status
[Mon Dec 12 04:54:32.063422 2022] [wsgi:error] [pid 178:tid 139765320791808]     raise ServiceWarning(
[Mon Dec 12 04:54:32.063422 2022] [wsgi:error] [pid 178:tid 139765320791808] health_check.exceptions.ServiceWarning: warning: 35a75041c750 92.4% disk usage exceeds 90%

2022-12-12 04:54:32,082 DEBG 'runserver' stderr output:
[Mon Dec 12 04:54:32.082022 2022] [wsgi:error] [pid 178:tid 139767587452672] [remote 172.27.0.4:39110] [2022-12-12 04:54:32,081] ERROR django.request: Internal Server Error: /api/server/health/
[Mon Dec 12 04:54:32.082094 2022] [wsgi:error] [pid 178:tid 139767587452672] [remote 172.27.0.4:39110] ERROR:django.request:Internal Server Error: /api/server/health/

nmanovic commented 1 year ago

@harrystuart , you can adjust health check limits (https://github.com/revsys/django-health-check/blob/master/health_check/conf.py)

Please try to define in cvat/settings/base.py the following variable:

HEALTH_CHECK = {
    "DISK_USAGE_MAX": 99
}
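
For context, the failing check compares the disk usage percentage against a configured maximum. The sketch below reproduces that kind of comparison using only the standard library. It is not the code of django-health-check (which, per the traceback, uses psutil), and `usage_percent`, `check_disk_usage` and `DiskUsageWarning` are hypothetical names introduced for illustration.

```python
import shutil


class DiskUsageWarning(Exception):
    """Hypothetical stand-in for the health check's ServiceWarning."""


def usage_percent(used: int, total: int) -> float:
    """Return used/total as a percentage rounded to one decimal place."""
    return round(used / total * 100, 1) if total else 0.0


def check_disk_usage(percent: float, max_percent: float) -> None:
    """Raise if usage has reached the configured maximum (assumed semantics)."""
    if max_percent and percent >= max_percent:
        raise DiskUsageWarning(f"{percent}% disk usage exceeds {max_percent}%")


if __name__ == "__main__":
    du = shutil.disk_usage("/")  # named tuple: total, used, free (bytes)
    percent = usage_percent(du.used, du.total)
    try:
        check_disk_usage(percent, max_percent=90)
        print(f"ok: {percent}% used")
    except DiskUsageWarning as exc:
        print(f"warning: {exc}")
```

With the 92.4% reported in this issue, a limit of 90 trips the check, while a limit of 99 would let it pass.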

harrystuart commented 1 year ago

Does it sound right for CVAT to be using so much in the first place? I don’t recall it being so consuming.

nmanovic commented 1 year ago

Need to investigate why it consumes so many resources. I agree, it isn't right.

Harsturomai commented 1 year ago

Any update on this? I have also run into this problem. The proposed fix does not seem to work. Where in the codebase can I override the health check or change the limits?

Ghaitharar commented 1 year ago

Any update on this?

I used CVAT on Windows 10 to annotate a number of .mp4 videos that are collectively less than 1GB in size. When trying to export the dataset, CVAT consumes more than 100GB and then reports "OS Error: [Errno 28] No space left on device\n/".

Could it be related to image compression level of the tasks?

zhiltsov-max commented 1 year ago

@harrystuart, @Harsturomai, @Ghaitharar, hi, could you please run the following commands in a terminal and share their output:

docker exec -it -u django cvat_server python -c 'import os; print(os.statvfs("/"))'
docker exec -it -u django cvat_server python -c 'import psutil; print(psutil.disk_usage("/"))'
docker exec -it -u django cvat_server df -h /

I used CVAT on Windows 10 to annotate a number of .mp4 videos that are collectively less than 1GB in size. When trying to export the dataset, CVAT consumes more than 100GB and then reports

@Ghaitharar, it can be related to video frame unpacking. If I understand correctly, you're trying to export a task with images. In this case the video images will be saved on disk and then packed into an archive. If you're exporting a task, please try to turn off the Save images option.

In general, I see a few ways to handle the problem:

Ghaitharar commented 1 year ago

@zhiltsov-max

os.statvfs_result(f_bsize=4096, f_frsize=4096, f_blocks=65793553, f_bfree=40155087, f_bavail=36795548, f_files=16777216, f_ffree=16686506, f_favail=16686506, f_flag=4096, f_namemax=255)

sdiskusage(total=269490393088, used=105015156736, free=150714564608, percent=41.1)

Filesystem      Size  Used Avail Use% Mounted on
overlay         251G   98G  141G  42% /
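
The three outputs above are consistent with each other, which can be checked by hand. The sketch below recomputes the psutil figures from the os.statvfs fields. The formulas are an assumption about how psutil derives disk usage on Linux, not a quote of its source.

```python
# Values copied from the os.statvfs output above.
F_FRSIZE = 4096          # fragment size in bytes
F_BLOCKS = 65793553      # total blocks
F_BFREE = 40155087       # free blocks (including root-reserved ones)
F_BAVAIL = 36795548      # free blocks available to unprivileged users

total = F_BLOCKS * F_FRSIZE
used = (F_BLOCKS - F_BFREE) * F_FRSIZE
free = F_BAVAIL * F_FRSIZE
# Percentage relative to the space a regular user can actually use.
percent = round(used / (used + free) * 100, 1)

print(total, used, free, percent)
# prints: 269490393088 105015156736 150714564608 41.1
```

These match the sdiskusage values reported above (41.1% used), so the filesystem as seen from inside the container was well below the 90% limit when the commands were run.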

Is there a way to turn off "Use chunks" after a task is created? I've already created a number of tasks with this option enabled.

zhiltsov-max commented 1 year ago

Is there a way to turn off "Use chunks" after a task is created? I've already created a number of tasks with this option enabled.

Unfortunately, no. I suggest exporting the annotations without images and then downloading the images separately using the SDK or CLI: cvat-cli frames --quality original --outdir task_<N>_images <task_id> (modify the command to fit your case).

wegmatho commented 1 year ago

HEALTH_CHECK = { "DISK_USAGE_MAX": 99 }

Can these settings be injected without baking a new CVAT Docker image? I am "suffering" from the same issue: my K8s nodes have little disk space, so they hit the limit and the webserver is prevented from starting, even though the cvat-backend-data volume is placed on another machine via an NFS volume and has plenty of space.
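
One way such a limit could be made injectable is to read it from an environment variable in a settings module. This is a sketch only: the variable name `CVAT_HEALTH_DISK_USAGE_MAX` and the helper `health_check_settings` are hypothetical, and nothing in this thread confirms that CVAT reads such a variable. The settings file containing this logic would still have to be present in the container, for example mounted as a volume.

```python
import os
from typing import Mapping


def health_check_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Build a HEALTH_CHECK dict, taking the disk limit from the environment.

    CVAT_HEALTH_DISK_USAGE_MAX is a hypothetical variable name used for
    illustration; the fallback of 90 is the limit seen in the logs above.
    """
    raw = environ.get("CVAT_HEALTH_DISK_USAGE_MAX", "90")
    try:
        limit = int(raw)
    except ValueError:
        limit = 90  # ignore malformed values rather than fail at startup
    return {"DISK_USAGE_MAX": limit}


# In a Django settings module this would be assigned at import time.
HEALTH_CHECK = health_check_settings()
```

Setting the variable to 99 in the container environment would then yield the same dictionary as the hard-coded suggestion earlier in the thread.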

harrystuart commented 1 year ago

Hi all, I will be away from my computer until the 6th of Jan. Even if a resolution is found, may I please ask that this issue not be closed until I can test it? Much appreciated.

nmanovic commented 1 year ago

ksaluja15 commented 1 year ago

@nmanovic I added the health check variable, but somehow it's not being used when I relaunch with the docker compose command. Any idea?

pwichmann commented 1 year ago

I have the same problem. Cannot get CVAT to run. What a shame since Label Studio sucks, too.

pwichmann commented 1 year ago

CVAT needed more than 70 GB of free disk space on my machine. I gave it 280 GB of free space now and it is running. Note: I just installed it. These were the disk needs before any labelling.

sourabh0207 commented 1 year ago

I cleared the cache as mentioned above, but possibly as a result I am now unable to get analytics. Is anybody facing the same issue?

https://github.com/opencv/cvat/issues/6724