cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Server crashes if image chunk size is too big (> 512 MB) #7959

Open ralwing opened 5 months ago

ralwing commented 5 months ago

Actions before raising this issue

Steps to Reproduce

  1. Create a new task with this collection of pcd files:

  2. Try to open the job

  3. The server returns HTTP 500 while loading the job, and the logs show a crash:

2024-05-28 13:21:42,169 DEBG 'uvicorn-0' stderr output:
[2024-05-28 13:21:42,167] ERROR django.request: Internal Server Error: /api/jobs/588/data
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 822, in send_packed_command
    self._sock.sendall(item)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 534, in thread_handler
    raise exc_info[1]
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = await get_response(request)
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 253, in _get_response_async
    response = await wrapped_callback(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 479, in __call__
    ret: _R = await loop.run_in_executor(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/current_thread_executor.py", line 40, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 538, in thread_handler
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/django/cvat/apps/engine/views.py", line 1903, in data
    return data_getter(request, db_job.segment.start_frame,
  File "/home/django/cvat/apps/engine/views.py", line 727, in __call__
    return super().__call__(request, start, stop, db_data)
  File "/home/django/cvat/apps/engine/views.py", line 657, in __call__
    buff, mime_type = frame_provider.get_chunk(self.number, self.quality)
  File "/home/django/cvat/apps/engine/frame_provider.py", line 207, in get_chunk
    return self._loaders[quality].get_chunk_path(chunk_number, quality, self._db_data)
  File "/home/django/cvat/apps/engine/cache.py", line 80, in get_task_chunk_data_with_mime
    item = self._get_or_set_cache_item(
  File "/home/django/cvat/apps/engine/cache.py", line 68, in _get_or_set_cache_item
    item = create_item()
  File "/home/django/cvat/apps/engine/cache.py", line 55, in create_item
    self._cache.set(key, item)
  File "/opt/venv/lib/python3.10/site-packages/django/core/cache/backends/redis.py", line 191, in set
    self._cache.set(key, value, self.get_backend_timeout(timeout))
  File "/opt/venv/lib/python3.10/site-packages/django/core/cache/backends/redis.py", line 108, in set
    client.set(key, value, ex=timeout)
  File "/opt/venv/lib/python3.10/site-packages/redis/commands/core.py", line 2302, in set
    return self.execute_command("SET", *pieces, **options)
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1258, in execute_command
    return conn.retry.call_with_retry(
  File "/opt/venv/lib/python3.10/site-packages/redis/retry.py", line 49, in call_with_retry
    fail(error)
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1262, in <lambda>
    lambda error: self._disconnect_raise(conn, error),
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1248, in _disconnect_raise
    raise error
  File "/opt/venv/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
    return do()
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1259, in <lambda>
    lambda: self._send_command_parse_response(
  File "/opt/venv/lib/python3.10/site-packages/redis/client.py", line 1234, in _send_command_parse_response
    conn.send_command(*args)
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 840, in send_command
    self.send_packed_command(
  File "/opt/venv/lib/python3.10/site-packages/redis/connection.py", line 833, in send_packed_command
    raise ConnectionError(f"Error {errno} while writing to socket. {errmsg}.")
redis.exceptions.ConnectionError: Error 104 while writing to socket. Connection reset by peer.

2024-05-28 13:21:42,170 DEBG 'uvicorn-0' stdout output:
INFO:     172.17.0.7:0 - "GET /api/jobs/588/data?org=&quality=compressed&type=chunk&number=0 HTTP/1.0" 500 Internal Server Error
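The traceback ends inside `self._cache.set(key, item)` in `cvat/apps/engine/cache.py`. The cache server drops the connection while the client is still writing the value, so the client only reports `Connection reset by peer`. Below is a minimal sketch of a guard that fails early with a readable message instead. It assumes a 512 MB per-value limit, as the issue title suggests. `checked_cache_set` and `ChunkTooLargeError` are hypothetical names, not CVAT code.

```python
# Assumed per-value cap of the cache server (512 MB, per the issue title).
CACHE_VALUE_LIMIT = 512 * 1024 * 1024


class ChunkTooLargeError(ValueError):
    """Raised when a chunk cannot be stored in the cache as a single value."""


def checked_cache_set(cache, key: str, value: bytes, limit: int = CACHE_VALUE_LIMIT) -> None:
    """Store `value` under `key`, refusing values the server would reject.

    `cache` is any object with a `set(key, value)` method, such as a
    Django cache backend. Checking the size first replaces an opaque
    "Connection reset by peer" with an actionable error.
    """
    if len(value) > limit:
        raise ChunkTooLargeError(
            f"chunk {key!r} is {len(value)} bytes, above the {limit}-byte "
            "cache limit; try a smaller chunk size for this task"
        )
    cache.set(key, value)
```

Checking the length on the client is cheap because the chunk is already in memory as `bytes` when `set` is called. The limit is a parameter because the real cap depends on how the cache server is configured.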

Expected Behavior

The job opens without crashing the server.

Possible Solution

If I split this set of files into several smaller tasks, each task loads correctly.

Context

No response

Environment

Server version: 2.10.2

Core version: 14.1.0

Canvas version: 2.19.1

UI version: 1.61.3

docker ps
CONTAINER ID   IMAGE                                       COMMAND                  CREATED        STATUS                PORTS                                                                                          NAMES
313b516a88ec   gcr.io/iguazio/alpine:3.17                  "/bin/sh -c '/bin/sl…"   4 hours ago    Up 4 hours                                                                                                           nuclio-local-storage-reader
6ff6a699543f   cvat/server:v2.10.2                         "./backend_entrypoin…"   5 weeks ago    Up 2 days             8080/tcp                                                                                       cvat_server
bb31577677e0   cvat.onnx.wongkinyiu.yolov7:latest          "processor"              3 months ago   Up 2 days (healthy)   0.0.0.0:32768->8080/tcp, :::32768->8080/tcp                                                    nuclio-nuclio-onnx-wongkinyiu-yolov7
58d8c150e120   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_annotation
44faad47461b   quay.io/nuclio/dashboard:1.11.24-amd64      "/docker-entrypoint.…"   3 months ago   Up 2 days (healthy)   80/tcp, 0.0.0.0:8070->8070/tcp, :::8070->8070/tcp                                              nuclio
c92c9dc59126   cvat/ui:v2.10.2                             "/docker-entrypoint.…"   3 months ago   Up 2 days             80/tcp                                                                                         cvat_ui
08de5d80177a   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_quality_reports
2539f26b757b   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_webhooks
24d164eceea9   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_import
57a1d9beae70   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_utils
b719a17c7ab3   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_analytics_reports
db6e950d799a   cvat/server:v2.10.2                         "./backend_entrypoin…"   3 months ago   Up 2 days             8080/tcp                                                                                       cvat_worker_export
8faca5315bed   timberio/vector:0.26.0-alpine               "/usr/local/bin/vect…"   3 months ago   Up 2 days                                                                                                            cvat_vector
9eccec23bd9d   traefik:v2.10                               "/entrypoint.sh trae…"   3 months ago   Up 2 days             0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 80/tcp, 0.0.0.0:8090->8090/tcp, :::8090->8090/tcp   traefik
88e8c039a55b   apache/kvrocks:2.7.0                        "kvrocks -c /var/lib…"   3 months ago   Up 2 days (healthy)   6666/tcp                                                                                       cvat_redis_ondisk
b7b44c5d9557   redis:7.2.3-alpine                          "docker-entrypoint.s…"   3 months ago   Up 2 days             6379/tcp                                                                                       cvat_redis_inmem
be87ecfe416d   clickhouse/clickhouse-server:23.11-alpine   "/entrypoint.sh"         3 months ago   Up 2 days             8123/tcp, 9000/tcp, 9009/tcp                                                                   cvat_clickhouse
3e39d6d1da34   postgres:15-alpine                          "docker-entrypoint.s…"   3 months ago   Up 2 days             5432/tcp                                                                                       cvat_db
cdca097c27e8   openpolicyagent/opa:0.45.0-rootless         "/opa run --server -…"   3 months ago   Up 2 days                                                                                                            cvat_opa
# docker --version
Docker version 23.0.1, build a5ee5b1
/ # docker images
REPOSITORY                                                                  TAG               IMAGE ID       CREATED         SIZE
cvat.pth.dschoerk.transt                                                    latest            5ff0598b1ad0   3 months ago    1.42GB
cvat.openvino.omz.public.mask_rcnn_inception_resnet_v2_atrous_coco          latest            35dbf34336f1   3 months ago    1.94GB
cvat.openvino.omz.public.mask_rcnn_inception_resnet_v2_atrous_coco.base     latest            a361c0d353cf   3 months ago    1.88GB
cvat.openvino.omz.public.faster_rcnn_inception_resnet_v2_atrous_coco        latest            0a6bf6da7cd4   3 months ago    1.73GB
cvat.openvino.omz.public.faster_rcnn_inception_resnet_v2_atrous_coco.base   latest            4c60dc98baff   3 months ago    1.67GB
cvat.openvino.omz.intel.text-detection-0004                                 latest            cd102ee7c8c6   3 months ago    1.5GB
cvat.openvino.omz.intel.text-detection-0004.base                            latest            527ab1dfda6e   3 months ago    1.45GB
cvat.openvino.omz.intel.semantic-segmentation-adas-0001                     latest            4a67fc92e65f   3 months ago    1.51GB
cvat.openvino.omz.intel.semantic-segmentation-adas-0001.base                latest            4b83055c9665   3 months ago    1.46GB
cvat.openvino.omz.intel.person-reidentification-retail-0277                 latest            045ada2caecb   3 months ago    1.63GB
cvat.openvino.omz.intel.person-reidentification-retail-0277.base            latest            57e5f5b3dbfc   3 months ago    1.57GB
cvat.openvino.omz.intel.face-detection-0205                                 latest            b553f4a167a8   3 months ago    1.51GB
cvat.openvino.omz.intel.face-detection-0205.base                            latest            beedfac5e088   3 months ago    1.46GB
cvat.openvino.dextr                                                         latest            93d12e723174   3 months ago    1.68GB
cvat.openvino.dextr.base                                                    latest            d5eddecadb29   3 months ago    1.63GB
cvat.onnx.wongkinyiu.yolov7                                                 latest            e443f3c05b37   3 months ago    770MB
cvat.openvino.base                                                          latest            f8c819411853   3 months ago    1.43GB
traefik                                                                     v2.10             ee69e8120b64   4 months ago    153MB
gcr.io/iguazio/alpine                                                       3.17              eaba187917cc   4 months ago    7.06MB
cvat/ui                                                                     v2.10.2           a83357de1feb   4 months ago    143MB
cvat/server                                                                 v2.10.2           ee6648bb036d   4 months ago    3.02GB
clickhouse/clickhouse-server                                                23.11-alpine      ddd2efb58fe7   4 months ago    910MB
postgres                                                                    15-alpine         478703aef7f8   5 months ago    240MB
apache/kvrocks                                                              2.7.0             373063f3f9d4   5 months ago    37.3MB
redis                                                                       7.2.3-alpine      d2d4688fcebe   5 months ago    41MB
grafana/grafana-oss                                                         10.1.2            31656ec60d2e   8 months ago    391MB
quay.io/nuclio/handler-builder-python-onbuild                               1.11.24-amd64     94caa75b7738   10 months ago   55.9MB
quay.io/nuclio/dashboard                                                    1.11.24-amd64     86a4ab0cb6f4   10 months ago   250MB
cvat/server                                                                 latest            19ef97c9cc19   17 months ago   4.71GB
timberio/vector                                                             0.26.0-alpine     d8ecc9831523   18 months ago   122MB
openpolicyagent/opa                                                         0.45.0-rootless   8723f2dc306a   19 months ago   84.3MB
cvat/ui                                                                     v2.2.0            822d202cfca2   20 months ago   51.2MB
cvat/server                                                                 v2.2.0            0dba6fa26ad3   20 months ago   4.63GB
postgres                                                                    10                1cad456b3a24   23 months ago   202MB
openvino/cvat_server                                                        latest            041f75bb1d7e   2 years ago     5.95GB
cvat_kibana                                                                 latest            5f2f95ad9ef4   2 years ago     493MB
cvat_logstash                                                               latest            a8ea37ce806a   2 years ago     674MB
cvat_elasticsearch                                                          latest            cac2fd48f1aa   2 years ago     678MB
postgres                                                                    10-alpine         2c86947136ab   2 years ago     79.9MB
openvino/cvat_ui                                                            v1.7.0            2b45ff0ccf48   2 years ago     49MB
openvino/cvat_server                                                        v1.7.0            7720e30a355d   2 years ago     4.71GB
openpolicyagent/opa                                                         0.34.2-rootless   f85ee8a15a91   2 years ago     71.9MB
traefik                                                                     v2.4              de1a7c9d5d63   2 years ago     92MB
redis                                                                       4.0-alpine        e3dd0e49bca5   4 years ago     20.4MB
quay.io/nuclio/uhttpc                                                       0.0.1-amd64       5c59b3d31aa8   6 years ago     3.96MB
bsekachev commented 5 months ago

https://stackoverflow.com/questions/64783283/connectionerror-error-104-while-writing-to-socket-connection-reset-by-peer

Try setting the chunk size below the default (for example, 16 images). Alternatively, disable the "Use cache" option, although task creation will then take longer.
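One rough way to pick a chunk size is to divide the cache's per-value limit by the average size of one frame. The sketch below assumes the 512 MB cap implied by the issue title and ignores archive overhead and related images. `max_frames_per_chunk` is a hypothetical helper, not part of CVAT.

```python
MiB = 1024 * 1024
CACHE_VALUE_LIMIT = 512 * MiB  # assumed per-value cap of the cache server


def max_frames_per_chunk(avg_frame_bytes: int, limit: int = CACHE_VALUE_LIMIT) -> int:
    """Largest number of frames whose combined size stays within `limit`.

    Assumes every frame is roughly `avg_frame_bytes` and that chunk
    overhead (archive headers, related images) is negligible.
    """
    if avg_frame_bytes <= 0:
        raise ValueError("avg_frame_bytes must be positive")
    if avg_frame_bytes > limit:
        # No chunk size helps if one frame alone is over the limit.
        raise ValueError("a single frame already exceeds the cache limit")
    return limit // avg_frame_bytes


if __name__ == "__main__":
    # 40 MiB point clouds: 512 // 40 rounds down to 12 frames per chunk.
    print(max_frames_per_chunk(40 * MiB))  # prints 12
```

Leave some headroom below the computed value, because real frame sizes vary and the stored chunk includes more than the raw files.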

ralwing commented 5 months ago

I have other point cloud sets that are much bigger than this one (10 GB), and they upload and open fine.
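One plausible explanation, not confirmed in this thread, is that only the size of the largest single chunk matters, not the total size of the set. The hypothetical sketch below assumes a chunk is simply `chunk_size` consecutive files concatenated together. It compares a 10 GiB set of many small files with a roughly 2 GiB set of a few large ones.

```python
from typing import Sequence

MiB = 1024 ** 2
GiB = 1024 ** 3


def largest_chunk_bytes(file_sizes: Sequence[int], chunk_size: int) -> int:
    """Size of the biggest chunk when files are grouped `chunk_size` at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_sums = (
        sum(file_sizes[i:i + chunk_size])
        for i in range(0, len(file_sizes), chunk_size)
    )
    return max(chunk_sums, default=0)


if __name__ == "__main__":
    many_small = [10 * GiB // 1000] * 1000  # 10 GiB in 1,000 files of ~10 MiB
    few_large = [100 * MiB] * 20            # ~2 GiB in 20 files of 100 MiB
    print(largest_chunk_bytes(many_small, 30) // MiB)  # prints 307
    print(largest_chunk_bytes(few_large, 30) // MiB)   # prints 2000
```

Under this model, the larger set never builds a chunk near 512 MiB, while the smaller set of large files fits into one chunk of about 2,000 MiB.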

ralwing commented 5 months ago

Redis logs:

I20240529 07:42:07.455459   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 07:59:59.830586   103 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:12:58.690346   102 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:13:03.894992   105 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 08:28:58.435904   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:23:53.568164   104 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:24:00.185640   102 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
I20240529 09:43:35.570396   106 redis_connection.cc:88] [connection] Failed to tokenize the request. Error: Protocol error: invalid bulk length
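The "invalid bulk length" message refers to the RESP wire format. Each argument of a command such as `SET` is sent as a bulk string prefixed with `$<length>`, and the server rejects the request once it reads a length above its maximum. In Redis, that maximum is `proto-max-bulk-len`, 512 MB by default. The `redis_connection.cc` source name suggests these lines come from the kvrocks container (`cvat_redis_ondisk`), which speaks the same protocol; its exact limit is assumed here to also be 512 MiB. The sketch shows how a client encodes such a command. `encode_command` and `oversized_args` are illustrative names, not redis-py APIs.

```python
from typing import List

MAX_BULK_LEN = 512 * 1024 * 1024  # assumed server-side limit per bulk string


def encode_command(*args: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        # Each argument is sent as "$<length>\r\n<payload>\r\n".
        parts.append(b"$%d\r\n" % len(arg))
        parts.append(arg)
        parts.append(b"\r\n")
    return b"".join(parts)


def oversized_args(*args: bytes, limit: int = MAX_BULK_LEN) -> List[int]:
    """Indices of arguments whose declared bulk length the server would reject."""
    return [i for i, arg in enumerate(args) if len(arg) > limit]


if __name__ == "__main__":
    print(encode_command(b"SET", b"key", b"value"))
    # prints b'*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n'
```

Because the length header arrives before the payload, the server can reject an oversized value immediately and close the connection. The client, still streaming the payload, then sees `Connection reset by peer`.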
ralwing commented 5 months ago

Reducing the chunk size didn't help. Disabling the "Use cache" option does help, but it triples the job size.