cvat-ai / cvat


Another Server 500 Error on first run with a fresh install...Docker network? #7836

Closed iWoodsman closed 5 months ago

iWoodsman commented 5 months ago

Steps to Reproduce

  1. Install cvat: git clone https://github.com/cvat-ai/cvat
  2. export CVAT_HOST=host.name.com
  3. sudo docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser' (superuser is created)
  4. docker compose up -d (everything comes up)
  5. docker exec -t cvat_server python manage.py health_check (output below; OPA reports an error. The disk has 3 TB of free space.)
health_check.exceptions.HealthCheckException: unknown error: 500 Server Error: Internal Server Error for url: http://opa:8181/health?bundles
Cache backend: default   ... working
Cache backend: media     ... working
DatabaseBackend          ... working
DiskUsage                ... working
MemoryUsage              ... working
OPAHealthCheck           ... unknown error: 500 Server Error: Internal Server Error for url: http://opa:8181/health?bundles
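
To pick out the failing checks from a report like the one above, a minimal Python sketch follows. It assumes each status line has the form "<name> ... <status>" as shown in this output; the function name failing_checks is hypothetical and not part of CVAT or django-health-check.

```python
import re


def failing_checks(report: str) -> dict[str, str]:
    """Return {check name: status} for every check whose status is not 'working'."""
    failures = {}
    for line in report.splitlines():
        # Status lines look like "<name>   ... <status>"; other lines
        # (for example the exception message) do not contain " ... ".
        match = re.match(r"^(?P<name>.+?)\s+\.\.\.\s+(?P<status>.+)$", line)
        if match is None:
            continue
        status = match["status"].strip()
        if status != "working":
            failures[match["name"].strip()] = status
    return failures
```

The text passed in would be the captured output of the health_check command from step 5.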

The site is accessible from the Internet using Chrome at http://host.name.com:8080, but the page that loads shows a spinner and then reports:

"Cannot connect to the server
Make sure the CVAT backend and all necessary services (Database, Redis and Open Policy Agent) are running and available. If you upgraded from version 2.2.0 or earlier, manual actions may be needed, see the [Upgrade Guide](https://opencv.github.io/cvat/docs/administration/advanced/upgrade_guide)."

docker ps shows all containers running. Perhaps it is significant that cvat_opa does not expose a port?

CONTAINER ID   IMAGE                                       COMMAND                  CREATED       STATUS                       PORTS                                                    NAMES
c39936b1f67e   cvat/ui:dev                                 "/docker-entrypoint.…"   2 hours ago   Up About an hour             80/tcp                                                   cvat_ui
8837531d1178   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_import
7b7705ab33b4   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_export
bee0b6fb6c38   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_utils
79b6909e137e   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_webhooks
7d7ab4c09195   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_analytics_reports
0f561db87ecc   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_quality_reports
dbd0f682028d   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_server
cf634993a054   cvat/server:dev                             "./backend_entrypoin…"   2 hours ago   Up About an hour             8080/tcp                                                 cvat_worker_annotation
3aaddb159470   timberio/vector:0.26.0-alpine               "/usr/local/bin/vect…"   2 hours ago   Up About an hour                                                                      cvat_vector
1ee31d876df5   redis:7.2.3-alpine                          "docker-entrypoint.s…"   2 hours ago   Up About an hour             6379/tcp                                                 cvat_redis_inmem
0047e312128a   traefik:v2.10                               "/entrypoint.sh trae…"   2 hours ago   Up About an hour             0.0.0.0:8080->8080/tcp, 80/tcp, 0.0.0.0:8090->8090/tcp   traefik
c92241ccdbe8   postgres:15-alpine                          "docker-entrypoint.s…"   2 hours ago   Up About an hour             5432/tcp                                                 cvat_db
c811fe5f02e1   openpolicyagent/opa:0.63.0                  "/opa run --server -…"   2 hours ago   Up About an hour                                                                      cvat_opa
e434d6fcc5a0   clickhouse/clickhouse-server:23.11-alpine   "/entrypoint.sh"         2 hours ago   Up About an hour             8123/tcp, 9000/tcp, 9009/tcp                             cvat_clickhouse
1f97fa4452e9   apache/kvrocks:2.7.0                        "kvrocks -c /var/lib…"   2 hours ago   Up About an hour (healthy)   6666/tcp                                                 cvat_redis_ondisk

docker logs cvat_opa gives

{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp: lookup cvat-server on 127.0.0.11:53: server misbehaving","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:05Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp: lookup cvat-server on 127.0.0.11:53: server misbehaving","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:05Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp: lookup cvat-server on 127.0.0.11:53: server misbehaving","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:05Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp: lookup cvat-server on 127.0.0.11:53: server misbehaving","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:05Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp: lookup cvat-server on 127.0.0.11:53: server misbehaving","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:05Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.20.0.16:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:07Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.20.0.16:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:08Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.20.0.16:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:10Z"}
{"level":"error","msg":"Bundle load failed: request failed: Get \"http://cvat-server:8080/api/auth/rules\": dial tcp 172.20.0.16:8080: connect: connection refused","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:13Z"}
{"level":"error","msg":"Bundle load failed: server replied with Internal Server Error","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:17Z"}
{"level":"error","msg":"Bundle load failed: server replied with Internal Server Error","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:24Z"}
{"level":"error","msg":"Bundle load failed: server replied with Internal Server Error","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:36Z"}
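
The log above moves through three kinds of failure: a DNS lookup error, a refused connection, and finally an HTTP 500 from the server. A minimal Python sketch for counting them is shown below. The category labels and the function name summarize_opa_errors are hypothetical, and the substring matching relies only on the message text visible in this log.

```python
import json
from collections import Counter


def summarize_opa_errors(log_text: str) -> Counter:
    """Count OPA error log entries by a rough failure category."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        msg = entry.get("msg", "")
        if "server misbehaving" in msg:
            counts["dns lookup failed"] += 1
        elif "connection refused" in msg:
            counts["connection refused"] += 1
        elif "Internal Server Error" in msg:
            counts["http 500 from server"] += 1
        else:
            counts["other"] += 1
    return counts
```

The input would be the text produced by docker logs cvat_opa. In this log, the first two categories appear only in the first seconds after startup, while the HTTP 500 entries are the ones that keep recurring.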

docker network inspect cvat_cvat gives

[
    {
        "Name": "cvat_cvat",
        "Id": "8a215c7499eccb4588ee79c822bb3e13d397e3056d96548f4e0baa73192417b1",
        "Created": "2024-05-01T11:53:02.649526432-04:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "0047e312128ad2b45c9611cca55bd341d4ec0edd06e756623c0dc072185517d9": {
                "Name": "traefik",
                "EndpointID": "3a4fa4154cbde3861c2334729b33b33c9bb7866026c6057bd89bbbd81ac0f2c0",
                "MacAddress": "02:42:ac:14:00:08",
                "IPv4Address": "172.20.0.8/16",
                "IPv6Address": ""
            },
            "0f561db87ecc428e0ca6aa27d0facf27b4b18177b66ce2d3a11098314b82a501": {
                "Name": "cvat_worker_quality_reports",
                "EndpointID": "1dc9119d4e866f820115256e6ca10f4672636e47c6b95dd523d0237058a917ce",
                "MacAddress": "02:42:ac:14:00:0b",
                "IPv4Address": "172.20.0.11/16",
                "IPv6Address": ""
            },
            "1ee31d876df5dad934b9e547fe7be7aa5595126fb0900a9d8324326993b74629": {
                "Name": "cvat_redis_inmem",
                "EndpointID": "49c705883e209cd53dc5d6f2d7d8dbb01e02080290c957715a8b3c29a54380c0",
                "MacAddress": "02:42:ac:14:00:03",
                "IPv4Address": "172.20.0.3/16",
                "IPv6Address": ""
            },
            "1f97fa4452e90dd1e42e333f9895512e634cf571938fb37c03167e768affb3ae": {
                "Name": "cvat_redis_ondisk",
                "EndpointID": "8f66daf35efe0d21e3a936880f5f03191eaa0aaa5c139eea247c223328c10154",
                "MacAddress": "02:42:ac:14:00:0c",
                "IPv4Address": "172.20.0.12/16",
                "IPv6Address": ""
            },
            "3aaddb1594700b8cfddf7b78486a8cf6069b9551e620278399d7d8dafa78f2e8": {
                "Name": "cvat_vector",
                "EndpointID": "ad5286197d27cb738d6129bc56d7d7bf00a86db8b750566337eedb2c9dc75b5f",
                "MacAddress": "02:42:ac:14:00:10",
                "IPv4Address": "172.20.0.16/16",
                "IPv6Address": ""
            },
            "79b6909e137e096a31d3c342d2ae4288034c1ec265619ec369a9fed4bcbede4a": {
                "Name": "cvat_worker_webhooks",
                "EndpointID": "a67d60ad0a85674f6d34719c65bc65c367c3dc664373f3cb3476efec482ecdb7",
                "MacAddress": "02:42:ac:14:00:07",
                "IPv4Address": "172.20.0.7/16",
                "IPv6Address": ""
            },
            "7b7705ab33b4b9ecefbf7433dad86c9124d2ac20bd7b27563ac77ba0c08dfcb8": {
                "Name": "cvat_worker_export",
                "EndpointID": "d59ca225aac811ca4031b6eadad303afe853a75632b5c8fab5e6cb9936b87d19",
                "MacAddress": "02:42:ac:14:00:11",
                "IPv4Address": "172.20.0.17/16",
                "IPv6Address": ""
            },
            "7d7ab4c091955f0221c2f23c10da70ae873accd2e5060f0eaee21fec86b51b6c": {
                "Name": "cvat_worker_analytics_reports",
                "EndpointID": "abd8d9e9597bf908fa153378e4c0fb963495fcdc252e22109e4e0d93d4586233",
                "MacAddress": "02:42:ac:14:00:0e",
                "IPv4Address": "172.20.0.14/16",
                "IPv6Address": ""
            },
            "8837531d11785191905c3b427b76b97183e997bc9b927236db534d9f4820d24c": {
                "Name": "cvat_worker_import",
                "EndpointID": "3384ad21702cce24ff1dc644afa3404c27ea414163a333184262ee5bb26dc489",
                "MacAddress": "02:42:ac:14:00:0d",
                "IPv4Address": "172.20.0.13/16",
                "IPv6Address": ""
            },
            "bee0b6fb6c38df556fe7d33ced26588d8d1c656aa1feb4eb644dc5b77e0dd991": {
                "Name": "cvat_utils",
                "EndpointID": "6041f517078a26950ad699afc8d7bbca039d9e9d2747af8cb400c13aeb167389",
                "MacAddress": "02:42:ac:14:00:0f",
                "IPv4Address": "172.20.0.15/16",
                "IPv6Address": ""
            },
            "c39936b1f67e0bd10eb500340af5ba64ef442f28de6265eebdd8d5f13cf1c6f4": {
                "Name": "cvat_ui",
                "EndpointID": "86b7cb4a310a336350411de6724a73725f0a7c46ca9ffc31bc9118075ea521b8",
                "MacAddress": "02:42:ac:14:00:09",
                "IPv4Address": "172.20.0.9/16",
                "IPv6Address": ""
            },
            "c811fe5f02e10d91f417543ca0dd4f0173767009637cf1d5d5416a1d69bf630c": {
                "Name": "cvat_opa",
                "EndpointID": "ebac9822cdc5e48f72696922194762d6f5db03096e9aad04dc33db6622af6d4d",
                "MacAddress": "02:42:ac:14:00:02",
                "IPv4Address": "172.20.0.2/16",
                "IPv6Address": ""
            },
            "c92241ccdbe86532eb6d8d02978ee8030f878dd7dc55af539ab52d2838458fcc": {
                "Name": "cvat_db",
                "EndpointID": "e7ddec00320463786a2284eb5b4f7e8218b1af90fc15dd0ee19e26bdf9e628ab",
                "MacAddress": "02:42:ac:14:00:05",
                "IPv4Address": "172.20.0.5/16",
                "IPv6Address": ""
            },
            "cf634993a05419d5c04c47bdfb9b0a3ac537496e43a934d0b9b65531d1d24dd0": {
                "Name": "cvat_worker_annotation",
                "EndpointID": "62645b66b483bf8d4d77eb3d279bd0ba54baf05f535766ea9e6cba26571676f8",
                "MacAddress": "02:42:ac:14:00:06",
                "IPv4Address": "172.20.0.6/16",
                "IPv6Address": ""
            },
            "dbd0f682028d396e0b6ef3c58bd9e49828a7953e782f23f5d1a21051a662af44": {
                "Name": "cvat_server",
                "EndpointID": "9e853ce337f028aa97d5bc3a7eadc81b43fef6c08eb252d1833a536e413e5f19",
                "MacAddress": "02:42:ac:14:00:0a",
                "IPv4Address": "172.20.0.10/16",
                "IPv6Address": ""
            },
            "e434d6fcc5a0fcba9ab5cfff8417419998774de7a0a0b7575f0e13a832d55d90": {
                "Name": "cvat_clickhouse",
                "EndpointID": "4cd4623d1f058c0e5141574c9717420493d3e6c568e07b0296e8cc3d8424f9b5",
                "MacAddress": "02:42:ac:14:00:04",
                "IPv4Address": "172.20.0.4/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "cvat",
            "com.docker.compose.project": "cvat",
            "com.docker.compose.version": "2.3.3"
        }
    }
]
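
To cross-check the addresses in the OPA log against this dump, a small Python sketch that maps each IPv4 address to its container name is shown below. The function name containers_by_ip is hypothetical; the field names ("Containers", "Name", "IPv4Address") are the ones visible in the output above.

```python
import json


def containers_by_ip(inspect_json: str) -> dict[str, str]:
    """Map each container's IPv4 address (without the /prefix) to its name."""
    mapping = {}
    for network in json.loads(inspect_json):
        # "Containers" can be missing or null for a network with no endpoints.
        for info in (network.get("Containers") or {}).values():
            ip = info.get("IPv4Address", "").split("/")[0]
            if ip:
                mapping[ip] = info["Name"]
    return mapping
```

In the dump above, 172.20.0.10 belongs to cvat_server and 172.20.0.16 to cvat_vector, whereas the OPA log shows cvat-server resolving to 172.20.0.16. Addresses on a bridge network can be reassigned when containers restart, so this mismatch is not conclusive on its own.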

I can get a shell inside cvat_server to check its IP address and exposed ports, but I can't do the same for cvat_opa: the image doesn't seem to include bash or any other shell, nor ls or ps, to poke around with.

docker exec -it cvat_opa  bash
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown

Expected Behavior

The CVAT UI page I reach should let me do something, such as log in as the superuser.

Possible Solution

I suspect that something in the Docker network setup is behind OPA's failure to initialize.


Environment

Docker version 20.10.14, build a224086
Ubuntu 22.04
cvat develop

bsekachev commented 5 months ago

Hello,

> {"level":"error","msg":"Bundle load failed: server replied with Internal Server Error","name":"cvat","plugin":"bundle","time":"2024-05-01T15:53:36Z"}

That may be the cause.

Could you post the output of docker logs cvat_server?

iWoodsman commented 5 months ago

docker logs cvat_server mentions a missing file:

2024-05-02 02:54:05,807 DEBG 'uvicorn-1' stderr output:
[2024-05-02 02:54:05,806] ERROR django.request: Internal Server Error: /api/auth/rules
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 534, in thread_handler
    raise exc_info[1]
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 42, in inner
    response = await get_response(request)
  File "/opt/venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 253, in _get_response_async
    response = await wrapped_callback(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 479, in __call__
    ret: _R = await loop.run_in_executor(
  File "/opt/venv/lib/python3.10/site-packages/asgiref/current_thread_executor.py", line 40, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.10/site-packages/asgiref/sync.py", line 538, in thread_handler
    return func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
    return view_func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/opt/venv/lib/python3.10/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/home/django/cvat/apps/iam/views.py", line 127, in wrapper
    return patched_viewset_method(wsgi_request, *args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/django/views/decorators/http.py", line 98, in inner
    res_etag = etag_func(request, *args, **kwargs) if etag_func else None
  File "/home/django/cvat/apps/iam/views.py", line 146, in <lambda>
    @_etag(lambda _: RulesView._etag_func(RulesView._get_bundle_path()))
  File "/home/django/cvat/apps/iam/views.py", line 143, in _etag_func
    with open(file_path, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/django/static/opa/bundle.tar.gz'

2024-05-02 02:54:05,808 DEBG 'uvicorn-1' stdout output:
INFO:     172.20.0.2:0 - "GET /api/auth/rules HTTP/1.0" 500 Internal Server Error

iWoodsman commented 5 months ago

When I enter the cvat_server container with docker exec -it cvat_server bash, I find that the path /home/django/static/opa/ exists but is an empty directory, with no bundle.tar.gz in it. What could cause that?

sigma-libra commented 5 months ago

Chiming in to add that we have the exact same issue, both when upgrading from a previous version and when installing a fresh one.

sigma-libra commented 5 months ago

Found a solution that works for me.

It appears this issue has happened before, but only to Mac users. The solution for that error, "Error 2", is here: https://github.com/cvat-ai/cvat/issues/6629#issuecomment-2080420315.

Note that 15 seconds was not enough for me. Giving opa a good 60-second head start did the trick, along with deleting the cvat_server image and pulling it again.

iWoodsman commented 5 months ago

I tried it but no luck. I was able to start cvat_opa by itself, and then the other containers, but the same errors were logged about a missing bundle.tar.gz, and the same stall and error when loading the login page. I'd be interested if you noticed that after your successful startup, a tar file did actually appear in /home/django/static/opa/. Is it written there and if so by what? Or is that directory a mount point from cvat_opa that isn't showing up because of a docker network problem underneath the whole mess?

iWoodsman commented 5 months ago

Whoa. I installed cvat on my Intel Mac and it worked immediately. I ducked inside the cvat_server container and checked the path to the missing tar file and the opa path doesn't even exist.

Linux:
django@c05f1e4faba8:~$ ll /home/django/static/
drwxr-xr-x 1 django django  86 May  3 00:48 ./
drwxr-x--- 1 django django 288 Mar 22 08:42 ../
drwxr-xr-x 1 django django  16 May  3 00:48 admin/
drwxr-xr-x 1 django django   0 May  3 00:48 opa/
drwxr-xr-x 1 django django  34 May  3 00:48 rest_framework/
drwxr-xr-x 1 django django 148 May  3 00:48 social_authentication/

Mac:
drwxr-xr-x 1 django django 4096 May  3 01:20 ./
drwxr-x--- 1 django django 4096 May  2 16:06 ../
drwxr-xr-x 5 django django 4096 May  3 01:20 admin/
drwxr-xr-x 7 django django 4096 May  3 01:20 rest_framework/
drwxr-xr-x 2 django django 4096 May  3 01:20 social_authentication/

So the Mac seems untroubled by the absence of the path and file that Linux is insisting it needs. I am so confused.

bsekachev commented 5 months ago

@SpecLad

Could this be related to the recent changes in the OPA rules?

SpecLad commented 5 months ago

I suspect the issue here is a mismatch between the version of the Compose file that was used and the Docker images. I recently made a change that updated the code and the Compose file in tandem. The symptoms here are what I would expect if you used the new Compose file with old images.

Basically, add --pull=always to your docker compose up command, and that should fix it. Or run from the master branch.

> I discover that the path /home/django/static/opa/ exists but it is an empty directory, with no bundle.tar.gz there. What could cause that?

For the record, the new versions of CVAT no longer store the bundle on disk. So you should expect to see nothing there.
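
The fix described here (refreshing the images so they match the Compose file) could be scripted roughly as follows. This is only a sketch: the helper names refresh_commands and refresh are hypothetical, and the default file name docker-compose.yml is an assumption about where the Compose file lives. The commands themselves are the standard docker compose pull and docker compose up -d.

```python
import subprocess


def refresh_commands(compose_file: str = "docker-compose.yml") -> list[list[str]]:
    """Build the commands that pull fresh images and restart the stack."""
    return [
        ["docker", "compose", "-f", compose_file, "pull"],
        ["docker", "compose", "-f", compose_file, "up", "-d"],
    ]


def refresh(compose_file: str = "docker-compose.yml", dry_run: bool = True) -> list[list[str]]:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    commands = refresh_commands(compose_file)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a command exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling refresh() with the default dry_run=True only prints the two commands; pass dry_run=False from the directory containing the Compose file to actually run them.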

iWoodsman commented 5 months ago

After a docker compose pull, I was able to proceed and the site is now working as expected. Thank you!

boldyshev commented 5 months ago

Same for me, thanks!