ganindu7 opened 1 year ago
I have two computers. The first, Dell-8GPU, contains 8x 3090 Ti cards. The second, Dell-CPU, has a 13th-gen i7. I'm looking for a way to run serverless CVAT on Dell-CPU and SAM on Dell-8GPU. If you have any idea, please let me know. Thank you very much!
Hi, have a look at https://nuclio.io/docs/latest/reference/triggers/http/#attributes: "(Kubernetes only) Kubernetes ServiceType, used by the Kubernetes service to expose the trigger. The default ServiceType is ClusterIP, which means that by default the trigger won't be exposed outside of the cluster unless you configure a proper ingress or manually change the ServiceType to NodePort." Does that do the trick?
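If it helps, this is roughly where that attribute goes in a function's deployment spec (a minimal sketch; `myHttpTrigger` and the surrounding keys follow the usual Nuclio function.yaml layout, so double-check against your own file):

```yaml
# Sketch: expose the function's HTTP trigger outside the cluster as a NodePort
# service instead of the default ClusterIP (per the Nuclio docs linked above).
spec:
  triggers:
    myHttpTrigger:
      kind: "http"
      attributes:
        serviceType: NodePort
```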
I think these issues are helpful here. https://github.com/opencv/cvat/issues/2301 https://github.com/opencv/cvat/issues/6065 @bsekachev Can you please advise us on this?
Once I modify these env variables (`CVAT_NUCLIO_HOST` and `CVAT_NUCLIO_PORT`):
```python
NUCLIO = {
    'SCHEME': os.getenv('CVAT_NUCLIO_SCHEME', 'http'),
    'HOST': os.getenv('CVAT_NUCLIO_HOST', 'aisrv.gnet.lan'),
    'PORT': int(os.getenv('CVAT_NUCLIO_PORT', 30936)),
    'DEFAULT_TIMEOUT': int(os.getenv('CVAT_NUCLIO_DEFAULT_TIMEOUT', 120)),
    'FUNCTION_NAMESPACE': os.getenv('CVAT_NUCLIO_FUNCTION_NAMESPACE', 'nuclio'),
    'INVOKE_METHOD': os.getenv('CVAT_NUCLIO_INVOKE_METHOD',
        default='dashboard' if 'KUBERNETES_SERVICE_HOST' in os.environ else 'direct'),
}
```
do I need to have these in a separate YAML file? (e.g. `the-other-compose-file.yaml`)
```yaml
services:
  cvat_server:
    environment:
      CVAT_SERVERLESS: 1
    extra_hosts:
      - "host.docker.internal:host-gateway"
  cvat_worker_annotation:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
or declare them in the envs:
```yaml
services:
  cvat_server:
    environment:
      CVAT_SERVERLESS: 1
      CVAT_NUCLIO_SCHEME: http # Updated value
      CVAT_NUCLIO_HOST: aisrv.gnet.lan # Updated value
      CVAT_NUCLIO_PORT: 30936 # Updated value
      KUBERNETES_SERVICE_HOST: "true"
    extra_hosts:
      - "host.docker.internal:host-gateway"
  cvat_worker_annotation:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
and then run it as
docker compose -f docker-compose.yml -f docker-compose.override.yml -f components/serverless/the-other-compose-file.yaml up --build -d
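A quick way to confirm the override actually took effect is to check the environment inside the running server container (a sketch, assuming the default `cvat_server` service name from the compose files above):

```bash
# Verify the serverless/Nuclio variables are present in the cvat_server container
docker compose exec cvat_server env | grep -E 'CVAT_SERVERLESS|CVAT_NUCLIO'
```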
Even after deploying like that I get no models :( I think I must be doing something off the specification somewhere.
(Screenshots: CVAT not registering models; working functions; k8s services and pods.)
> Hi, have a look at https://nuclio.io/docs/latest/reference/triggers/http/#attributes: "(Kubernetes only) Kubernetes ServiceType, used by the Kubernetes service to expose the trigger. The default ServiceType is ClusterIP, which means that by default the trigger won't be exposed outside of the cluster unless you configure a proper ingress or manually change the ServiceType to NodePort." Does that do the trick?
Yes, it is NodePort.
Does that mean I have to specify individual functions rather than the Nuclio dashboard? (All this time I was putting my k8s Nuclio dashboard URL and port in `CVAT_NUCLIO_HOST` and `CVAT_NUCLIO_PORT`.)
This is my Nuclio dashboard from the k8s cluster; here are my pods and services:
Just reiterating: CVAT is running on a separate PC running Docker! I have exec'd into the Django container and made sure name resolution and ping for the k8s services are working, using nslookup and ping.
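For reference, the checks I mean look roughly like this (a sketch; `aisrv.gnet.lan` is the hostname from my override above, and it assumes nslookup/ping exist in the image):

```bash
# Run name resolution and reachability checks from inside the CVAT server container
docker compose exec cvat_server nslookup aisrv.gnet.lan
docker compose exec cvat_server ping -c 3 aisrv.gnet.lan
```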
Finally, I was able to make it work! I'm not sure where exactly the problem was, but here is what I did.
My values YAML file: values.yaml.txt
```yaml
services:
  cvat_server:
    environment:
      CVAT_SERVERLESS: 1
      CVAT_NUCLIO_SCHEME: http # Updated value
      CVAT_NUCLIO_HOST: aisrv.gnet.lan # Updated value
      CVAT_NUCLIO_PORT: 30936 # Updated value
      KUBERNETES_SERVICE_HOST: "true"
      CVAT_NUCLIO_FUNCTION_NAMESPACE: nuclio
    volumes:
      - cvat_data:/home/django/data:rw
    extra_hosts:
      - "host.docker.internal:host-gateway"
  cvat_worker_annotation:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
My updates to the CVAT server:
I understand that, because I am not using the Docker-hosted dashboard, I may not need that specific version, but at this point I just wanted things to work the way the CVAT team presumably tested them with the shipped version.
(Also remember to use docker-buildx.)
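As a sanity check that CVAT can reach the dashboard with those settings, querying the dashboard API should list the deployed functions (a sketch; the host/port are my NodePort'ed dashboard from the override above, and the namespace header mirrors `CVAT_NUCLIO_FUNCTION_NAMESPACE`):

```bash
# List the functions the Nuclio dashboard knows about, i.e. what CVAT's Models
# page should eventually pick up (run from any machine that can reach the node)
curl -s -H 'x-nuclio-function-namespace: nuclio' \
  http://aisrv.gnet.lan:30936/api/functions
```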
This came back again in release v2.7.6, despite me updating nuctl / the Nuclio images to 1.11.24 in all places (dashboard, controller, and nuctl).
This issue is very similar to #6582. I commented there with my temporary hack fix.
I had the same problem after updating CVAT.
The difference in my case is that CVAT is running from a Docker container and Nuclio is running on a Kubernetes cluster.
My deployment looks like this:
```
(TAOPY) g@nvdgx:~/Workspace/sandbox/nuclio-serverless-sandbox/ganindu-tests$ ./deploy.sh nozzlenet_1/
23.10.17 10:17:33.441 nuctl (I) Project created {"Name": "cvat", "Namespace": "nuclio"}
Deploying . function...
23.10.17 10:17:33.561 nuctl (I) Deploying function {"name": "test-nuctl-function-nozzlenet-1"}
23.10.17 10:17:33.566 nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.11.24, Git commit: f2a3900d23b92fd3639dc9cb765044ef53a4fb2b, OS: linux, Arch: amd64, Go version: go1.19.10", "name": "test-nuctl-function-nozzlenet-1"}
23.10.17 10:17:33.650 nuctl (I) Staging files and preparing base images
23.10.17 10:17:33.678 nuctl (I) Building processor image {"registryURL": "172.16.3.2:5000", "taggedImageName": "nozzlenet-nuclio-v1:latest"}
23.10.17 10:17:33.678 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.11.24-amd64"}
23.10.17 10:17:35.818 nuctl.platform (I) Building docker image {"image": "nozzlenet-nuclio-v1:latest"}
23.10.17 10:17:41.437 nuctl.platform (I) Pushing docker image into registry {"image": "nozzlenet-nuclio-v1:latest", "registry": "172.16.3.2:5000"}
23.10.17 10:17:41.437 nuctl.platform.docker (I) Pushing image {"from": "nozzlenet-nuclio-v1:latest", "to": "172.16.3.2:5000/nozzlenet-nuclio-v1:latest"}
23.10.17 10:17:42.850 nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "nozzlenet-nuclio-v1:latest"}
23.10.17 10:17:42.850 nuctl (I) Build complete {"image": "nozzlenet-nuclio-v1:latest"}
23.10.17 10:17:50.882 nuctl (I) Function deploy complete {"functionName": "test-nuctl-function-nozzlenet-1", "httpPort": 30555, "internalInvocationURLs": ["nuclio-test-nuctl-function-nozzlenet-1.nuclio.svc.cluster.local:8080"], "externalInvocationURLs": [":30555"]}
23.10.17 10:17:50.888 nuctl.platform.updater (I) Updating function {"name": "test-nuctl-function-nozzlenet-1"}
23.10.17 10:17:51.166 nuctl.platform.updater (I) Function updated {"functionName": "test-nuctl-function-nozzlenet-1"}
  NAMESPACE | NAME                            | PROJECT | STATE | REPLICAS | NODE PORT
  nuclio    | test-nuctl-function-nozzlenet-1 | cvat    | ready | 1/1      | 30555
```
As you can see, mine uses the NodePort 30555.
My `docker-compose.override.yaml` used to look like this:
```yaml
services:
  cvat_server:
    environment:
      CVAT_SERVERLESS: 1
      CVAT_NUCLIO_SCHEME: "http" # Updated value
      CVAT_NUCLIO_HOST: "aisrv.gnet.lan" # Updated value
      CVAT_NUCLIO_PORT: 30936 # Updated value
      KUBERNETES_SERVICE_HOST: "true"
      CVAT_NUCLIO_FUNCTION_NAMESPACE: "nuclio"
    volumes:
      - cvat_data:/home/django/data:rw
    extra_hosts:
      - "host.docker.internal:host-gateway"
  cvat_worker_annotation:
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  cvat_data:
    driver_opts:
      type: none
      device: /mnt/cvat_data
      o: bind
```
and the error I was getting was (I will include only part of it for brevity):
```
2023-10-17 10:45:54,639 DEBG 'rqworker-annotation-0' stderr output:
[2023-10-17 10:45:54,639] ERROR rq.worker: Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 415, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f20c8f38b80>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/opt/venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=30555): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20c8f38b80>: Failed to establish a new connection: [Errno -2] Name or service not known'))
```
The interesting bit was where it seemed to think the NodePort service was hosted on the Docker host, port 30555:

```
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=30555): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20c8f38b80>: Failed to establish a new connection: [Errno -2] Name or service not known'))
```
So, in a very dodgy way, I modified the `docker-compose.override.yaml` file to the following (which I know is wrong):

```yaml
  cvat_worker_annotation:
    extra_hosts:
      - "host.docker.internal:172.16.1.19"
```

172.16.1.19 is the IP address of my k8s control plane, and this partially fixed the issue (now I can automatically annotate jobs/projects, which I was not able to do previously due to the error above), but it still does not work for individual images; it times out to error 500.
I'm not a Docker power user; I just think the "fix" worked only because of some other error I made somewhere. Can you please help me figure out where the original problem could be? Thanks.
Can you please suggest a better solution (better than me misconfiguring the Docker Compose file)? I think my scenario is a combination of two wrongs now acting as a sort of solution, which may only work in a trusted local network setup like mine.
Setting `KUBERNETES_SERVICE_HOST` will result in `INVOKE_METHOD` being set to `dashboard` for Nuclio, which might have been the reason why the issue was resolved.
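If that is what changed the behaviour, a less surprising way to get the same effect might be to set the invoke method explicitly rather than faking the Kubernetes variable (a sketch based on the settings snippet quoted earlier; I have not verified it across CVAT versions):

```yaml
services:
  cvat_server:
    environment:
      CVAT_SERVERLESS: 1
      CVAT_NUCLIO_SCHEME: http
      CVAT_NUCLIO_HOST: aisrv.gnet.lan        # node exposing the dashboard NodePort
      CVAT_NUCLIO_PORT: 30936
      CVAT_NUCLIO_FUNCTION_NAMESPACE: nuclio
      CVAT_NUCLIO_INVOKE_METHOD: dashboard    # invoke via the dashboard instead of direct NodePort calls
```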
This may interest you: https://github.com/cvat-ai/cvat/issues/6797#issuecomment-2272616912
I was encountering similar issues.
My actions before raising this issue
Background
I have a CVAT Docker instance running on a server locally (this server does not belong to the local k8s cluster).
I have run the serverless compose file from the CVAT repo, so at port 8070 I can see the Nuclio dashboard.
Within CVAT (port 8080), in the Models tab, I can see the serverless function.
This is all working fine and I can do auto-annotation without a problem for the function hosted on the Docker server.
However, my CVAT server does not have a GPU, so I can't run GPU serverless functions on that particular server. Luckily, on the same local network there is a small Kubernetes cluster with a handful of nodes, one of which happens to be a GPU node.
I installed Nuclio on my local k8s cluster and was able to successfully run serverless functions.
Then I verified that my serverless functions work properly, utilising the available GPU resources, and tested them with a test web app written to run on my PC (sending an image, receiving annotations, plotting and listing the returned annotations).
Note: my PC running the function-testing web app is not in the k8s cluster but on the same local network (I exposed the service, i.e. the serverless function, as a NodePort so it can be accessed from outside the k8s cluster but within the same local network that my PC and the CVAT server are on).
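For what it's worth, the external check I did is roughly equivalent to this (a sketch; the node hostname, port 30555 and the JSON body shape are assumptions from my setup and CVAT's usual serverless handler convention):

```bash
# Invoke the NodePort-exposed function from a machine on the same LAN,
# sending a base64-encoded image and expecting annotations back as JSON
curl -s -X POST "http://aisrv.gnet.lan:30555" \
  -H 'Content-Type: application/json' \
  -d "{\"image\": \"$(base64 -w0 test.jpg)\"}"
```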
My aim, problem and possible solutions
- Use the serverless function I created (in the k8s cluster) with the Nuclio dashboard that lives in the Docker container. (I doubt this is possible, because the GPU operator is running on the k8s cluster and the serverless functions are acting as k8s services, in k8s pods.)
- Or use the function as a URL. (From my internet research this is a paid feature and can only be used with cvat.ai; that doesn't suit me because I want to keep everything local, at least until I flesh things out.)
- Or get CVAT to use my k8s Nuclio dashboard (I don't understand this well, so it might be illogical) instead of the dashboard from the serverless Docker compose file. (I think this is the most plausible option, if it makes sense at all.)
Can you please help me with this?
Thanks, Ganindu.