Open Dvalin21 opened 1 year ago
I also see this exact issue. From compreface-core log:
compreface-core | Traceback (most recent call last):
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py", line 1903, in simple_bind
compreface-core | check_call(_LIB.MXExecutorSimpleBindEx(self.handle,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/base.py", line 246, in check_call
compreface-core | raise get_last_ffi_error()
compreface-core | mxnet.base.MXNetError: Traceback (most recent call last):
compreface-core | File "/work/mxnet/src/storage/storage.cc", line 97
compreface-core | CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected
compreface-core |
compreface-core | During handling of the above exception, another exception occurred:
compreface-core |
compreface-core | Traceback (most recent call last):
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
compreface-core | response = self.full_dispatch_request()
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1945, in full_dispatch_request
compreface-core | self.try_trigger_before_first_request_functions()
compreface-core | File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1993, in try_trigger_before_first_request_functions
compreface-core | func()
compreface-core | File "/app/ml/./src/_endpoints.py", line 52, in init_model
compreface-core | detector(
compreface-core | File "/app/ml/./src/services/facescan/plugins/mixins.py", line 46, in __call__
compreface-core | faces = self._fetch_faces(img, det_prob_threshold)
compreface-core | File "/app/ml/./src/services/facescan/plugins/mixins.py", line 53, in _fetch_faces
compreface-core | boxes = self.find_faces(img, det_prob_threshold)
compreface-core | File "/app/ml/./src/services/facescan/plugins/insightface/insightface.py", line 103, in find_faces
compreface-core | model = self._detection_model
compreface-core | File "/usr/local/lib/python3.8/dist-packages/cached_property.py", line 36, in __get__
compreface-core | value = obj.__dict__[self.func.__name__] = self.func(obj)
compreface-core | File "/app/ml/./src/services/facescan/plugins/insightface/insightface.py", line 80, in _detection_model
compreface-core | model.prepare(ctx_id=self._CTX_ID, nms=self._NMS)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/insightface/app/face_analysis.py", line 32, in prepare
compreface-core | self.det_model.prepare(ctx_id, nms)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/insightface/model_zoo/face_detection.py", line 217, in prepare
compreface-core | model.bind(data_shapes=[('data', data_shape)])
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/module.py", line 422, in bind
compreface-core | self._exec_group = DataParallelExecutorGroup(self._symbol, self._context,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 280, in __init__
compreface-core | self.bind_exec(data_shapes, label_shapes, shared_group)
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 383, in bind_exec
compreface-core | self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py", line 675, in _bind_ith_exec
compreface-core | executor = self.symbol.simple_bind(ctx=context, grad_req=self.grad_req,
compreface-core | File "/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py", line 1944, in simple_bind
compreface-core | raise RuntimeError(error_msg)
compreface-core | RuntimeError: simple_bind error. Arguments:
compreface-core | data: (1, 3, 480, 640)
compreface-core | Traceback (most recent call last):
compreface-core | File "/work/mxnet/src/storage/storage.cc", line 97
compreface-core | CUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected
compreface-core | {"severity": "WARNING", "message": "500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.", "request": {"method": "GET", "path": "/status", "filename": "", "api_key": "", "remote_addr": "172.18.0.4"}, "logger": "root", "module": "error_handling", "traceback": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py\", line 1903, in simple_bind\n check_call(_LIB.MXExecutorSimpleBindEx(self.handle,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/base.py\", line 246, in check_call\n raise get_last_ffi_error()\nmxnet.base.MXNetError: Traceback (most recent call last):\n File \"/work/mxnet/src/storage/storage.cc\", line 97\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 2447, in wsgi_app\n response = self.full_dispatch_request()\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 1945, in full_dispatch_request\n self.try_trigger_before_first_request_functions()\n File \"/usr/local/lib/python3.8/dist-packages/flask/app.py\", line 1993, in try_trigger_before_first_request_functions\n func()\n File \"/app/ml/./src/_endpoints.py\", line 52, in init_model\n detector(\n File \"/app/ml/./src/services/facescan/plugins/mixins.py\", line 46, in __call__\n faces = self._fetch_faces(img, det_prob_threshold)\n File \"/app/ml/./src/services/facescan/plugins/mixins.py\", line 53, in _fetch_faces\n boxes = self.find_faces(img, det_prob_threshold)\n File \"/app/ml/./src/services/facescan/plugins/insightface/insightface.py\", line 103, in find_faces\n model = self._detection_model\n File \"/usr/local/lib/python3.8/dist-packages/cached_property.py\", line 36, in __get__\n value = obj.__dict__[self.func.__name__] = self.func(obj)\n File \"/app/ml/./src/services/facescan/plugins/insightface/insightface.py\", line 80, in _detection_model\n model.prepare(ctx_id=self._CTX_ID, nms=self._NMS)\n File \"/usr/local/lib/python3.8/dist-packages/insightface/app/face_analysis.py\", line 32, in prepare\n self.det_model.prepare(ctx_id, nms)\n File \"/usr/local/lib/python3.8/dist-packages/insightface/model_zoo/face_detection.py\", line 217, in prepare\n model.bind(data_shapes=[('data', data_shape)])\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/module.py\", line 422, in bind\n self._exec_group = DataParallelExecutorGroup(self._symbol, self._context,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 280, in __init__\n self.bind_exec(data_shapes, label_shapes, shared_group)\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 383, in bind_exec\n self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/module/executor_group.py\", line 675, in _bind_ith_exec\n executor = self.symbol.simple_bind(ctx=context, grad_req=self.grad_req,\n File \"/usr/local/lib/python3.8/dist-packages/mxnet/symbol/symbol.py\", line 1944, in simple_bind\n raise RuntimeError(error_msg)\nRuntimeError: simple_bind error. Arguments:\ndata: (1, 3, 480, 640)\nTraceback (most recent call last):\n File \"/work/mxnet/src/storage/storage.cc\", line 97\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\n", "build_version": "dev"}
The core-api log shows this exception:
com.exadel.frs.commonservice.sdk.faces.exception.FacesServiceException: Error during synchronization between servers: [500 INTERNAL SERVER ERROR] during [GET] to [http://compreface-core:3000/status] [FacesFeignClient#getStatus()]: [{"message":"500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."}
compreface-api | ]
compreface-api | at com.exadel.frs.commonservice.sdk.faces.service.FacesRestApiClient.getStatus(FacesRestApiClient.java:101)
compreface-api | at com.exadel.frs.commonservice.sdk.faces.service.FacesRestApiClient$$FastClassBySpringCGLIB$$517e8caf.invoke(
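The exception above says the API container's GET to http://compreface-core:3000/status came back with a 500. To reproduce that call by hand, a minimal Python sketch along these lines might help. The function name `probe_status` and its `opener` parameter are made up for illustration and are not part of CompreFace. The hostname is the one from the log and will most likely only resolve from a container on the same compose network.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen, timeout=5):
    """Return (http_status, body_text); http_status is None if no HTTP reply arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies still carry a body, such as the 500 seen in the log.
        return err.code, err.read().decode("utf-8", "replace")
    except urllib.error.URLError as err:
        # Connection refused, DNS failure and similar transport errors.
        return None, str(err.reason)


if __name__ == "__main__":
    # URL copied from the core-api exception above.
    print(probe_status("http://compreface-core:3000/status"))
```

A 500 here would suggest the failure is inside compreface-core itself (the CUDA error), while a `None` status would point at networking between the containers instead.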
500 Internal Server
After running CompreFace for several weeks, it just stops connecting. The admin node starts, but core and api stay at "loading".
Desktop (please complete the following information):
Pastebin with logs: https://pastebin.com/H0FvXkeX
Run these commands and attach the results to the ticket:
docker ps
docker-compose logs
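The output of docker-compose logs can run to thousands of lines, so it may be easier to pull out just the CUDA failure before attaching it. This is a rough Python sketch, assuming the output was saved to a text file with one log entry per line. The helper name `cuda_failure_reasons` and the file name `compose.log` are hypothetical.

```python
import re
import sys

# Matches the MXNet CUDA check message as it appears in the compreface-core log.
# The reason stops at a real newline or at a backslash (the escaped "\n" inside
# the JSON-formatted log entry).
CUDA_FAILURE = re.compile(r"CUDA: Check failed: .*?: (?P<reason>[^\n\\]+)")


def cuda_failure_reasons(log_text):
    """Return the distinct CUDA failure reasons found in log_text, in first-seen order."""
    seen = []
    for match in CUDA_FAILURE.finditer(log_text):
        reason = match.group("reason").strip()
        if reason not in seen:
            seen.append(reason)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: docker-compose logs > compose.log, then pass compose.log as the argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for reason in cuda_failure_reasons(fh.read()):
            print(reason)
```

If the only reason printed is "no CUDA-capable device is detected", the container has probably lost access to the GPU, which would match core staying at "loading".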