SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
Apache License 2.0

GPU Quadro RTX 5000 error #119

Open MyraBaba opened 11 months ago

MyraBaba commented 11 months ago

Hi,

I get the error below on our Lenovo laptop, which has an NVIDIA Quadro RTX 5000 GPU:

Starting 1 workers on 1 GPUs (1 workers per GPU)
Containers port range: 18081 - 18081
insightface-rest-gpu0-trt
--- Starting container insightface-rest-gpu0-trt with "device=0" at port 18081
Preparing models...
[14:40:15] INFO - Preparing 'scrfd_10g_gnkps' model...
[10/13/2023-14:40:15] [TRT] [W] Unable to determine GPU memory usage
[10/13/2023-14:40:15] [TRT] [W] Unable to determine GPU memory usage
[10/13/2023-14:40:15] [TRT] [W] CUDA initialization failure with error: 999. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Traceback (most recent call last):
  File "/app/prepare_models.py", line 53, in <module>
    prepare_models()
  File "/app/prepare_models.py", line 45, in prepare_models
    prepare_backend(model_name=model, backend_name=settings.models.inference_backend, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 137, in prepare_backend
    has_fp16 = check_fp16()
  File "/app/modules/converters/onnx_to_trt.py", line 66, in check_fp16
    builder = trt.Builder(TRT_LOGGER)
TypeError: pybind11::init(): factory function returned nullptr
Starting InsightFace-REST using 1 workers.
[2023-10-13 14:40:15 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2023-10-13 14:40:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:18080 (1)
[2023-10-13 14:40:15 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2023-10-13 14:40:15 +0000] [41] [INFO] Booting worker with pid: 41
[10/13/2023-14:40:16] [TRT] [W] Unable to determine GPU memory usage
[10/13/2023-14:40:16] [TRT] [W] Unable to determine GPU memory usage
[10/13/2023-14:40:16] [TRT] [W] CUDA initialization failure with error: 999. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
[2023-10-13 14:40:16 +0000] [41] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/arbiter.py", line 609, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/workers.py", line 66, in init_process
    super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.10/dist-packages/gunicorn/util.py", line 371, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/app.py", line 34, in <module>
    processing = Processing(det_name=settings.models.det_name, rec_name=settings.models.rec_name,
  File "/app/modules/processing.py", line 32, in __init__
    self.model = FaceAnalysis(det_name=det_name,
  File "/app/modules/face_model.py", line 98, in __init__
    self.det_model = Detector(det_name=det_name, max_size=self.max_size,
  File "/app/modules/face_model.py", line 56, in __init__
    self.retina = get_model(det_name, backend_name=backend_name, force_fp16=force_fp16, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 213, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 137, in prepare_backend
    has_fp16 = check_fp16()
  File "/app/modules/converters/onnx_to_trt.py", line 66, in check_fp16
    builder = trt.Builder(TRT_LOGGER)
TypeError: pybind11::init(): factory function returned nullptr
[2023-10-13 14:40:16 +0000] [41] [INFO] Worker exiting (pid: 41)
[2023-10-13 14:40:16 +0000] [1] [ERROR] Worker (pid:41) exited with code 3
[2023-10-13 14:40:16 +0000] [1] [ERROR] Shutting down: Master
[2023-10-13 14:40:16 +0000] [1] [ERROR] Reason: Worker failed to boot.

2 - Can we use the repo on Ubuntu 18.04 with NVIDIA-SMI 460.27.04 (Driver Version: 460.27.04, CUDA Version: 11.2)?

SthPhoenix commented 9 months ago

This error seems to be the result of a driver misconfiguration or mismatch. I haven't tested this repo on Ubuntu versions prior to 20.04, so I can't guarantee it would work.
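One way to narrow this down is to check whether the CUDA driver initializes at all inside the container, independently of TensorRT. The snippet below is a minimal sketch, not part of InsightFace-REST: `check_cuda_init` is a hypothetical helper name, and it assumes the driver library is exposed to the container as `libcuda.so.1`. It calls the CUDA driver API functions `cuInit` and `cuDriverGetVersion` through `ctypes`. Error 999 in the log above corresponds to `CUDA_ERROR_UNKNOWN` in the driver API.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_UNKNOWN = 999  # the code reported in the log above


def check_cuda_init(lib_name="libcuda.so.1"):
    """Return (ok, message) describing whether the CUDA driver initializes.

    Hypothetical diagnostic helper; lib_name is an assumption about how the
    driver library is named inside the container.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, f"could not load {lib_name}: {exc}"

    # cuInit(0) is the driver API entry point; it returns a CUresult code.
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    result = libcuda.cuInit(0)
    if result != CUDA_SUCCESS:
        return False, f"cuInit failed with error {result}"

    # The version is encoded as 1000 * major + 10 * minor (e.g. 11020 for 11.2).
    version = ctypes.c_int(0)
    libcuda.cuDriverGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
    libcuda.cuDriverGetVersion.restype = ctypes.c_int
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != CUDA_SUCCESS:
        return True, "cuInit succeeded, but the driver version could not be read"
    major, minor = version.value // 1000, (version.value % 1000) // 10
    return True, f"cuInit succeeded, driver supports CUDA {major}.{minor}"


if __name__ == "__main__":
    ok, message = check_cuda_init()
    print("OK" if ok else "FAILED", "-", message)
```

If this fails with error 999 inside the container but succeeds on the host, the problem is more likely in how the driver is exposed to Docker than in this repo. If it fails on the host as well, the driver installation itself is the place to look.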