SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
Apache License 2.0

Fails to start on CPU with error: parsing message with type 'ONNX_REL_1_8.ModelProto' #99

Open spacemolly opened 2 years ago

spacemolly commented 2 years ago

I have an issue running a CPU deployment as of recent versions. I've had this running before without issues, but building a CPU Docker image using the defaults (scrfd_2.5g_gnkps, glintr100) now fails to start. I get an error message saying: Error parsing message with type 'ONNX_REL_1_8.ModelProto'.

Full logs:

Preparing models...
[14:38:10] INFO - Preparing 'scrfd_2.5g_gnkps' model...
[14:38:10] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:10] INFO - 'scrfd_2.5g_gnkps' model ready!
[14:38:10] INFO - Preparing 'glintr100' model...
No module named 'cupy'
Traceback (most recent call last):
  File "prepare_models.py", line 52, in prepare_models()
  File "prepare_models.py", line 44, in prepare_models prepare_backend(model_name=model, backend_name=env_configs.models.backend_name, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
Starting InsightFace-REST using 2 workers.
[2022-09-24 14:38:11 +0000] [11] [INFO] Starting gunicorn 20.1.0
[2022-09-24 14:38:11 +0000] [11] [INFO] Listening at: http://0.0.0.0:18080 (11)
[2022-09-24 14:38:11 +0000] [11] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-09-24 14:38:11 +0000] [13] [INFO] Booting worker with pid: 13
[2022-09-24 14:38:11 +0000] [14] [INFO] Booting worker with pid: 14
[14:38:12] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:12] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[14:38:12] INFO - Detector started
[14:38:12] INFO - Warming up face detection ONNX Runtime engine...
[14:38:12] INFO - Detector started
[14:38:12] INFO - Warming up face detection ONNX Runtime engine...
No module named 'cupy'
[2022-09-24 14:38:12 +0000] [13] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 36, in processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/modules/processing.py", line 82, in __init__ self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name,
  File "/app/modules/face_model.py", line 86, in __init__ self.rec_model = get_model(rec_name, backend_name=backend_name, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
[2022-09-24 14:38:12 +0000] [13] [INFO] Worker exiting (pid: 13)
[2022-09-24 14:38:12 +0000] [14] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 36, in processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/modules/processing.py", line 82, in __init__ self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name,
  File "/app/modules/face_model.py", line 86, in __init__ self.rec_model = get_model(rec_name, backend_name=backend_name, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 119, in load_model model = load_model_from_string(s, format=format)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 156, in load_model_from_string return _deserialize(s, ModelProto())
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 97, in _deserialize decoded = cast(Optional[int], proto.ParseFromString(s))
google.protobuf.message.DecodeError: Error parsing message with type 'ONNX_REL_1_8.ModelProto'
No module named 'cupy'
[2022-09-24 14:38:12 +0000] [14] [INFO] Worker exiting (pid: 14)
[2022-09-24 14:38:12 +0000] [11] [WARNING] Worker with pid 14 was terminated due to signal 15
[2022-09-24 14:38:13 +0000] [11] [INFO] Shutting down: Master
[2022-09-24 14:38:13 +0000] [11] [INFO] Reason: Worker failed to boot.

SthPhoenix commented 2 years ago

Hi! That's strange. I have just built the image from scratch without cache and pulled the latest python:3.8-slim base image - everything works as expected.

SthPhoenix commented 2 years ago

Have you managed to figure out the issue? I still can't reproduce it.

jinzaz commented 2 years ago

Hello, have you solved this problem? I have encountered the same issue. If so, I hope to see your reply.

SthPhoenix commented 2 years ago

I wasn't able to reproduce this bug. @jinzaz, could you share more info about your environment? Are you running the service in Docker? Could you compute MD5 sums of the ONNX files? It might be that the model files were corrupted during download.

P.S. I have just tried building it again without cache and everything still works as expected. More details are required to figure out your issue.

P.P.S. I have just committed an update to the model configs containing MD5 checksums; in case a model was corrupted during download, it will be downloaded again on the next launch.

SthPhoenix commented 2 years ago

The proper MD5 sum for the glintr100 model should be 3b366b98f786426f79629ddb2e56629c. In case you got a different checksum, you just need to re-download the model; with the latest commit it will be downloaded again automatically upon start.
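
For reference, one way to compute and compare the checksum is with Python's hashlib (a minimal sketch; the model path is an assumption and should point at your local copy):

    import hashlib

    EXPECTED_MD5 = "3b366b98f786426f79629ddb2e56629c"    # glintr100, per the comment above
    model_path = "models/onnx/glintr100/glintr100.onnx"  # assumed location, adjust as needed

    md5 = hashlib.md5()
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            md5.update(chunk)

    digest = md5.hexdigest()
    print(digest, "OK" if digest == EXPECTED_MD5 else "MISMATCH - re-download the model")

Running md5sum on the file from a shell should give the same digest.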

jinzaz commented 2 years ago

@SthPhoenix I'm not sure whether there should be any models at all after the build. I used deploy_cpu.sh to run and build the Docker image, but there is no ONNX file in the built models folder. Do I need to download the models myself?

jinzaz commented 2 years ago

@SthPhoenix these are my error logs:

entrypoint.sh: line 2: $'\r': command not found
Preparing models...
': [Errno 2] No such file or directorys.py
entrypoint.sh: line 5: $'\r': command not found
Starting InsightFace-REST using 1 workers.
[2022-10-30 10:26:17 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-10-30 10:26:17 +0000] [1] [INFO] Listening at: http://0.0.0.0:18080 (1)
[2022-10-30 10:26:17 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2022-10-30 10:26:17 +0000] [13] [INFO] Booting worker with pid: 13
[2022-10-30 10:26:19 +0000] [13] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 66, in init_process super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 843, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/app/app.py", line 34, in processing = Processing(det_name=settings.models.det_name, rec_name=settings.models.rec_name,
  File "/app/modules/processing.py", line 78, in __init__ self.model = FaceAnalysis(det_name=det_name,
  File "/app/modules/face_model.py", line 72, in __init__ self.det_model = Detector(det_name=det_name, max_size=self.max_size,
No module named 'cupy'
  File "/app/modules/face_model.py", line 30, in __init__ self.retina = get_model(det_name, backend_name=backend_name, force_fp16=force_fp16, im_size=max_size,
  File "/app/modules/model_zoo/getter.py", line 206, in get_model model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/modules/model_zoo/getter.py", line 125, in prepare_backend model = onnx.load(onnx_path)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 118, in load_model s = _load_bytes(f)
  File "/usr/local/lib/python3.8/site-packages/onnx/__init__.py", line 32, in _load_bytes with open(cast(Text, f), 'rb') as readable:
FileNotFoundError: [Errno 2] No such file or directory: '/models/onnx/scrfd_2.5g_gnkps/scrfd_2.5g_gnkps.onnx'

SthPhoenix commented 2 years ago

It might be that Google Drive is inaccessible in your region; you can try manually downloading the models from Google Drive using a proxy: scrfd_2.5g_gnkps, glintr100

Models should be placed under the following path: repo_root/models/onnx/{model_name}/{model_name}.onnx
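
As a quick sanity check before starting the container, something like the following confirms that each model sits at that path and actually parses (a sketch; the repo root and the default model names are assumptions):

    from pathlib import Path
    import onnx

    repo_root = Path(".")                       # path to your InsightFace-REST checkout
    models = ["scrfd_2.5g_gnkps", "glintr100"]  # default detection and recognition models

    for name in models:
        onnx_file = repo_root / "models" / "onnx" / name / f"{name}.onnx"
        if not onnx_file.is_file():
            print(f"MISSING: {onnx_file}")
        else:
            try:
                onnx.load(str(onnx_file))
                print(f"OK: {onnx_file}")
            except Exception as exc:
                print(f"BROKEN: {onnx_file} ({exc})")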

jinzaz commented 2 years ago

@SthPhoenix If I download the model files first, which folder should I put them in before copying them to the Docker container?

felixdollack commented 2 years ago

@SthPhoenix If I download the model files first, which folder should I put them in before copying them to the Docker container?

From https://github.com/SthPhoenix/InsightFace-REST/issues/99#issuecomment-1297397758:

Models should be placed under the following path:
repo_root/models/onnx/{model_name}/{model_name}.onnx

For example the glintr100 model should be in

repo_root/models/onnx/glintr100/glintr100.onnx

kuanyshbakytuly commented 6 months ago

It might be that Google Drive is inaccessible in your region; you can try manually downloading the models from Google Drive using a proxy: scrfd_2.5g_gnkps, glintr100

Models should be placed under the following path: repo_root/models/onnx/{model_name}/{model_name}.onnx

I downloaded scrfd_2.5g_gnkps.onnx and checked its md5sum; I get a711d520006b358240836689b26ab4b4, but you said that it should be 50febd32caa699ef7a47cf7422c56bbd for scrfd_2.5g_gnkps.onnx.