SthPhoenix / InsightFace-REST

InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
Apache License 2.0

build successfully, but container didn't start #58

Open bltcn opened 3 years ago

bltcn commented 3 years ago

Win11, WSL2, Ubuntu 18.04 (attached screenshot: 微信图片_20211103184203). How can I deal with it?

SthPhoenix commented 3 years ago

Hi! I haven't tested the image on Windows. Have you checked the container logs?

bltcn commented 3 years ago

Preparing models...
[04:56:39] INFO - Preparing 'glintr100' model...
[04:56:39] INFO - Building TRT engine for glintr100...
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] WARNING: GPU error during getBestTactic: Conv_0 : invalid argument
[TensorRT] ERROR: 10: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node Conv_0.)
Traceback (most recent call last):
  File "prepare_models.py", line 54, in <module>
    prepare_models()
  File "prepare_models.py", line 49, in prepare_models
    prepare_backend(model_name=model, backend_name=backend_name, im_size=max_size, force_fp16=force_fp16,
  File "/app/modules/model_zoo/getter.py", line 157, in prepare_backend
    convert_onnx(temp_onnx_model,
  File "/app/modules/converters/onnx_to_trt.py", line 84, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError
Starting InsightFace-REST using 1 workers.
[04:56:51] INFO - 1
[04:56:51] INFO - MAX_BATCH_SIZE: 1
[04:56:51] INFO - Reshaping ONNX inputs to: (1, 3, 640, 640)
[04:56:51] INFO - In shape: [dim_value: 1, dim_value: 3, dim_param: "?", dim_param: "?"]
[04:56:51] INFO - Building TRT engine for scrfd_10g_gnkps...
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] WARNING: GPU error during getBestTactic: Conv_0 + Relu_1 : invalid argument
[TensorRT] ERROR: 10: [optimizer.cpp::computeCosts::1855] Error Code 10: Internal Error (Could not find any implementation for node Conv_0 + Relu_1.)
Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 425, in main
    run(app, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 447, in run
    server.run()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 68, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 76, in serve
    config.load()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 448, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/./app.py", line 36, in <module>
    processing = Processing(det_name=configs.models.det_name, rec_name=configs.models.rec_name,
  File "/app/./modules/processing.py", line 180, in __init__
    self.model = FaceAnalysis(det_name=det_name, rec_name=rec_name, ga_name=ga_name, device=device,
  File "/app/./modules/face_model.py", line 78, in __init__
    self.det_model = Detector(det_name=det_name, device=device, max_size=self.max_size,
  File "/app/./modules/face_model.py", line 37, in __init__
    self.retina = get_model(det_name, backend_name=backend_name, force_fp16=force_fp16, im_size=max_size,
  File "/app/./modules/model_zoo/getter.py", line 203, in get_model
    model_path = prepare_backend(model_name, backend_name, im_size=im_size, max_batch_size=max_batch_size,
  File "/app/./modules/model_zoo/getter.py", line 157, in prepare_backend
    convert_onnx(temp_onnx_model,
  File "/app/./modules/converters/onnx_to_trt.py", line 84, in convert_onnx
    assert not isinstance(engine, type(None))
AssertionError

SthPhoenix commented 3 years ago

Have you tried running other GPU-based containers on WSL2, like the TensorFlow benchmarks, to verify that your WSL2 is properly configured for GPU usage?

SthPhoenix commented 3 years ago

Try running this sample: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#ch05-sub01-simple-containers
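
For reference, the run command for that n-body sample looks roughly like this (image tag taken from the NVIDIA WSL user guide; confirm against the current page):

```bash
# Run the CUDA n-body benchmark sample with all GPUs exposed to the container
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```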

bltcn commented 3 years ago

Run "nbody -benchmark [-numbodies=]" to measure performance. -fullscreen (run n-body simulation in fullscreen mode) -fp64 (use double precision floating point values for simulation) -hostmem (stores simulation data in host memory) -benchmark (run benchmark to measure performance) -numbodies= (number of bodies (>= 1) to run in simulation) -device= (where d=0,1,2.... for the CUDA device to use) -numdevices= (where i=(number of CUDA devices > 0) to use for simulation) -compare (compares simulation results running once on the default GPU and once on the CPU) -cpu (run n-body simulation on the CPU) -tipsy= (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Windowed mode
Simulation data stored in video memory
Single precision floating point simulation
1 Devices used for simulation
GPU Device 0: "Pascal" with compute capability 6.1

Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1060]
10240 bodies, total time for 10 iterations: 8.868 ms
= 118.245 billion interactions per second
= 2364.896 single-precision GFLOP/s at 20 flops per interaction

SthPhoenix commented 3 years ago

Hm, then TensorRT should work as expected.

I can double-check that the latest published version of InsightFace-REST works out of the box, but unfortunately I can't help you with running it on Windows.

SthPhoenix commented 3 years ago

I have checked building from scratch with a clean clone of the repo - everything works as intended on Ubuntu 20.04.

Looks like it's a WSL-related problem.

bltcn commented 3 years ago

Thanks, I have tested the CPU version and it works fine. Maybe there is something wrong with the parameters in this case.

SthPhoenix commented 3 years ago

Quote from Nvidia page above:

With the NVIDIA Container Toolkit for Docker 19.03, only --gpus all is supported.

This might be the case, since deploy_trt.sh tries to set a specific GPU. Try replacing line 99 with --gpus all (a sketch is below).
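
A minimal sketch of that change, assuming deploy_trt.sh ends up invoking a plain docker run command; the image name, port mapping, and container name below are placeholders, not the script's actual contents:

```bash
# Before (assumed form): the script pins the container to one specific GPU, e.g.
#   docker run --gpus "device=0" ...
# After: expose all GPUs, which is the only mode Docker 19.03 supports with the
# NVIDIA Container Toolkit under WSL2 (image, port and name are placeholders)
docker run --gpus all -p 18081:18080 --name insightface-rest-trt insightface-rest:latest
```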

Though, according to the same document, there might also be issues with the pinned memory required by TensorRT, and with concurrent CUDA streams.

If pinned memory is also an issue, you can try adding RUN $PIP_INSTALL onnxruntime-gpu to Dockerfile_trt and switching the inference backend to onnx in deploy_trt.sh at line 105 - a rough sketch of both edits follows.
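
A rough sketch of both edits; the $PIP_INSTALL helper and the backend variable name are taken from the comment above and assumed, so check them against the actual files:

```bash
# 1) In Dockerfile_trt (Dockerfile syntax, shown here as a comment) add the
#    ONNX Runtime GPU package so the onnx backend has something to run on:
#      RUN $PIP_INSTALL onnxruntime-gpu
#
# 2) In deploy_trt.sh around line 105, switch the inference backend from
#    TensorRT to ONNX Runtime (variable name assumed):
INFERENCE_BACKEND=onnx
```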

bltcn commented 3 years ago

Thanks, I will try.

SthPhoenix commented 3 years ago

Hi! Any updates? Have you managed to run it under WSL2?

bltcn commented 2 years ago

Sorry, I just saw your reply. I will try.

SthPhoenix commented 2 years ago

> Sorry, I just saw your reply. I will try.

Looks like WSL2 just wasn't supported by TensorRT, but according to the changelog the latest TensorRT version should support it. Try using the 21.12 TensorRT image.
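
If it helps, the base-image switch would look roughly like this; the NGC tag format is NVIDIA's standard one, but confirm it matches what Dockerfile_trt currently expects:

```bash
# In Dockerfile_trt, point the build at the 21.12 TensorRT release
# (Dockerfile syntax, shown here as a comment):
#   FROM nvcr.io/nvidia/tensorrt:21.12-py3
# Then rebuild and redeploy the container:
./deploy_trt.sh
```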

talebolano commented 2 years ago

> Sorry, I just saw your reply. I will try.

> Looks like WSL2 just wasn't supported by TensorRT, but according to the changelog the latest TensorRT version should support it. Try using the 21.12 TensorRT image.

I tried the 21.12 and 22.01 TensorRT images; unfortunately, both failed. 21.12 reports "GPU error during getBestTactic", 22.01 reports "Cuda failure: integrity checks failed".

SthPhoenix commented 2 years ago

> I tried the 21.12 and 22.01 TensorRT images; unfortunately, both failed. 21.12 reports "GPU error during getBestTactic", 22.01 reports "Cuda failure: integrity checks failed".

Have you tried running other GPU-based containers on WSL2 to ensure everything is installed correctly?