Open broeslie opened 2 years ago
Update: I managed to find a cause: the logs showed a heap-space error. I increased the memory in the .env file, and that specific error is now gone. I also updated to version 1.0.1 (by downloading the zip file).
But now, when I post to the face recognition server, I can see CPU usage going up, but unfortunately it never returns an answer. My curl request reports: 504 Gateway Time-out. The same goes for the Admin interface.
Any pointers? tnx
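One way to narrow down a 504 like this is to time the request from the client side. The sketch below is only an illustration. It assumes the default CompreFace REST endpoint on localhost port 8000 and a recognition-service API key; the helper name `recognize_timed`, the `COMPREFACE_URL` variable, and the 300-second limit are made up for this example.

```shell
# Sketch: send one image to the recognition endpoint and print the HTTP
# status plus the total time taken, so a slow answer can be told apart
# from no answer at all.
# Assumptions: CompreFace is reachable at localhost:8000 (override with
# COMPREFACE_URL); recognize_timed is a hypothetical helper name.
recognize_timed() {
  if [ "$#" -lt 2 ]; then
    echo "usage: recognize_timed <api_key> <image_file>" >&2
    return 2
  fi
  api_key="$1"
  image="$2"
  base_url="${COMPREFACE_URL:-http://localhost:8000}"
  # --max-time caps the whole transfer; -w prints status and timing.
  curl -s -o /dev/null --max-time 300 \
    -w 'HTTP %{http_code} after %{time_total}s\n' \
    -X POST "${base_url}/api/v1/recognition/recognize" \
    -H "x-api-key: ${api_key}" \
    -F "file=@${image}"
}
```

Usage would be something like `recognize_timed "<service_api_key>" nicole.jpg`. If the reported time sits right at the proxy timeout, the request is probably still being processed when the gateway gives up.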
What CompreFace build do you use? From your screenshot, it looks like it's not a GPU-supported build.
Hi @pospielov, thanks for your quick response. I installed version 1.0.1 from this page: https://github.com/exadel-inc/CompreFace/releases Then I changed the memory setting in the .env file: compreface_api_java_options=-Xmx10g And then I simply ran docker-compose up -d
I have not used any of the custom builds in the custom-builds directory. Should I?
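For reference, the .env change described above can be scripted. This is a minimal sketch, not part of CompreFace: `set_env_value` is a hypothetical helper, and it assumes the .env file uses plain `key=value` lines.

```shell
# Sketch: set KEY=VALUE in an env file, replacing an existing line or
# appending a new one. set_env_value is a hypothetical helper name.
# Limitation: the value must not contain '|', '&' or backslashes,
# because it is passed straight into a sed replacement.
set_env_value() {
  key="$1"
  value="$2"
  file="$3"
  if grep -q "^${key}=" "$file"; then
    # -i.bak edits in place and keeps a backup copy next to the file.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Example: `set_env_value compreface_api_java_options -Xmx10g .env`, followed by `docker-compose up -d` so the containers pick up the new value.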
If you want to use GPU to speed up the recognition, it's better to use one of the custom builds that support GPU: https://github.com/exadel-inc/CompreFace/blob/master/docs/Custom-builds.md
Could you show the result of this command?
docker logs compreface-core
Here are the two log lines generated when it times out:
{"severity": "DEBUG", "message": "Found: BoundingBoxDTO(x_min=133, y_min=80, x_max=334, y_max=311, probability=0.9998749494552612, _np_landmarks=array([[208, 178],\n [275, 186],\n [241, 222],\n [197, 240],\n [264, 249]]))", "request": {"method": "POST", "path": "/find_faces", "filename": "nicole.jpg", "api_key": "", "remote_addr": "172.25.0.4"}, "logger": "src.services.facescan.plugins.facenet.facenet", "module": "facenet", "traceback": null, "build_version": "dev"}
{"severity": "INFO", "message": "200nis OK", "request": {"method": "POST", "path": "/find_faces", "filename": "nicole.jpg", "api_key": "", "remote_addr": "172.25.0.4"}, "logger": "src.services.flask_.log_response", "module": "log_response", "traceback": null, "build_version": "dev"}
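Both entries here are DEBUG and INFO, so nothing in this excerpt records a failure inside compreface-core. To scan a longer log for problems, the entries can be filtered on the `severity` field. The sketch below assumes the log format shown above (one JSON object per line, with `"severity": "LEVEL"` spelled exactly like that); `filter_by_severity` is a hypothetical helper name.

```shell
# Sketch: keep only the log lines with the given severity level.
# Reads log lines on stdin; filter_by_severity is a hypothetical name.
filter_by_severity() {
  level="$1"
  # -F matches the literal text; the spacing mirrors the log lines above.
  grep -F "\"severity\": \"${level}\""
}
```

Usage: `docker logs compreface-core 2>&1 | filter_by_severity ERROR`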
I will also follow your advice on the custom build. tnx.
Unfortunately, I was not able to run Mobilenet-gpu and SubCenter-ArcFace-r100-gpu:
ERROR: for compreface-core Cannot create container for service compreface-core: Unknown runtime specified nvidia
Apparently I was missing the NVIDIA runtime, which I installed:
`
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Save that file and refresh the package list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker configurations
sudo apt-get install -y nvidia-docker2
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
`
And now the GPU build of SubCenter also starts. So, time for the next step. And thank you for putting up with these noob questions :)
I got stuck here: when posting a photo of a subject to SubCenter-ArcFace-r100-gpu, I now receive this message:
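To confirm that the "Unknown runtime specified nvidia" error is really gone, one option is to check whether Docker lists an `nvidia` runtime. A minimal sketch, assuming `docker info --format '{{json .Runtimes}}'` prints the registered runtimes as JSON; `has_nvidia_runtime` is a hypothetical helper name.

```shell
# Sketch: succeed if the runtimes JSON read from stdin contains an
# "nvidia" entry. has_nvidia_runtime is a hypothetical helper name.
has_nvidia_runtime() {
  grep -q '"nvidia"'
}
```

Usage: `docker info --format '{{json .Runtimes}}' | has_nvidia_runtime && echo "nvidia runtime registered"`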
{ "message":"Error during synchronization between servers: [500 INTERNAL SERVER ERROR] during [POST] to [http:\/\/compreface-core:3000\/find_faces] [FacesFeignClient#findFaces(MultipartFile,Integer,Double,String)]: [{\"message\":\"RuntimeError: simple_bind error. Arguments:\\ndata: (1, 3, 480, 640)\\nTraceback (most recent call last):\\n File \\\"..\/src\/storage\/storage.cc\\\", line 97\\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\"}\n]", "code":41 }
CUDA: NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
Any pointers? Thanks in advance :)
Sorry for the late response.
Could you run
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
just to make sure you set up docker correctly?
Also, what GPU do you have?
Hi,
Here is the outcome:
`docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Fri Sep 23 10:23:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 30% 27C P8 7W / 184W | 14MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
`
That is strange. I discussed it with our dev, and we have no idea why this happens. What GPU do you use?
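The table output above truncates the GPU name ("NVIDIA GeForce ..."). A possible way to get the full name is nvidia-smi's CSV query mode. The sketch below assumes the output of `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader` (one comma-separated line per GPU); `gpu_summary` is a hypothetical helper that only reformats that output.

```shell
# Sketch: turn "name, driver, memory" CSV lines read from stdin into a
# one-line summary per GPU. gpu_summary is a hypothetical helper name.
gpu_summary() {
  while IFS=, read -r name driver memory; do
    # nvidia-smi separates fields with ", "; drop the leading space.
    driver="${driver# }"
    memory="${memory# }"
    printf 'GPU: %s (driver %s, memory %s)\n' "$name" "$driver" "$memory"
  done
}
```

Usage: `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_summary`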
Describe the bug
When I had only a few persons (subjects) stored in CompreFace, the facial recognition worked just fine. Now that approximately 8K persons have been added, face recognition simply returns:
[message] => Something went wrong, please try again [code] => 0
This goes for both the API and the admin interface.
Other APIs still work fine, by the way: I can fetch the subjects and the images.
To Reproduce Steps to reproduce the behavior:
Expected behavior
Maybe this system is not designed for this many subjects and images, but I would like to know where the limit is and which approach would be best for my use case.
Desktop (please complete the following information): lscpu | grep avx Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr flush_l1d arch_capabilities
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
nvidia-smi