broeslie commented 2 years ago

Describe the bug
When I had only a few persons (subjects) stored in CompreFace, facial recognition worked just fine. Now that approximately 8K persons have been added, face recognition simply returns: [message] => Something went wrong, please try again [code] => 0

This goes for both the API and the admin interface.

The other APIs still work fine, by the way. I can fetch the subjects and the images.

To Reproduce
Steps to reproduce the behavior:

1. Create approx. 8K subjects with approx. 166K photos in total.
2. Call the face recognition API; the message above appears.

Expected behavior
Maybe this system is not designed for this many subjects and images, but I would like to know where the limit is and which approach would be best for my use case.
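For a rough sense of scale, the raw embedding data for this many photos can be estimated with simple arithmetic. This is only a back-of-envelope sketch: the 512-value embedding size and 8 bytes per value are assumptions, not figures taken from this thread, and the result ignores all JVM and database overhead, so it does not say how much heap CompreFace actually needs.

```python
def embedding_bytes(num_faces: int, dims: int = 512, bytes_per_value: int = 8) -> int:
    """Raw size of num_faces embeddings, ignoring every kind of overhead.

    dims and bytes_per_value are assumed values for illustration only.
    """
    return num_faces * dims * bytes_per_value


if __name__ == "__main__":
    raw = embedding_bytes(166_000)  # approx. photo count from the report
    print(f"{raw / 1024**2:.0f} MiB of raw embedding data (lower bound)")
```

The real footprint will be larger, so this is best read as a lower bound when choosing a heap size.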

Desktop (please complete the following information):

lscpu | grep avx
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr flush_l1d arch_capabilities

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy

nvidia-smi

Thu Sep  1 20:40:52 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 30%   30C    P8     8W / 184W |     14MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1251      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1410      G   /usr/bin/gnome-shell                3MiB |
+-----------------------------------------------------------------------------+

broeslie commented 2 years ago

Update: I managed to find a cause. From the logs I could see that there was a heap-space error. I increased the memory in the .env file and now that specific error is gone. I also updated to version 1.0.1 (by downloading the zip file).

But now, when I post to the face recognition server, I can see CPU usage going up, but unfortunately it does not return an answer at all. My curl request gets: 504 Gateway Time-out. The same goes for the admin interface.

Any pointers? tnx
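To tell a slow backend apart from a proxy cutoff, it can help to time the recognition call with a generous client-side timeout and see when the 504 arrives. The sketch below uses only the Python standard library. The endpoint path /api/v1/recognition/recognize, the x-api-key header, and the multipart field name "file" reflect my understanding of the CompreFace REST API and should be checked against the docs; the base URL and API key in the usage note are placeholders.

```python
import time
import urllib.error
import urllib.request
import uuid


def build_recognize_request(base_url, api_key, filename, image_bytes):
    """Build a multipart POST for the (assumed) recognition endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/recognition/recognize",
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def timed_call(request, timeout_s=300.0):
    """Send the request and return (HTTP status, seconds elapsed).

    HTTP error statuses such as 504 are returned rather than raised.
    A client-side timeout or connection failure still propagates as an exception.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

Usage would look like `timed_call(build_recognize_request("http://localhost:8000", "YOUR-API-KEY", "nicole.jpg", open("nicole.jpg", "rb").read()))`. If the 504 always comes back after the same number of seconds, that points at a proxy timeout rather than a crash.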

pospielov commented 2 years ago

What CompreFace build do you use? From your screenshot, it looks like it's not a GPU-supported build.

broeslie commented 2 years ago

Hi @pospielov,
Thanks for your quick response. I installed version 1.0.1 from this page: https://github.com/exadel-inc/CompreFace/releases
Then I changed the memory setting in the .env file to compreface_api_java_options=-Xmx10g and simply ran: docker-compose up -d

I have not used any of the custom builds in the custom-builds directory. Should I?
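When changing the heap size it is easy to edit the wrong copy of the .env file, so a quick check of the value that is actually on disk can save a restart. The helper below is a small sketch that reads the text of a .env file and pulls out the -Xmx value for the compreface_api_java_options key mentioned above. The function name is made up for illustration, and the parsing is deliberately simple (no quoting or variable expansion).

```python
import re


def java_heap_setting(env_text: str, key: str = "compreface_api_java_options"):
    """Return the -Xmx value (e.g. '10g') for key in .env text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            match = re.search(r"-Xmx(\d+[kKmMgG]?)", value)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        print(java_heap_setting(fh.read()))
```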

pospielov commented 2 years ago

If you want to use GPU to speed up the recognition, it's better to use one of the custom builds that support GPU: https://github.com/exadel-inc/CompreFace/blob/master/docs/Custom-builds.md

Could you show the result of the command: docker logs compreface-core?

broeslie commented 2 years ago

Here are the two log lines generated when it times out:

{"severity": "DEBUG", "message": "Found: BoundingBoxDTO(x_min=133, y_min=80, x_max=334, y_max=311, probability=0.9998749494552612, _np_landmarks=array([[208, 178],\n [275, 186],\n [241, 222],\n [197, 240],\n [264, 249]]))", "request": {"method": "POST", "path": "/find_faces", "filename": "nicole.jpg", "api_key": "", "remote_addr": "172.25.0.4"}, "logger": "src.services.facescan.plugins.facenet.facenet", "module": "facenet", "traceback": null, "build_version": "dev"}
{"severity": "INFO", "message": "200 OK", "request": {"method": "POST", "path": "/find_faces", "filename": "nicole.jpg", "api_key": "", "remote_addr": "172.25.0.4"}, "logger": "src.services.flask_.log_response", "module": "log_response", "traceback": null, "build_version": "dev"}

504 Gateway Time-out

nginx

I will also follow your advice on the custom build. tnx.
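The compreface-core log records quoted above are JSON objects, so they can be filtered by script instead of read by eye. Here is a minimal sketch that extracts every JSON object from pasted log text (whether the records are one per line or run together) and summarizes severity, request path, and message. The field names are the ones visible in the quoted records; the function names are made up for illustration.

```python
import json


def parse_json_stream(text):
    """Return every top-level JSON object found in text, skipping non-JSON noise."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, try the next brace
            continue
        records.append(obj)
        pos = end
    return records


def summarize(records):
    """Return (severity, request path, message) for each record."""
    return [
        (r.get("severity"), (r.get("request") or {}).get("path"), r.get("message"))
        for r in records
    ]
```

In the quoted records, the core service finishes /find_faces with a success status, which suggests the time is being spent somewhere after face detection rather than in it.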

broeslie commented 2 years ago

Unfortunately, I was not able to run Mobilenet-gpu or SubCenter-ArcFace-r100-gpu:

ERROR: for compreface-core Cannot create container for service compreface-core: Unknown runtime specified nvidia
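The "Unknown runtime specified nvidia" error means Docker has no runtime registered under that name. One way to confirm this before starting the stack is to inspect the runtimes Docker reports. The sketch below shells out to `docker info --format '{{json .Runtimes}}'` and checks for an "nvidia" entry; the helper names are made up for illustration, and it assumes the docker CLI is on the PATH.

```python
import json
import subprocess


def has_runtime(runtimes_json: str, name: str = "nvidia") -> bool:
    """True if the JSON object of Docker runtimes contains the given name."""
    runtimes = json.loads(runtimes_json) or {}
    return name in runtimes


def docker_runtimes_json() -> str:
    """Ask the local Docker daemon for its registered runtimes as JSON."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("nvidia runtime registered:", has_runtime(docker_runtimes_json()))
```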

broeslie commented 2 years ago

Apparently I was missing the NVIDIA runtime, which I have now installed:

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

Save that file and refresh the package list:
$ sudo apt-get update

Install nvidia-docker2 and reload the Docker configuration:
$ sudo apt-get install -y nvidia-docker2

$ sudo apt install -y nvidia-docker2
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker

And now the SubCenter GPU build also starts. So, time for the next step. And thank you for putting up with these noob questions :)

broeslie commented 2 years ago

I got stuck here: when posting a photo of a subject to SubCenter-ArcFace-r100-gpu, I now receive this message:

{ "message":"Error during synchronization between servers: [500 INTERNAL SERVER ERROR] during [POST] to [http:\/\/compreface-core:3000\/find_faces] [FacesFeignClient#findFaces(MultipartFile,Integer,Double,String)]: [{\"message\":\"RuntimeError: simple_bind error. Arguments:\\ndata: (1, 3, 480, 640)\\nTraceback (most recent call last):\\n File \\\"..\/src\/storage\/storage.cc\\\", line 97\\nCUDA: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: no CUDA-capable device is detected\"}\n]", "code":41 }

CUDA: NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7

Any pointers? Thanks in advance :)

pospielov commented 2 years ago

Sorry for the late response.
Could you run
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
just to make sure you set up Docker correctly? Also, what GPU do you have?

broeslie commented 2 years ago

Hi,

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

pospielov commented 2 years ago

That is strange. I discussed it with our dev, and we have no idea why this happens. What GPU do you use?

exadel-inc / CompreFace

"Something went wrong, please try again" message #884
