WORKER TIMEOUT, Booting worker

rajil commented 4 years ago

I am trying to use an mlapi server for face detection. Object detection works fine, but face detection fails.

$ gunicorn mlapi:app 
[2020-06-13 17:52:02 +0000] [1048] [INFO] Starting gunicorn 20.0.4
[2020-06-13 17:52:02 +0000] [1048] [INFO] Listening at: http://0.0.0.0:5001 (1048)
[2020-06-13 17:52:02 +0000] [1048] [INFO] Using worker: sync
[2020-06-13 17:52:02 +0000] [1050] [INFO] Booting worker with pid: 1050
[2020-06-13 17:52:02 +0000] [1051] [INFO] Booting worker with pid: 1051
Initializing log
Initializing log
DEBUG: secret filename: ./secrets.ini
DEBUG: Secret token found in config: !MLAPI_SECRET_KEY
INFO: --------| mlapi version: 1.0.7 |--------
DEBUG: Opening DB at ./db/db.json
DEBUG: DB engine ready
DEBUG: secret filename: ./secrets.ini
DEBUG: Secret token found in config: !MLAPI_SECRET_KEY
INFO: --------| mlapi version: 1.0.7 |--------
DEBUG: Opening DB at ./db/db.json
DEBUG: DB engine ready
DEBUG: Initializing face recognition with model:cnn upsample:1, jitters:0
DEBUG: Initializing face recognition with model:cnn upsample:1, jitters:0
DEBUG: trained file not found, reading from images and doing training...
DEBUG: trained file not found, reading from images and doing training...
ERROR: No known faces found to train, encoding file not created
ERROR: No known faces found to train, encoding file not created
ERROR: Error loading face recognition file: [Errno 2] No such file or directory: './known_faces/faces.dat'
ERROR: Error loading face recognition file: [Errno 2] No such file or directory: './known_faces/faces.dat'
DEBUG: Object Recognition requested
DEBUG: get_file returned: ./images/659d7566-c10f-44df-88e1-aa8ff83ffde0.jpg
DEBUG: Initializing Yolo
DEBUG: config:./models/yolov3/yolov3.cfg, weights:./models/yolov3/yolov3.weights
DEBUG: YOLO initialization (loading model from disk) took: 537.807 milliseconds
DEBUG: Setting CUDA backend for OpenCV. If you did not set your CUDA_ARCH_BIN correctly during OpenCV compilation, you will get errors during detection related to invalid device/make_policy
DEBUG: YOLO detection took: 969.331 milliseconds
DEBUG: YOLO NMS filtering took: 0.979 milliseconds
INFO: object:person at [744, 265, 934, 601] has a acceptable confidence:0.9830182194709778 compared to min confidence of: 0.4, adding
INFO: object:car at [954, 46, 1182, 138] has a acceptable confidence:0.9492545127868652 compared to min confidence of: 0.4, adding
INFO: object:bicycle at [948, 227, 1048, 323] has a acceptable confidence:0.8091424107551575 compared to min confidence of: 0.4, adding
INFO: object:pottedplant at [9, 139, 49, 209] has a acceptable confidence:0.5524479746818542 compared to min confidence of: 0.4, adding
INFO: object:pottedplant at [34, 128, 98, 222] has a acceptable confidence:0.5505369901657104 compared to min confidence of: 0.4, adding
INFO: object:pottedplant at [79, 182, 121, 246] has a acceptable confidence:0.5200017690658569 compared to min confidence of: 0.4, adding
INFO: rejecting object:pottedplant at [2, 99, 24, 227] because its confidence is :0.3217852711677551 compared to min confidence of: 0.4
INFO: rejecting object:car at [38, 557, 562, 671] because its confidence is :0.31937599182128906 compared to min confidence of: 0.4
INFO: rejecting object:pottedplant at [74, 136, 132, 242] because its confidence is :0.27401870489120483 compared to min confidence of: 0.4
INFO: rejecting object:pottedplant at [318, 116, 372, 224] because its confidence is :0.22889778017997742 compared to min confidence of: 0.4
DEBUG: Face Recognition requested
DEBUG: get_file returned: ./images/564d23eb-b85d-4188-8c29-95f9c22e9b00.jpg
DEBUG: |---------- Face recognition (input image: 1200w*675h) ----------|
[2020-06-13 17:52:47 +0000] [1048] [CRITICAL] WORKER TIMEOUT (pid:1051)
[2020-06-13 17:52:49 +0000] [1238] [INFO] Booting worker with pid: 1238
Initializing log
DEBUG: secret filename: ./secrets.ini
DEBUG: Secret token found in config: !MLAPI_SECRET_KEY
INFO: --------| mlapi version: 1.0.7 |--------
DEBUG: Opening DB at ./db/db.json
DEBUG: DB engine ready
DEBUG: Initializing face recognition with model:cnn upsample:1, jitters:0
DEBUG: trained file not found, reading from images and doing training...
ERROR: No known faces found to train, encoding file not created
ERROR: Error loading face recognition file: [Errno 2] No such file or directory: './known_faces/faces.dat'

rajil commented 4 years ago

Object detection is using CUDA, but face detection is not. This is probably why the worker times out and gets restarted. Is CUDA not supported for face recognition?

Setting 'timeout=120' in gunicorn.conf.py gave the worker enough time to finish the query.
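For readers who want to try the same workaround, here is a minimal sketch of what such a gunicorn.conf.py might look like. Only the timeout value comes from this thread; the bind address is copied from the "Listening at" log line, and the worker count is an assumption based on the two workers booted in the log. It is not the file shipped with mlapi or with the containerized fork.

```python
# gunicorn.conf.py (illustrative sketch, not the project's shipped file)

# Address and port, copied from the "Listening at" line in the log above.
bind = "0.0.0.0:5001"

# The log shows two workers being booted; this count is an assumption.
workers = 2

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30, which a CPU-only face detection pass can exceed.
timeout = 120
```

Gunicorn reads a file named gunicorn.conf.py from the working directory by default, and `gunicorn -c gunicorn.conf.py mlapi:app` names it explicitly. Raising the timeout only hides a slow request, so getting dlib to use the GPU is still the real fix.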

My mlapi config is as follows:

[ml]
# Starting version 4.2 of OpenCV, the DNN models support CUDA
# If you have compiled OpenCV 4.2 with CUDA support correctly
# set this to yes. Note that if you have just installed a package
# chances are it is not properly set up with CUDA. It is much better
# you compile OpenCV from source (and uninstall any opencv packages you
# installed via pip or apt-get)
# Read https://www.pyimagesearch.com/2020/02/03/how-to-use-opencvs-dnn-module-with-nvidia-gpus-cuda-and-cudnn/ on how to do it right.
# Play special attention to putting in the right CUDA_ARCH_BIN value that
# matches your GPU or you'll face "invalid device errors in make_policy"
# while trying to actually run it (compile will work fine)

use_opencv_dnn_cuda=yes

[yolo]
yolo_type=full
config=./models/yolov3/yolov3.cfg
weights=./models/yolov3/yolov3.weights
labels=./models/yolov3/yolov3.labels
tiny_config=./models/tinyyolo/yolov3-tiny.cfg
tiny_weights=./models/tinyyolo/yolov3-tiny.weights
tiny_labels=./models/tinyyolo/yolov3-tiny.labels

[face]
face_num_jitters=0
face_upsample_times=1
face_model=cnn
face_train_model=hog
face_recog_dist_threshold=0.6
face_recog_knn_algo=ball_tree

known_faces_path=./known_faces
unknown_faces_path=./unknown_faces

unknown_face_name=unknown face
save_unknown_faces=yes
save_unknown_faces_leeway_pixels=50

pliablepixels commented 4 years ago

A few things:

If you are installing the ES+hook using install.sh:

a) Face detection does use the GPU, but make sure you have followed the hook instructions on how to compile dlib with GPU support.

b) Judging by your logs, you haven't configured face recognition as per the instructions. Specifically, you don't have any trained faces. You may want to go through the training process first.

If you are running mlapi from @themoosman's containerized fork (https://github.com/themoosman/mlapi), you might want to post questions about setting up face recognition in his container and/or Gunicorn settings on his fork. I haven't had a chance to look at what his setup requires.

themoosman commented 4 years ago

It looks like it can't find the face recognition files. Are they present?

DEBUG: Initializing face recognition with model:cnn upsample:1, jitters:0
DEBUG: trained file not found, reading from images and doing training...
ERROR: No known faces found to train, encoding file not created
ERROR: Error loading face recognition file: [Errno 2] No such file or directory: './known_faces/faces.dat'
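A quick way to answer that question is to count the image files under the known faces directory before starting the server. The sketch below assumes the `known_faces_path=./known_faces` value from the config above; the helper name `count_face_images` and the list of image suffixes are hypothetical and not part of mlapi.

```python
from pathlib import Path

# Suffixes treated as training images; this list is an assumption.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_face_images(known_faces_dir):
    """Count image files anywhere below known_faces_dir (0 if it is missing)."""
    root = Path(known_faces_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    found = count_face_images("./known_faces")
    if found == 0:
        print("No training images found; faces.dat cannot be created.")
    else:
        print(f"Found {found} training image(s).")
```

A count of zero would be consistent with the "No known faces found to train" error in the log.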

pliablepixels commented 4 years ago

That error means the OP hasn't trained any faces. I don't remember what mlapi does when there are no faces to recognize because none were trained. If it is doing nothing, that may also explain the timeout. I fixed this in the ES and don't remember whether it was also fixed in mlapi. This is another problem to look at, and a reason why I want both the local hook and mlapi to share the same libs.

rajil commented 4 years ago

I copied the known_faces directory from the ES server to the mlapi server and the message went away. I was also able to get dlib working with CUDA.
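To confirm that a dlib build was actually compiled with CUDA, a short check along these lines may help. It relies on the `DLIB_USE_CUDA` flag and `dlib.cuda.get_num_devices()` from dlib's Python bindings; the helper name `dlib_cuda_status` is hypothetical, and the sketch simply reports "not installed" when dlib cannot be imported.

```python
def dlib_cuda_status():
    """Report whether dlib is importable and whether it was built with CUDA."""
    try:
        import dlib
    except ImportError:
        return {"installed": False, "cuda": False, "devices": 0}

    # DLIB_USE_CUDA is True only when dlib was compiled with CUDA support.
    cuda = bool(getattr(dlib, "DLIB_USE_CUDA", False))
    # Only query the device count when the build claims CUDA support.
    devices = int(dlib.cuda.get_num_devices()) if cuda else 0
    return {"installed": True, "cuda": cuda, "devices": devices}


if __name__ == "__main__":
    print(dlib_cuda_status())
```

If `cuda` comes back False, the `cnn` face model runs on the CPU, which would match the slow face recognition request that hit the worker timeout.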

ZoneMinder / mlapi

WORKER TIMEOUT, Booting worker #12