cosanlab / py-feat

Facial Expression Analysis Toolbox
https://py-feat.org/

Possible to run the detector on real time video? #168

Open Sanjeev-Monash opened 1 year ago

Sanjeev-Monash commented 1 year ago

Hi there,

I was wondering if it's possible to run just facial action unit (AU) detection on a live webcam feed with low latency.

Currently I am running detector.detect_image() on each frame captured with OpenCV's VideoCapture(), and the resulting frame rate is far too slow for real time. I am using an Nvidia RTX A1000, and device="cuda" is set in the Detector constructor.

The FPS I am getting is extremely low (~0.2 fps). I do not need emotion detection, just action unit detection, but I assume I can't avoid detecting faces and facial landmarks first, since they are required inputs for AU detection.
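For context, my loop looks roughly like the sketch below. I've replaced the cv2.VideoCapture frame grab and the detector.detect_image() call with a stub (fake_detect, with a made-up per-frame delay) so the timing logic is self-contained, but the structure is the same:

```python
import time

def fake_detect(frame):
    """Stand-in for detector.detect_image(); sleeps to mimic a slow model."""
    time.sleep(0.01)  # made-up per-frame cost, for illustration only
    return {"aus": []}

def run_loop(n_frames, detect):
    """Process n_frames through detect() and return the achieved FPS."""
    start = time.perf_counter()
    for i in range(n_frames):
        frame = i  # in the real loop: ok, frame = cap.read()
        detect(frame)
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

fps = run_loop(20, fake_detect)
print(f"{fps:.1f} fps")
```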

Any ideas how I can run this in real-time?

Thanks, really appreciate any help on this.

ljchang commented 1 year ago

Hi @Sanjeev-Monash, thanks for your question. We haven't really explored this possibility yet; most of our development effort has focused on usability and stability rather than performance. We agree that detect_image() is likely to be slow for this type of application. You could speed things up somewhat by modifying detect_image() to run only the specific detectors you need. For example, AU detection only needs the face bounding box and landmarks as input, so you could reduce the pipeline to three detectors (face bounding box, landmark, and AU) instead of the five in the current version. Unfortunately, AU detection itself is still slow, and it is one of the few models that won't see a speedup on GPUs because it is not a PyTorch model.

GuiCamargoX commented 1 year ago

Hi @ljchang! Could you please give an example of how to run only specific detectors instead of the full detect_image() pipeline? In my application, only the action units matter. Thanks for your help!

ljchang commented 1 year ago

Hi @GuiCamargoX, each individual detector requires different inputs. For example, the AU detector requires landmark information, and some of the landmark detectors require face bounding boxes.

Here is a quick example of using the individual detectors for AU detection:

# load modules
import os
from torchvision.io import read_image
from feat.detector import Detector
from feat.utils.io import get_test_data_path  # in older py-feat versions: from feat.utils import get_test_data_path

# load the bundled test image
single_face_img_path = os.path.join(get_test_data_path(), "single_face.jpg")
img = read_image(single_face_img_path)

# initialize detector 
detector_cpu = Detector(verbose=False, device='cpu')

# run individual detectors
faces = detector_cpu.detect_faces(img)
landmarks = detector_cpu.detect_landmarks(img, detected_faces=faces)
aus = detector_cpu.detect_aus(img, landmarks)

Note that the output is a list of lists, not the standard Fex object: the outer list is over frames, and the inner list is over the faces detected within each frame.
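If you want to tidy that nested output into flat per-face rows, something like the sketch below works. Here a plain Python list of lists stands in for the detector output (the AU values are made up for illustration):

```python
# aus[frame][face] -> per-face AU values; sample numbers are made up
aus = [
    [[0.1, 0.7], [0.3, 0.2]],  # frame 0: two faces detected
    [[0.5, 0.4]],              # frame 1: one face detected
]

# flatten into one record per detected face
rows = [
    {"frame": f, "face": i, "aus": face_aus}
    for f, frame_aus in enumerate(aus)
    for i, face_aus in enumerate(frame_aus)
]

for r in rows:
    print(r)
```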