de-code / python-tf-bodypix

A Python implementation of the bodypix model.
MIT License

auto-track sample implementation #82

Open benbatya opened 3 years ago

benbatya commented 3 years ago

@de-code I got inspired to split auto-track into a separate mode and just add it to this project. I like the effect a lot with my 120deg FOV Spedal camera because I can move around the room and it (mostly) tracks my face which is very convenient for long boring meetings. I know that we talked about adding it to layered-vision but it just was easiest to add the additional mode in this project.

Enjoy! Ben

auto_track.patch.txt

de-code commented 3 years ago

Thank you for that.

I am still not quite sure about adding it to this project. I can definitely see it being a nice feature (although I am just using the laptop's built-in webcam, which already has quite a bad resolution). It just seems slightly out of scope and might open the door to further feature expansion. I wouldn't want the library becoming too large to deter potential users. (But maybe the CLI should be its own project.)

If you are not using the bodypix mask otherwise, wouldn't one of the CV face detection options be faster?

I will make a few more changes to layered-vision to make it easier. But there is nothing wrong with just having a separate script that uses tf-bodypix as a library to implement it the way you want (making that easier was one of the purposes of this project). Perhaps we could also have an "extras" project with example code, including yours.

I will definitely refer back to your code example.

benbatya commented 3 years ago

Sorry, I just found it easiest to add it to python_tf_bodypix. Not professional at all... It does work nicely with the 120deg FOV camera https://www.amazon.com/Spedal-Conference-Streaming-Microphone-Desktop/dp/B07TDQ8NL3

It works, but scaling up the resolution (1920x1080 vs 640x480) becomes a performance problem. I'm going to experiment with running the detector on a scaled-down version and then cropping and scaling the hi-res version. Running the model on the GPU would help, but that's not possible right now. Anyway, it's fun to play with.

I wanted to combine it with the background subtraction that bodypix provides and to be able to select which body parts are cropped. And the model is pretty fast on the CPU.

I'll fork the repo and maintain my own version for now. But you are right that making it a library with separate apps would be a better design.

de-code commented 3 years ago

What fps do you get? Or what timings? (it should show timings of individual parts every second)

I find that actually calculating individual part masks can take a significant additional amount of time, and so can combining images. I am sure there must be faster ways even without a GPU. But as it is now, that may be a bit slow when scaling up.

For auto-track to work nicely, it will probably be good to interpolate movements and zoom in and out depending on the size of the detected face.
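One simple way to get that effect is exponential interpolation toward the target crop, so the virtual camera glides rather than jumps. A minimal sketch; `interpolate_box` and `alpha` are illustrative names, not existing code in either project:

```python
def interpolate_box(current, target, alpha=0.2):
    """Move `current` (x, y, w, h) a fraction `alpha` toward `target`.

    Called once per frame, this converges smoothly on the target crop;
    a larger `alpha` tracks faster but looks more jittery.
    """
    return tuple(c + alpha * (t - c) for c, t in zip(current, target))
```

Zooming in and out then falls out naturally: the target box's width and height are derived from the detected face size, and the interpolation smooths both position and zoom.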

My default resolution is 848x480. Even at that, I find it doesn't look very sharp. Doesn't it start to look a bit retro when you significantly zoom in on your face at a similar resolution?

Running the model on the GPU would help but that's not possible now

Do you have a GPU that TensorFlow can't use?

benbatya commented 3 years ago

On this branch (https://github.com/benbatya/python-tf-bodypix/tree/autoframe_feature) I'm getting about 13fps.

I scale down the high-resolution frame to width=640 while maintaining the aspect ratio, run bodypix on the smaller image, and then scale the bounding boxes back up. So it maintains the fps okay. The multiple rescalings are a slight perf hit, but not as bad as running bodypix on the high resolution. The key is that even if the final resolution is 1/3 of the original high-res images, once it is transmitted into a Zoom meeting the decreased resolution is not noticeable. In big meetings, Zoom will drop the resolution of each incoming stream down to 160x90 px at 7fps. It's very good at managing limited bandwidth.

I think your suggestion about interpolating the frames is a good one. I currently just use a moving-window mean to smooth out the camera view motion. Another fix is to handle the case where there's no detectable body in the frame, so the view doesn't revert to the default.
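A moving-window mean over recent boxes, including the "no detection" fallback, might look like this. A sketch only; `BoxSmoother` and its window size are illustrative, not the branch's actual implementation:

```python
from collections import deque

class BoxSmoother:
    """Smooth (x, y, w, h) boxes with a moving-window mean."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def update(self, box):
        """Feed the latest detection (or None) and get the smoothed box."""
        if box is not None:
            self.history.append(box)
        if not self.history:
            return None  # nothing detected yet
        # With no new detection, the mean of the retained history is
        # returned, so the view holds steady instead of snapping back.
        n = len(self.history)
        return tuple(sum(b[i] for b in self.history) / n for i in range(4))
```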

I have a P1000 on my laptop which I'd like to use, but I don't know how to run tflite models on it. I think there's a way to configure the device selection, but I don't know how to do it. Also, it may not be worth it: the DMA transfer overhead can be significant, so if the model runs fast enough on the CPU, moving it to the GPU may actually be a slowdown.
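For the full TensorFlow (as opposed to tflite) path, a first sanity check is whether TensorFlow can see the GPU at all; the P1000 would also need a CUDA-enabled TensorFlow build with matching drivers. A minimal sketch using standard `tf.config` / `tf.device` APIs:

```python
import tensorflow as tf

# List GPUs TensorFlow can actually use (empty list on a CPU-only build)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)

if gpus:
    # Pin an op to the GPU explicitly to confirm it is usable
    with tf.device("/GPU:0"):
        x = tf.random.uniform((1024, 1024))
        y = tf.matmul(x, x)
```

This doesn't help with tflite models, though: tflite GPU acceleration goes through its own delegate mechanism rather than TensorFlow device placement.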