Deinhofer/javacv upgrade and optimizations

deinhofer commented 4 years ago

Fixes #331

This PR adds support for the MacBook webcam and improves tracking quality.

Uses javacv 1.5.2 (opencv 4.1.2).
In many cases (face detection, calcOpticalFlow) the new C++ API is used.
On Linux platforms, CPU load drops considerably (to roughly half of the CPU time).
Tracking on the RPi runs much more smoothly.
Improved face detection: resizing the image to a height of 75 before face detection increases detection speed considerably.
Improved build process:
- Added the libs of the Maven Resolver Ant Task and bnd tools.
- Dependencies can now be loaded directly from a pom file (e.g. the javacv pom file) using the Maven Resolver Ant Task.
- Correct OSGi-fied jars can easily be built with bnd tools by means of a .bnd file.
Bundle size: unfortunately, opencv now depends on openblas, which increases the bundle size of the OSGi-fied javacv jars.
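
The downscaling step can be illustrated with a small sketch. It is not the code from this PR: the class and method names are hypothetical, and it only shows the arithmetic, assuming the frame is shrunk to the detection height while keeping its aspect ratio and that a rectangle found in the small image is mapped back to full-resolution coordinates afterwards.

```java
// Hypothetical helper, not part of AsTeRICS or javacv: pure arithmetic only.
final class DetectionScaling {

    /** Factor that shrinks a frame to the detection height (never upscales). */
    public static double scaleFor(int frameHeight, int detectionHeight) {
        if (frameHeight <= 0 || detectionHeight <= 0) {
            throw new IllegalArgumentException("heights must be positive");
        }
        return Math.min(1.0, (double) detectionHeight / frameHeight);
    }

    /** Width of the downscaled frame, preserving the aspect ratio. */
    public static int scaledWidth(int frameWidth, double scale) {
        return Math.max(1, (int) Math.round(frameWidth * scale));
    }

    /** Maps {x, y, width, height} from the small image back to the full frame. */
    public static int[] toFullResolution(int[] rect, double scale) {
        int[] full = new int[4];
        for (int i = 0; i < 4; i++) {
            full[i] = (int) Math.round(rect[i] / scale);
        }
        return full;
    }

    public static void main(String[] args) {
        double scale = scaleFor(480, 75);   // 640x480 frame, detection height 75
        int width = scaledWidth(640, scale);
        int[] face = toFullResolution(new int[] {40, 20, 25, 25}, scale);
        System.out.println(width + "x75 -> face at " + face[0] + "," + face[1]
                + " size " + face[2] + "x" + face[3]);
    }
}
```

For a 640x480 frame the detector would see a 100x75 image, i.e. about 2.4% of the original pixels, which is where most of the speed-up is expected to come from.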

@klues @ChrisVeigl @benjaminaigner @sabicalija @bmedicke Please test on your platforms in the next few days.

git fetch
git checkout deinhofer/javacv-upgrade-and-optimizations
ant clean
ant run

benjaminaigner commented 4 years ago

Tested both versions on:

Debian 10 (Testing / Bullseye)
Pentium G3320 with 4GB RAM (freshly restarted after build)

Performance:
- master: 20/23% CPU load, 1.7 GB RAM (total system), 13-23 FPS
- this branch: 10/15% CPU load, 1.6 GB RAM (total system), 15-21 FPS

deinhofer commented 4 years ago

Thanks to @benjaminaigner, looks good. @bmedicke successfully tested it on macOS 10.15.4 with a newer MacBook.

@ChrisVeigl and @klues please test on Windows and post the results here, with 32-bit and 64-bit Java if possible. I would like to merge tomorrow.

deinhofer commented 4 years ago

sorry for the repetitions, github showed an error message indicating that the post had failed. :-)

klues commented 4 years ago

Tested both versions on: Win 10 / Core i7-7500U CPU

Performance:
- master: 7-10% CPU load, 90-150 MB RAM (Java process), 31 FPS
- this branch: 7-10% CPU load, 170-240 MB RAM (Java process), 31 FPS

So the new version needs about 80-90MB more RAM, all other things are working fine 👍

ChrisVeigl commented 4 years ago

Here are my test results:

System information: Win10 (x64), Lenovo Thinkpad P51s (core i7-7500, 2.7GHz), Java32bit

Tested model: "XCameraMouse" (startup menu)

Results (this branch):
- 31 FPS (internal webcam, device 0)
- CPU load: ~5% (no face detected); ~8% (during tracking)
- RAM: ~260 MB

Results (AsTeRICS release version 4.0.0):
- 31 FPS (internal webcam, device 0)
- CPU load: ~14% (no face detected); ~8% (during tracking) -> interestingly, no noticeable difference here!
- RAM: ~100 MB

Regarding the RAM usage: the system consumes about 58 MB when started (model autostart running, displays the main menu), but when the xCameraMouse model is loaded the RAM usage rises constantly (during the operation of the camera mouse, about 5MB per second) to around 280MB. The garbage collection releases 30MB of RAM occasionally. Something very "leaky" seems to go on here.... After the model is stopped, the RAM usage drops to about 180MB and stays there (100 MB more than before).

It should also be noted that the "old" CameraMouse model on Windows consumes only about 2% CPU during tracking, and its RAM usage stays constant at around 80 MB.

deinhofer commented 4 years ago

> Here are my test results:
>
> System information: Win10 (x64), Lenovo Thinkpad P51s (core i7-7500, 2.7GHz), Java32bit
>
> Tested model: "XCameraMouse" (startup menu)
>
> Results (this branch): 31 FPS (internal Webcam, device 0) CPU-load: ~5% (no face detected); CPU-load: ~8% (during tracking) RAM: ~260 MB
>
> Results (AsTeRICS release version 4.0.0) 31 FPS (internal Webcam, device 0) CPU-load: ~14% (no face detected);

The detection load decreases because the image used for detection is smaller.

> CPU-load: ~8% (during tracking) -> interestingly, no noticeable difference here! RAM: ~100 MB
>
> Regarding the RAM usage: the system consumes about 58 MB when started (model autostart running, displays the main menu), but when the xCameraMouse model is loaded the RAM usage rises constantly (during the operation of the camera mouse, about 5MB per second) to around 280MB. The garbage collection releases 30MB of RAM occasionally. Something very "leaky" seems to go on here....

If the rise is just temporary and does not continue indefinitely, it's fine. Does the rise stop at some point? It should be a sawtooth curve.

> After the model is stopped, the RAM usage drops to about 180MB and stays there (100 MB more than before).
>
> It should also be noted that the "old" CameraMouse model on Windows consumes only about 2% CPU during tracking, and its RAM usage stays constant at around 80 MB.

Yes, of course, because it's implemented natively. But it does not run on other platforms.

ChrisVeigl commented 4 years ago

> If the rise is just temporary and does not continue indefinitely, it's fine. Does the rise stop at some point? It should be a sawtooth curve.

Yes, RAM consumption rises quickly and the garbage collector kicks in occasionally, which gives a sawtooth curve with quite a high amplitude ;-)
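
One way to make the "sawtooth or leak" question checkable is to look at the troughs of the curve, i.e. the memory level right after each garbage collection: in a healthy sawtooth they stay roughly level, in a leak they keep climbing. The following is only a sketch under that assumption; the class name, the sample values, and the tolerance are made up for illustration and are not measurements from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for judging a series of memory samples (in MB).
final class SawtoothCheck {

    /** Local minima of the samples, i.e. the levels right after each GC drop. */
    public static List<Long> troughs(long[] usedMb) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < usedMb.length - 1; i++) {
            if (usedMb[i] < usedMb[i - 1] && usedMb[i] <= usedMb[i + 1]) {
                result.add(usedMb[i]);
            }
        }
        return result;
    }

    /** True if the last trough lies more than toleranceMb above the first one. */
    public static boolean looksLikeLeak(long[] usedMb, long toleranceMb) {
        List<Long> t = troughs(usedMb);
        if (t.size() < 2) {
            return false; // not enough GC cycles observed to judge
        }
        return t.get(t.size() - 1) - t.get(0) > toleranceMb;
    }

    public static void main(String[] args) {
        long[] sawtooth = {58, 150, 280, 250, 265, 280, 250, 265, 280, 251};
        long[] climbing = {58, 150, 280, 250, 300, 330, 300, 350, 380, 350, 400};
        System.out.println("sawtooth leaks: " + looksLikeLeak(sawtooth, 20));
        System.out.println("climbing leaks: " + looksLikeLeak(climbing, 20));
    }
}
```

The check needs samples that span several garbage collections; a short observation window would always report "no leak" simply because too few troughs were seen.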

ChrisVeigl commented 4 years ago

I just tested the performance of the face tracking on a Raspberry Pi 4 (4GB RAM, Raspbian Buster). I used the Oracle Java 8 JDK/JRE (v1.8.0_251); the binaries for Linux ARM 32 were downloaded from Oracle and installed (according to this post). Building the ARE via ant worked without problems (build time for the master branch: ~3 min, for this branch: ~22 min).

A standard Raspberry Pi camera (2.1) was used. Face detection worked well at 30 FPS. CPU usage averaged about 15% across all 4 cores when no face was detected, and about 35% during tracking. (The main difference between this version and the master branch is that the master branch consumes more CPU when no faces are detected: 25%.)

The tracking works really well - even the chrome browser can be used without significant problems via the camera mouse :-)

However, I also investigated the memory utilisation on the Raspberry Pi: after starting the ARE, it uses about 120 MB of RAM when the autostart model (menu) is running (similar to the x86 version). After the "XFacetracker" model was started, RAM consumption went up steadily (about 5 MB per second) until about 1.6 GB of RAM was occupied. At that time, about 80 Java threads existed (see the screenshot of htop).

[Screenshot of htop: ScreenHunter 250]

When the model was closed (and the menu was displayed again), the RAM was not released. After about 15 minutes, the garbage collector suddenly released about 500 MB. I am not sure whether the garbage collector would release unused RAM sooner or more often if less RAM were available (e.g. in the 1GB or 2GB version of the Raspberry Pi 4). I think it would make sense to investigate this issue further (it also seems to exist in the master branch / previous releases); maybe the garbage collection can be triggered "manually".
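
As a starting point for such an investigation, the Java heap can be sampled and a collection requested with standard JDK calls. This is a minimal sketch with a hypothetical class name, not AsTeRICS code. Two caveats apply: `System.gc()` is only a request that the JVM is free to ignore, and the numbers cover the Java heap only, so growth in native memory (which the figures reported by htop would include) would not show up here.

```java
// Hypothetical probe using only java.lang.Runtime and System.gc().
final class HeapProbe {

    /** Java heap currently in use by this JVM, in megabytes. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    /** Requests a collection and reports by how many MB the used heap shrank (may be <= 0). */
    public static long requestGcAndMeasure() {
        long before = usedHeapMb();
        System.gc(); // only a hint; the JVM may ignore it or collect later
        long after = usedHeapMb();
        return before - after;
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + usedHeapMb() + " MB");
        System.out.println("released after GC request: " + requestGcAndMeasure() + " MB");
    }
}
```

If the used heap stays small while the process size in htop keeps growing, that would point towards native allocations rather than ordinary Java objects.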

ChrisVeigl commented 4 years ago

As the memory consumption problem is unrelated to this pull request, I created a separate issue for that.

I have no objections to merging this pull request, but 2 suggestions:

* It is clear that reducing the size of the camera frame decreases computation time tremendously, but it also decreases the accuracy of the face detection. In my tests I got the impression that the accuracy is still sufficient for the mouse cursor control task. However, there might be scenarios where this (now hard-coded) resolution is too low. A solution would be to provide the scaling factor / image height as a user-selectable property.
* The new dependencies add a lot of overhead (e.g. the opencv binaries for iOS, Android etc.), and the build time increased a lot. If there are ways to exclude unnecessary components, that would be great.
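
The first suggestion could look roughly like the sketch below: the detection height arrives as a string property, falls back to the current value of 75 when missing or invalid, and is clamped to a sensible range. Everything here is an assumption for illustration; the class name and the lower bound of 30 are hypothetical and not part of the plugin.

```java
// Hypothetical property parser; names and bounds are illustrative only.
final class DetectionHeightProperty {

    public static final int DEFAULT_HEIGHT = 75; // value currently hard-coded in this PR
    public static final int MIN_HEIGHT = 30;     // arbitrary lower bound chosen for this sketch

    /** Parses the user-supplied value and clamps it to [MIN_HEIGHT, frameHeight]. */
    public static int parse(String raw, int frameHeight) {
        int value = DEFAULT_HEIGHT;
        if (raw != null) {
            try {
                value = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                value = DEFAULT_HEIGHT; // unparsable input: keep the default
            }
        }
        // Never go below the lower bound and never above the real frame height.
        return Math.min(frameHeight, Math.max(MIN_HEIGHT, value));
    }

    public static void main(String[] args) {
        System.out.println(parse("120", 480)); // user asks for more detail
        System.out.println(parse(null, 480));  // property not set
    }
}
```

Keeping 75 as the default would preserve the speed-up measured in this thread while letting users with difficult lighting or camera angles trade CPU time for detection accuracy.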

ChrisVeigl commented 4 years ago

Thanks for having a look at the memory consumption problem.

I tested this on the Raspberry Pi 4: it improves the situation somewhat, but hundreds of MB are still consumed rather quickly. I suggest another solution here: https://github.com/asterics/AsTeRICS/issues/334

deinhofer commented 4 years ago

> As the memory consumption problem is unrelated to this pull request, I created a separate issue for that.
>
> I have no objections to merging this pull request, but 2 suggestions:
>
> * It is clear that reducing the size of the camera frame decreases computation time tremendously, but it also decreases the accuracy of the face detection. In my tests I got the impression that the accuracy is still sufficient for the mouse cursor control task. However, there might be scenarios where this (now hard-coded) resolution is too low. A solution would be to provide the scaling factor / image height as a user-selectable property.

The reduced resolution is only used for face detection (haarcascade), and I have not had any detection problems so far. For optical flow the full resolution is used.

> * The new dependencies add a lot of overhead (e.g. the opencv binaries for iOS, Android etc.), and the build time increased a lot. If there are ways to exclude unnecessary components, that would be great.

I don't actually know how to improve this right now. But only the first download takes more time. After that, the files are cached by Maven. The generated OSGi jar is also cached once it exists.

ChrisVeigl commented 4 years ago

In my tests, the initialisation of the face position (when the tracking points have not been set, or got lost) proved to be difficult. It is necessary to present the face in a perfectly aligned vertical orientation to get a successful haar cascade classification. This could be a challenge for the target audience (or even make the tracker unusable). The face detection works better in the original FaceTrackerLK algorithm; maybe this is related to the resolution.

asterics / AsTeRICS

Deinhofer/javacv upgrade and optimizations #332