cansik / deep-vision-processing

Deep computer-vision algorithms for the Processing framework.

Congrats, Installation and FPS #2

Closed il-easteregg closed 3 years ago

il-easteregg commented 4 years ago

Hey @cansik,

first of all: congratulations and big thanks for your hard work. I stumbled upon your repo when reading your comment in Linzaer's repo.

Since I have no clue about Java, is there an easy way to install and/or are you planning to include an installation guide?

The 40fps on CPU grabbed my attention: With what input resolution was this achieved? And are you planning to add GPU support, e.g. via CUDA? I'm asking all of those questions because my input is a 1920x1080 video for which I'm trying to achieve real-time detection.

Thanks in advance

cansik commented 4 years ago

Thank you very much for your kind words. The project itself is a library for Processing, a creative framework to develop applications. To install the library there, just download the latest release and install it through the application itself.

To use the algorithm inside your own Java application, I would suggest adding JavaCV as a dependency and copying the code from my repository. To run my dev-examples, I recommend installing IntelliJ, opening the gradle project and running the main classes in the test folder.

The input is 640x480, but it will be downscaled to 320x240 before being fed into the neural network. I've just run a test on my new MacBook Pro (2.4 GHz 8-Core Intel Core i9) and there I am able to achieve about 45-50 FPS.

At the moment there is no plan to run it on a GPU because the idea of this library is simplicity and portability. So real-time inference on CPU is my primary goal. Maybe you could even increase the speed on CPU if you convert the model to the OpenVINO inference format. Their inference engine is even faster than the DNN module of OpenCV.

il-easteregg commented 4 years ago

Thanks for your quick reply. I've got Processing up and your code running with constant 20fps. I'll follow your suggestions and get my own application started, also including the 640.onnx model in there (I've snooped around in your code and saw you've prepared this as well): In Python the 640.onnx detected almost all faces in my test video (~26), while the 320.onnx (RFB) right now is only getting a max. of 8.

For sure I will keep following your repo, thanks again for your work and also for your helpful feedback.

cansik commented 4 years ago

@il-easteregg Hmm, even with Processing you should get the speeds I get, what kind of CPU do you use?

Yes, I've prepared the 640 solution, but it's not working because there is no simplified ONNX version of the 640 model. We would first have to simplify it. On my MacBook, the commands to do the conversion do not work. If you could provide an RFB-640 ONNX model without post-processing, I would be really happy.

cansik commented 4 years ago

I've opened an issue in the original repo: https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB/issues/168#issuecomment-600884068

cansik commented 4 years ago

Never mind, I fixed it and released version 0.3.2 where 640 detectors are included.

il-easteregg commented 4 years ago

Wow, that's awesome. Thanks for implementing!

When running it on my 1920 video, I get to 5fps - which is still impressive for CPU only (running on an AMD Ryzen 5 1600). Though the detector is not at the same hit rate as Linzaer's, I assume it's simply because of the downscaling: In Linzaer's repo, I scaled the video down to 1280 in width (and got 0.5fps on CPU, 12fps on CUDA). But I will play around with your src.

By the way, the biggest speedup I achieved with Linzaer's repo (to 22fps on CUDA) came from skipping every 2nd frame in the detection phase, just assuming the faces from the frame before didn't move. Yes, I know, that's kind of cheating - but hardly noticeable in the end result.
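A minimal sketch of that frame-skipping idea, in plain Java. The class name `FrameSkippingDetector` and its parameters are hypothetical and not part of either repository; the detector is passed in as a generic function, so any real detection call could be plugged in.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/**
 * Runs an expensive detector only on every n-th frame and reuses the most
 * recent result for the frames in between. All names are hypothetical.
 */
class FrameSkippingDetector<F, R> {
    private final Function<F, List<R>> detector; // the expensive detection call
    private final int interval;                  // detect on every `interval`-th frame
    private long frameIndex = 0;
    private List<R> lastResult = Collections.emptyList();

    FrameSkippingDetector(Function<F, List<R>> detector, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.detector = detector;
        this.interval = interval;
    }

    /** Returns fresh detections on frames 0, n, 2n, ... and cached ones otherwise. */
    List<R> process(F frame) {
        if (frameIndex % interval == 0) {
            lastResult = detector.apply(frame);
        }
        frameIndex++;
        return lastResult;
    }
}
```

With `interval = 2` the detector runs on every second frame only, which is the "cheat" described above: the boxes drawn on skipped frames are simply the ones from the previous detection.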

il-easteregg commented 4 years ago

Got your repo up and running in my Java env. (took me a while as a Java newbie - in the end it's easier than expected), and removed the downscaling of the image: With downscaling to 1280 the video runs at 4fps. With 1920 as input video and working resolution for ONNX, I get a stable 2fps and a full face detection. This would be insanely good, assuming that everything is running on CPU only.

But I have some doubts: When monitoring my GPU (via GPU-Z), I see the GPU and GPU memory clock rising to the max when running Processing. Running Linzaers repo on CPU only, the GPU clock stays almost at 0.

To verify / falsify my doubts, I will try to use JavaCV's CUDA support - honestly no clue how to do this atm, but since it's implemented in JavaCV I hope it's in the end also easier than expected, just like getting your repo into my Java env. (In case you can point me in the right direction, that would be highly appreciated though - sorry for bugging you that much)

cansik commented 4 years ago

Which video library do you use for Processing? The one from the contribution manager is quite old and uses gstreamer-0.1. Maybe this is the cause of the rise in GPU usage.

Could you share the 1920 anchor point file with me, so I am able to run tests as well?

Regarding JavaCV & Cuda, have a look at: https://github.com/bytedeco/javacpp-presets/pull/832

il-easteregg commented 4 years ago

Sorry for the misunderstanding: I didn't create a new anchor point file, but simply edited line 82 in DeepVision.java from

        return new ULFGFaceDetectionNetwork(Repository.ULFGFaceDetectorRFB640Simplified.getPath(), 640, 480);

to

        return new ULFGFaceDetectionNetwork(Repository.ULFGFaceDetectorRFB640Simplified.getPath(), 1920, 1080);

Keeping the same scale as the original video input also enables the detector to recognize the smaller faces.
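To put the two network input sizes into perspective, here is a small arithmetic sketch in plain Java. The class `InputCost` is made up for illustration. It only counts pixels; the assumption that inference time grows roughly with the number of input pixels is a rule of thumb for convolutional networks, not a measurement from this thread.

```java
/** Compares network input sizes by pixel count. Hypothetical helper class. */
class InputCost {
    /** Number of pixels the network has to process for a given input size. */
    static long pixels(int width, int height) {
        return (long) width * height;
    }

    /** How many times more pixels input B has than input A. */
    static double ratio(int widthA, int heightA, int widthB, int heightB) {
        return (double) pixels(widthB, heightB) / pixels(widthA, heightA);
    }

    public static void main(String[] args) {
        System.out.println(pixels(640, 480));                // 307200
        System.out.println(pixels(1920, 1080));              // 2073600
        System.out.println(ratio(640, 480, 1920, 1080));     // 6.75
    }
}
```

So the 1920x1080 input carries 6.75 times as many pixels as the 640x480 one, which is at least consistent with the drop in frame rate at full resolution.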

I did more testing, and for some reason the GPU now only gets going for the first couple of frames - then slows down to almost 0. Weirdly, I first noticed this when switching to the opencv-gpu, but this might be due to my lack of Java knowledge - so I might have implemented it the wrong way. Later today I will undo this change and re-test, to see if my observation was plain wrong. My system has GStreamer 1.16.2 installed, but I'm not sure whether that is being used or the one delivered with the contribution manager. Can I check this within the Java application? (sorry again for my newbie-style questions. They will keep coming for a while, I guess)

Thanks for the link for JavaCV & Cuda, I will read into this.

il-easteregg commented 4 years ago

Sorry for the delay - I retested with the old setup and the GPU is calming down there as well. So it indeed runs only on the CPU.

In the meantime, OpenCV got updated. And you can bet I was happy to see that you've also updated your files, including the 1.5.3-SNAPSHOT - which should also include the new OpenCV. So I got to work and tried figuring out how Maven works. Got it set up within Visual Studio Code and tried to compile based on your pom.xml - but it fails with the note "Could not find artifact org.bytedeco:ffmpeg:jar:android-arm:4.2.2-1.5.3-20200406.153857-458 in sonatype-snapshots (https://oss.sonatype.org/content/repositories/snapshots/)". I've also tried running mvn -U compile and then additionally mvn -U compile -e for the full trace - but the error remains the same.

Is this a fault on my side? Again, sorry for only sending in questions...

il-easteregg commented 4 years ago

A short update, after trying to get things working all of yesterday and today: Not entirely sure what I am doing here, but I got Maven running successfully - after adding to the pom.xml:

<dependency>
        <groupId>org.processing</groupId>
        <artifactId>core</artifactId>
        <version>3.0a5-SNAPSHOT</version>
</dependency>

And running Maven with mvn -U compile "-Djavacpp.platform=windows-x86_64". Trying to run the Gradle build afterwards, it still fails - missing a whole lot of dependencies, e.g. openblas-0.3.9-1.5.3-SNAPSHOT-ios-x86_64.jar. So either Maven or Gradle fails on missing files - I assume that due to the snapshot status not all files are available as builds?!

cansik commented 4 years ago

I think it has to do with the snapshot builds by javacv, which are sometimes a bit delayed. In gradle it is possible to exclude all dependencies other than the ones necessary for the OS you are running it on.

il-easteregg commented 4 years ago

Thanks for the suggestion. Though I can't figure out the proper way of doing this - I went a bit overboard with manual excludes, but could only achieve a successful Gradle build with these settings:

    compile group: 'org.bytedeco', name: 'opencv', version: "4.3.0-$javaCvVersion"
    compile group: 'org.bytedeco', name: 'javacpp', version: "$javaCvVersion", classifier: "windows-x86_64"
    compile group: 'org.bytedeco', name: 'openblas', version: "0.3.9-$javaCvVersion", classifier: "windows-x86_64"
    compile group: 'org.bytedeco', name: 'tesseract', version: "4.1.1-$javaCvVersion", classifier: "windows-x86_64"
    compile group: 'org.bytedeco', name: 'leptonica', version: "1.79.0-$javaCvVersion", classifier: "windows-x86_64"
    compile (group: 'org.bytedeco', name: 'javacv', version: "$javaCvVersion")
    {
        exclude group: 'org.bytedeco', module: 'opencv'
        exclude group: 'org.bytedeco', module: 'javacpp'
    }
    compile (group: 'org.bytedeco', name: 'opencv-platform', version: "4.2.0-$javaCvVersion", classifier: "windows-x86_64")
    {
        exclude group: 'org.bytedeco', module: 'opencv'
        exclude group: 'org.bytedeco', module: 'openblas'
        exclude group: 'org.bytedeco', module: 'javacpp'
    }
    compile (group: 'org.bytedeco', name: 'openblas-platform', version: "0.3.9-$javaCvVersion")
    {
        exclude group: 'org.bytedeco', module: 'openblas'
        exclude group: 'org.bytedeco', module: 'javacpp'
    }
    compile (group: 'org.bytedeco', name: 'tesseract-platform', version: "4.1.1-$javaCvVersion")
    {
        exclude group: 'org.bytedeco', module: 'tesseract'
        exclude group: 'org.bytedeco', module: 'leptonica'
        exclude group: 'org.bytedeco', module: 'javacpp'
    }

This also builds a fatjar - which, of course, fails running in Processing:

UnsatisfiedLinkError: no jniopencv_core in java.library.path
java.lang.UnsatisfiedLinkError: no jniopencv_core in java.library.path
UnsatisfiedLinkError: no jniopencv_core in java.library.path
A library relies on native code that's not available.
Or only works properly when the sketch is run as a 32-bit application.
    at processing.javafx.PSurfaceFX.lambda$0(PSurfaceFX.java:409)
    at java.lang.Thread.run(Thread.java:748)
Could not run the sketch (Target VM failed to initialize).
For more information, read revisions.txt and Help → Troubleshooting.

cansik commented 4 years ago

Did you change something in the code to use direct opencv features? Usually this error happens if you did not load the native opencv lib:

Loader.load(opencv_java.class);

But maybe it is easier for you to just go back to version 1.5.2 instead of using the snapshots. I only upgraded to fix the superresolution bug in opencv.
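For an `UnsatisfiedLinkError: no ... in java.library.path`, it can help to look at what the JVM actually searches. The following is a small diagnostic sketch using only the Java standard library; the class `NativeLibraryCheck` is hypothetical. Note the assumption: it only inspects `java.library.path`. As far as I know, JavaCV's `Loader` normally extracts the native libraries from the platform jar by itself, so a missing `windows-x86_64` jar on the classpath is another possible cause that this check will not detect.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Inspects java.library.path for a native library. Hypothetical helper class. */
class NativeLibraryCheck {
    /** Splits a library path string into its non-empty directory entries. */
    static List<String> searchPath(String libraryPath) {
        List<String> entries = new ArrayList<>();
        if (libraryPath == null) {
            return entries;
        }
        for (String entry : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** True if the platform-specific file for `name` exists in one of the directories. */
    static boolean isOnPath(String name, List<String> dirs) {
        // mapLibraryName turns "foo" into foo.dll, libfoo.so or libfoo.dylib
        String fileName = System.mapLibraryName(name);
        for (String dir : dirs) {
            if (new File(dir, fileName).isFile()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dirs = searchPath(System.getProperty("java.library.path"));
        System.out.println("java.library.path entries: " + dirs);
        System.out.println("jniopencv_core found: " + isOnPath("jniopencv_core", dirs));
    }
}
```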

cansik commented 4 years ago

@il-easteregg I've just pushed a new commit which adds the stable 1.5.3 version. So for now it is not necessary to use snapshots anymore. Just use gradle :)

il-easteregg commented 4 years ago

Thank you, @cansik - the Gradle build works for me again.

I'll now continue my endeavor to include GPU / CUDA. I've added opencv-platform-gpu and cuda-platform-redist to the build, which forced me to enable zip64 by adding zip64 true in line 48 of build.gradle (since the file size went above 4GB). Another addition I made is in ULFGFaceDetectionNetwork.java, line 59:

        net.setPreferableBackend(DNN_BACKEND_CUDA);
        net.setPreferableTarget(DNN_TARGET_CUDA);

The build runs successfully, but then running it all within Processing throws a "CUDA driver version is insufficient" error. I will keep digging, since my CUDA drivers should be fine (at least the ULFG python script ran fine with my setup).

Edit:

JavaCV / OpenCV seems to depend on CUDA 10.2 (Edit 2: Of course, since I included that in the Gradle build 🤣 ). After updating my graphics card driver as well as CUDA itself, Processing runs without errors.

The good news: My GPU monitor says the GPU is constantly being used while processing the video. The bad news: Almost no change in FPS (running it in high resolution, CPU-only achieves 2 FPS while CPU+GPU seems to lift it to 3 FPS)... I might not have implemented it correctly. Yet.
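One way to narrow down why the GPU barely changes the frame rate is to time the stages of the frame loop separately (reading the frame, running the network, drawing). Below is a minimal sketch in plain Java; `StageTimer` and the stage names are hypothetical and not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates wall-clock time per named stage. Hypothetical helper class. */
class StageTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Runs `work`, adds its duration to the named stage and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            totalNanos.merge(stage, elapsed, Long::sum);
            counts.merge(stage, 1, Integer::sum);
        }
    }

    /** How often the stage has been timed so far. */
    int count(String stage) {
        return counts.getOrDefault(stage, 0);
    }

    /** Average duration of a stage in milliseconds, or 0 if it never ran. */
    double averageMillis(String stage) {
        int n = count(stage);
        return n == 0 ? 0.0 : totalNanos.get(stage) / 1e6 / n;
    }
}
```

Wrapping each step in a call such as `timer.time("inference", () -> ...)` and printing `averageMillis` for each stage every few hundred frames would show whether the network itself or something around it (video decoding, resizing, drawing) dominates the frame time.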

cansik commented 3 years ago

@il-easteregg I have created a CUDA branch as well... maybe it helps you. I am achieving up to 90 FPS with CUDA support on my 2080 Ti.