Closed by antimodular 9 years ago
It should be possible if you create 5 instances of ofxCIDetector and call ofxCIDetector::setup from the same thread you call ofxCIDetector::detectFaceFeatures from. I don't think calling ofxCIDetector::detectFaceFeatures on different threads with one main ofxCIDetector will work, but it might. The thing is, Apple's CIDetector might be using a GL context underneath, and you can only have one GL context per thread.
Let me know if you get any errors. If you do, I should be able to make it work by creating and attaching a new CIContext each time you call ofxCIDetector::setup. Right now I am calling [CIDetector detectorOfType:CIDetectorTypeFace context:nil options:…] with the context option set to nil. Apple does not give any documentation on what happens when you pass nil for the context, but I do know that each thread should have its own context to be safe.
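The one-instance-per-thread pattern described above can be sketched in plain C++. Here FakeDetector is a hypothetical stand-in for ofxCIDetector (not the addon's real API); the point is only that each worker thread owns its detector and runs setup on the same thread that later detects:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for ofxCIDetector: the real class wraps Apple's
// CIDetector, which may keep per-thread state (e.g. a CIContext/GL context),
// so each worker thread gets its own instance.
struct FakeDetector {
    bool ready = false;
    void setup() { ready = true; }                    // run on the worker thread
    int detectFaceFeatures() { return ready ? 1 : 0; }
};

// One detector per thread, set up on the same thread that calls detect.
int runPerThreadDetectors(int numThreads) {
    std::atomic<int> totalFaces{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < numThreads; ++i) {
        workers.emplace_back([&totalFaces] {
            FakeDetector detector;     // never shared across threads
            detector.setup();
            totalFaces += detector.detectFaceFeatures();
        });
    }
    for (std::thread &t : workers) t.join();
    return totalFaces.load();
}
```

With 5 threads each detector reports one "face", so the total is 5; the structure mirrors the advice above, not the addon's implementation.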
Thanks for that. I got something working here: https://github.com/antimodular/coreImageDetector. This code uses 5 different threads; each opens a stream to a USB cam and has its own instance of ofxCIDetector and detectFaceFeatures. The frame rate drops down to 6 fps, more or less the same fps I get without threading it. I uploaded the code slightly modified so that it opens 5 streams to the same webcam, since not everyone has 5 cams available.
OK, I will look at it over the weekend. But just glancing at the code, all the detectors are still being set up in the main thread. Move the setup stuff inside void threadedFunction().
Also, if you comment out your ofApp::update() and ofApp::draw(), is it still slow? And if you remove the call to detector.detectFaceFeatures, how much does the frame rate increase?
Also, can you try having just one ofxCIDetector and calling detector.detectFaceFeatures on 5 different threads? It probably won't work, but it is worth a shot.
Thanks for taking the time. All I am doing in ofApp.cpp setup is checking how many webcams are available, allocating 5 textures, and initializing the 5 different threads. The rest is done inside ThreadedObject.h.
I am not sure how to pass data to ofxCIDetector when it is not inside the same thread.
I ran the code with detector.detectFaceFeatures removed and it goes up to 60 fps.
I also made an app not using ofxCIDetector but rather ofxCV, which gave me about 30 fps for the same workload.
I must admit I gave up trying to multithread ofxCoreImageDetector because it worked well with ofxCV. But I must also say that ofxCoreImageDetector seems much faster than ofxCV when looking for multiple faces in just one camera stream. All that to say, I am very happy you made this addon.
Hey Ahbee.
I have a new test app that threads ofxCoreImageDetector. The tracking works well and I even get good frame rates for multiple cams. BUT after about 3 min of running, a malloc error happens every time.
here is the OF 0.8.4 project: https://www.dropbox.com/sh/bx7rfcu3gwlfklg/AAAzzt2oCz4gOI8_ZyxW0Ljna?dl=0
I followed these instructions to get more info about the origin of the error: http://stackoverflow.com/questions/6969407/set-malloc-error-break-in-xcode-4
Most of the time it points to these lines in your addon:
CIImage* ofxCIDetector::CIImageFrom(const ofImage &img){
    ofImage srcImage = img;
    srcImage.setImageType(OF_IMAGE_COLOR_ALPHA);
Here is a screenshot of the error: https://www.dropbox.com/s/zscozuxw5k8n8ea/Screen%20Shot%202015-08-18%20at%204.30.46%20PM.png?dl=0
I hope you can help with this problem.
Hey, this error looks like you ran out of memory; I think for 32-bit apps the limit is around 3 GB. I will profile this to see if there are any memory leaks. And sorry I could not get to your other example yet, I was really busy at work. I will take a look this weekend, I promise.
It would be great if you could look at it. The memory usage reported by Xcode is usually just around 90 MB.
It looks like each thread needs an autorelease pool; try this. This needs to be Objective-C++, so make sure all your files are .mm. You can check the file type in the Utilities inspector.
@autoreleasepool {
    if(bUseFeatures) detectedFaces = detector.detectFaceFeatures(cam_image, true, true);
    else detectedFaces = detector.detectFaceFeatures(cam_image, false, false);
    facesDetectedAmt = detectedFaces.size();
    for (int n = 0; n < MAX_FACES_PER_CAM; n++) {
        faceFoundRect[n] = ofRectangle(0,0,0,0);
    }
    if(facesDetectedAmt > 0) {
        for (int n = 0; n < facesDetectedAmt; n++) {
            shared_ptr<ofxCIFaceFeature> &f = detectedFaces[n];
            trackID[n] = f->getTrackingID();
            faceFoundRect[n] = ofRectangle(f->getBounds());
        }
    }
}
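The reason the pool wraps the per-frame work rather than the whole thread function is that autoreleased temporaries are only freed when the pool's scope closes, so a pool outside the loop lets them pile up for the thread's lifetime. A rough C++/RAII analogy of that per-iteration drain, using a made-up ScopedPool (not Apple's API):

```cpp
#include <vector>

// Rough C++ analogy (not Apple's API): a pool that deletes everything
// registered with it when its scope ends, the way an @autoreleasepool
// drains autoreleased objects at its closing brace.
struct ScopedPool {
    std::vector<int*> objects;
    int* track(int* p) { objects.push_back(p); return p; }
    ~ScopedPool() { for (int* p : objects) delete p; }  // drain on scope exit
};

// Placing the pool inside the loop frees each pass's temporaries every
// iteration instead of accumulating them until the thread exits.
int sumWithPerIterationPool(int iterations) {
    int sum = 0;
    for (int i = 0; i < iterations; ++i) {
        ScopedPool pool;                     // one pool per pass
        int* tmp = pool.track(new int(i));   // stand-in for autoreleased objects
        sum += *tmp;
    }                                        // pool drains here, every pass
    return sum;
}
```

The same scoping rule is why the pool later turns out to belong inside the while loop of threadedFunction, not around it.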
You should also set up your detector on the thread it's being used on, like this. Any speed difference?
void threadedFunction()
{
    while(isThreadRunning())
    {
        if (!bIsDetectorSetup) {
            bIsDetectorSetup = true;
            detector.setup(OFX_ACCURACY_HIGH, true, .1);
        }
        // ... rest of the detection loop ...
    }
}
Let me know if this worked for you. I am going to try one more thing to speed it up, which is to create a separate CIContext for each detector that is being used.
Thanks. I implemented both suggested changes: set up the detector in threadedFunction and not in the setup function, added the @autoreleasepool, and made all .cpp files .mm.
I still get the crash after about 1:30 min. See here: https://www.dropbox.com/s/97m11jt2t35k4h8/CI_2.mov?dl=0
In the video, around 18 seconds in, I am not seeing the @autoreleasepool? Can you send me the code with the changes, or make a git repo? On my end the code does not crash after I added the autoreleasepool (I ran it for 30 min). I'll try to get an extra camera to test, since I am only using my built-in webcam.
You are right, I had the @autoreleasepool in the wrong location. Here:
void threadedFunction()
{
    @autoreleasepool {
        while(isThreadRunning())
        {
I changed it now to how you have it and it seems to run stable. Thank you so much.
I will test multiple cams later today and report back on the fps.
Hey Ahbee. Have you had a chance to see if there is anything you can do to get a better fps? The code runs at:
Thanks, Stephan.
I have not tested with multiple cameras yet, but I just added a branch called multiContext; you can try that with your example and let me know about any issues. You have to call setup inside the threaded function for this to work. setup now takes an extra parameter at the end; set it to true, like:
if (!bIsDetectorSetup) {
    bIsDetectorSetup = true;
    detector.setup(OFX_ACCURACY_HIGH, true, .1, true);
}
Can you also send me the new code so I can test it?
Thanks for that. Using the multiContext branch still gives me 8 fps with 4 cameras running. I wonder if using the OpenMP approach of not putting things in different threads might give us better results. Thanks.
Can you remove all your locks and unlocks and report back the frame rate? I just want to make sure it is not something else that is slow. I'll try to get an extra camera soon to figure this out. Can you post the code you have now? I just want to make sure you have everything right.
I am at home now and don't have more than 2 cams here. But with 2 cams I now get 60 fps after removing the lock and unlock calls inside ofApp::update(). Looks promising.
Here is my latest code: https://www.dropbox.com/sh/mcz2l90uq6vctxt/AAC6lTF6924YLawLM5d-4ac0a?dl=0
Also, if you switch back to the original master branch, do your frame rates change?
Both branches are now running at 33 fps with 5 cams at half HD. So the main gain was from removing the lock and unlock calls. It's so cool.
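That speedup suggests the draw thread was holding the mutex across whole frames and stalling the workers. If some synchronization is still wanted, a minimal sketch of a compromise (SharedFaces is a made-up helper for illustration, not code from the repo) is to lock only for the hand-off of results, never while detecting or drawing:

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Results written by a worker thread and read by the draw thread.
// The mutex is held only long enough to move/copy the data.
struct SharedFaces {
    std::mutex mtx;
    std::vector<int> trackIDs;   // stand-in for face rects / tracking IDs

    void publish(std::vector<int> fresh) {      // worker thread side
        std::lock_guard<std::mutex> lock(mtx);
        trackIDs = std::move(fresh);            // brief critical section
    }
    std::vector<int> snapshot() {               // draw thread side
        std::lock_guard<std::mutex> lock(mtx);
        return trackIDs;                        // copy out, render after unlock
    }
};
```

With this shape, detection and drawing both run outside the critical section, so neither thread blocks the other for more than a few microseconds per frame.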
Do you think it is possible to use your detection in a separate thread? I am working with 5 simultaneous webcam streams and 5 ofxCoreImageDetector instances, which brings my fps down to 6.
BTW, thanks for this great addon, it works like magic.