bytedeco / javacv

Java interface to OpenCV, FFmpeg, and more

Unexpected runtime errors while creating detectors or detecting objects #777

Open csolorio opened 7 years ago

csolorio commented 7 years ago

Hi, everyone.

I'm trying to use several Haar cascade classifiers simultaneously. Sometimes, after creating several classifier objects or running several detections, the JVM crashes. I can't predict when the error will appear, but it always happens during detection with one specific XML file (and sometimes that same detection completes with no problems). What could be the possible causes?

The usual error messages in the JVM crash logs are:

    Stack: [0x00000000024d0000,0x00000000025d0000], sp=0x00000000025ce1d0, free space=1016k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [opencv_objdetect320.dll+0x17997]
    C [opencv_objdetect320.dll+0x2bb53]
    C [opencv_objdetect320.dll+0x16319]
    C [opencv_objdetect320.dll+0x1483e]
    C [opencv_objdetect320.dll+0x14288]
    C [jniopencv_objdetect.dll+0x1d4ff]
    C 0x0000000002777f74

    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    j org.bytedeco.javacpp.opencv_objdetect$CascadeClassifier.detectMultiScale3(Lorg/bytedeco/javacpp/opencv_core$Mat;Lorg/bytedeco/javacpp/opencv_core$RectVector;Ljava/nio/IntBuffer;Ljava/nio/DoubleBuffer;DIILorg/bytedeco/javacpp/opencv_core$Size;Lorg/bytedeco/javacpp/opencv_core$Size;Z)V+0
    j CogentMX.FeatureDetector.detectObject(ILorg/bytedeco/javacpp/opencv_core$Mat;Lorg/bytedeco/javacpp/opencv_core$Size;Lorg/bytedeco/javacpp/opencv_core$Size;)LCogentMX/DetectionsContainer;+79
    j CogentMX.FeatureDetector.detect(ZZ)LCogentMX/DetectionsContainer;+208
    j TestApp.main([Ljava/lang/String;)V+94
    v ~StubRoutines::call_stub

Any suggestion?

saudet commented 7 years ago

You'll need to provide more details if you expect someone to help.

csolorio commented 7 years ago

Ok, that's kind!

Let me paint the whole functional picture. I have an array of CascadeClassifiers that I create at some point and then reuse when I need them. The trained XML models are organized in different folders, so I explore the folder tree, collect all the paths, and create all the cascade classifiers like this:

    //Explore and get paths of XML for cascade classifiers into 'models' list
    exploreModelsDirectory();

    //Create detector objects
    detectors = new opencv_objdetect.CascadeClassifier[models.size()];
    for(int i = 0; i < detectors.length; i++){
        if(detectors[i] == null) {
            detectors[i] = createDetector(models.get(i).getKey());
        }
    }
    detector = detectors[currentDetector];

Now, detector is a global variable that points to some detector within an array and currentDetector is an index that gets incremented/decremented whenever I want to change between classifiers.

The way I create a classifier is:

    private opencv_objdetect.CascadeClassifier createDetector(String detectorPath){
        opencv_objdetect.CascadeClassifier detector;
        try{
            detector = new opencv_objdetect.CascadeClassifier(detectorPath);
            return detector;
        }
        catch(Exception ex){
            printDebug(LOG_ERRORS, "Creation failed (" + detectorPath + "):" + ex.getLocalizedMessage());
            return null;

        }
    }
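
Because createDetector returns null when construction fails, the detectors array can end up holding null entries, and a damaged model file may not surface as a Java exception at all. One way to screen files before they reach native code is to check that each one is a readable, well-formed XML document. This is only a minimal sketch using the JDK; the names ModelFileCheck and isWellFormedXml are hypothetical, and passing this check does not show that OpenCV will accept the file as a cascade.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JavaCV or OpenCV.
final class ModelFileCheck {

    /** Returns true only if the file exists, is readable, is non-empty, and parses as XML. */
    static boolean isWellFormedXml(File file) {
        if (!file.isFile() || !file.canRead() || file.length() == 0) {
            return false;
        }
        try {
            // DefaultHandler ignores the content; a malformed document makes parse() throw.
            SAXParserFactory.newInstance().newSAXParser().parse(file, new DefaultHandler());
            return true;
        } catch (Exception ex) {
            return false;
        }
    }
}
```

A caller could skip any path for which isWellFormedXml returns false instead of passing it to the CascadeClassifier constructor.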

I have found some problematic XML files, so I just don't use them anymore. But sometimes, when I perform several detections with different cascade classifiers, the JVM crashes with the error message from my first post. I perform an individual detection this way:

    private DetectionsContainer detectObject(int objectLimit, opencv_core.Mat image, opencv_core.Size minSize, opencv_core.Size maxSize){
        DetectionsContainer detections = new DetectionsContainer(objectLimit);

        //Detect primary image
        printDebug(LOG_INFO, "Model used: " + models.get(currentDetector));
        try {
            detector.detectMultiScale3(image,
                    detections.getROIs(), detections.getRejections(), detections.getLevelWeights(),
                    scaleFactor, minNeighbors, 0, minSize, maxSize, true);

        } catch(Exception ex){
            printDebug(LOG_ERRORS, "Detection failed: " + ex.getLocalizedMessage());
        }

        return detections;
    }

where a DetectionsContainer is just a class that encapsulates a RectVector, an IntBuffer and DoubleBuffer.

So, the only 'special behavior' I perform is the following, and I don't think there is anything special enough in it to make it fail in such a predictably random way (I know which part is going to fail, but not exactly when the error will appear):

    primary = detectObject(faceLimit, image, minSize, null);
    while(iterateDetectors()) // -- USE NEXT DETECTOR, RETURNS FALSE IF THERE ARE NO MORE DETECTORS AVAILABLE
    {
        secondary  = detectObject(faceLimit, image, minSize, null);
        printDebug(LOG_INFO, String.format("Detected %s new " + target, secondary.size()));
        mainObjectsFound += secondary.size();

        //Merge main results with secondary ones
        primary = mergeDetections(primary, secondary, status);
    }

Trivial changes, like reordering the console print statements, change the point at which the application crashes (for example, it crashes while using a different classifier), but as long as the code is left untouched, the application always crashes at the same point.

The behavior seems to suggest some kind of memory issue when detectMultiScale3 is invoked several times with different cascade classifiers. I removed the mergeDetections function in further testing and the exact same error remains (just to rule out that function as the cause; people tend to blame the unknowns). And the error messages are always the same:

  1. JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
     Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode windows-amd64 compressed oops)
     Problematic frame: C [opencv_objdetect320.dll+0x17959]

  2. JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
     Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode windows-amd64 compressed oops)
     Problematic frame: C [opencv_objdetect320.dll+0x17907]

With slightly different memory addresses.
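
When comparing several crash logs like the two above, it can help to extract just the module name and offset from the "Problematic frame" line, since those stay comparable across runs while absolute addresses do not. The following is a small sketch using only the JDK; ProblematicFrame and its parse method are hypothetical names, and the pattern only covers native (C) frames of the form shown in this thread.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing hs_err crash logs; not part of JavaCV or the JDK.
final class ProblematicFrame {
    // Matches e.g. "Problematic frame: C [opencv_objdetect320.dll+0x17959]",
    // with or without the "# " line prefix used in full hs_err files.
    // Module names containing '+' or ']' are not handled.
    private static final Pattern FRAME = Pattern.compile(
            "Problematic frame:\\s*#?\\s*C\\s+\\[([^\\]+]+)\\+0x([0-9a-fA-F]+)\\]");

    final String module;
    final long offset;

    private ProblematicFrame(String module, long offset) {
        this.module = module;
        this.offset = offset;
    }

    /** Returns the first problematic native frame found in the log text, if any. */
    static Optional<ProblematicFrame> parse(String logText) {
        Matcher m = FRAME.matcher(logText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ProblematicFrame(m.group(1), Long.parseLong(m.group(2), 16)));
    }

    @Override
    public String toString() {
        return module + "+0x" + Long.toHexString(offset);
    }
}
```

Applied to the two logs above, this yields offsets 0x17959 and 0x17907 in the same module, which makes it easier to see that both crashes land close together in opencv_objdetect320.dll.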

Any suggestion?

saudet commented 7 years ago

Could you try to narrow down the issue by reproducing the crash with only a few simple lines?

csolorio commented 7 years ago

Sure. Should I attach the XML files too? Another thing: is there any significant difference between using CvHaarClassifierCascade and CascadeClassifier? Would that help?

The minimal code that crashes sometimes...

    opencv_core.Mat image = imread(imagePath);
    File folder = new File(xmlFolder);
    FilenameFilter xmlFilter = (dir, name) -> name.toLowerCase().endsWith(".xml");
    List<opencv_objdetect.CascadeClassifier> detectors = new ArrayList<>();

    File[] classifiers = folder.listFiles(xmlFilter);
    for(File classifier : classifiers){
        detectors.add(new opencv_objdetect.CascadeClassifier(classifier.getAbsolutePath()));
    }

    int size = 100;
    opencv_core.RectVector rois = new opencv_core.RectVector(size);
    IntBuffer rejections = IntBuffer.allocate(size);
    DoubleBuffer weights = DoubleBuffer.allocate(size);

    for(opencv_objdetect.CascadeClassifier detector : detectors){
        detector.detectMultiScale3(image, rois, rejections, weights, scaleFactor, minNeighbors,
                0,
                minSize, null, true);
    }

Edit 1: I have discovered something. When the size variable is decreased, the probability of the error appearing decreases too. But still, it appears sometimes.

Edit 2: When I use this:

    opencv_core.RectVector rois = new opencv_core.RectVector();
    IntBuffer rejections = IntBuffer.allocate(1);
    DoubleBuffer weights = DoubleBuffer.allocate(1);

So far I haven't been able to reproduce the sudden crash with this. It causes other errors, though: for example, I get 3 detections while the int and double buffers have room for only one element each.
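
The mismatch described in Edit 2 (more detections than buffer slots) is easy to hit with any fixed-capacity java.nio buffer, because an absolute get(i) at or past the limit throws instead of growing the buffer. Below is a minimal sketch of a guarded read on the Java side, using only the JDK. The class name SafeRead is hypothetical, and the detections argument stands in for rois.size(). It only protects the reading loop and does not address the native crash itself.

```java
import java.nio.DoubleBuffer;
import java.nio.IntBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaCV.
final class SafeRead {

    /**
     * Formats up to `detections` (rejection, weight) pairs, stopping at the
     * number of elements both buffers can actually provide.
     */
    static List<String> pairs(int detections, IntBuffer rejections, DoubleBuffer weights) {
        int readable = Math.min(detections, Math.min(rejections.limit(), weights.limit()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < readable; i++) {
            out.add("<" + rejections.get(i) + ", " + weights.get(i) + ">");
        }
        return out;
    }
}
```

With buffers allocated for a single element and 3 reported detections, pairs returns one entry rather than throwing an IndexOutOfBoundsException on the second read.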

Edit 3: If I include any flag, the app crashes even faster.

saudet commented 7 years ago

Does it change anything if you use IntPointer and DoublePointer for rejections and weights?

csolorio commented 6 years ago

Hi, Samuel.

Sorry I couldn't check your advice sooner. It still crashes randomly. I'm using this modified code. Am I using IntPointer correctly? Do I need to invoke some deallocation code?

        int size = 20;

        for(opencv_objdetect.CascadeClassifier detector : detectors){

            opencv_core.RectVector rois = new opencv_core.RectVector();
            IntPointer rejections = new IntPointer(size);
            DoublePointer weights = new DoublePointer(size);
            detector.detectMultiScale3(image, rois, rejections, weights, scaleFactor, minNeighbors,
                    CV_HAAR_DO_CANNY_PRUNING,
                    minSize, null, true);
            int roisSize = (int)rois.size();
            System.out.print("Detected: " + roisSize);

            try {
                for (int i = 0; i < roisSize; i++) {
                    System.out.print(" " + rejections.get(i) + ", " + weights.get(i));
                }
                System.out.println();
            } catch(Exception ex){
                System.out.println("fail");
            }
        }

The error message when the JVM crashes is (summarized)

# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007feca507907, pid=9992, tid=0x000000000000210c
#
# JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [opencv_objdetect320.dll+0x17907]

saudet commented 6 years ago

Looks alright, so this could be a bug in OpenCV. We usually create detectors once and reuse them afterwards, so it might very well be that no one has encountered that problem yet.

saudet commented 6 years ago

Could you try to call detector.deallocate() when you are done with each of them? It is possible that the destructors are not thread safe and that could cause problems if we don't call them at a known point in time...

csolorio commented 6 years ago

Sure. I could, but since I create several detectors and reuse them for a "long period of time", the errors come up way before I deallocate any of them.

saudet commented 6 years ago

I see. It would be good to confirm that we can't reproduce that outside Java...

saudet commented 6 years ago

To make this bug more reproducible, it might help to call rois.deallocate(); rejections.deallocate(); weights.deallocate() at the end of the loop. Let me know if you get any interesting results that way.

csolorio commented 6 years ago

I tested it with the deallocation but I don't see any discernible difference. It still crashes randomly. I'll try to reproduce this in C++. Would it also help if I test the Java wrapper provided by OpenCV?

One thing I've noticed is that the weights and rejections filled in by the detectMultiScale3 method differ in some cases even though the same photo and the same XML file are provided. The problem arises when a classifier detects several objects (in all the tests I've done, the classifier starts to behave oddly when it detects at least 5 or 6 objects).

As an example: I create 8 classifiers and supply the same photo to each one. To reproduce the random crash, I repeat this 10, 20, 30 or so times.

The detections, weights and rejections pairs in the first run, for each classifier are:

    Detected 2 objects: <24, 2.43913334608078><24, 1.185618620365858>
    Detected 2 objects: <30, 2.773367229383439><30, 3.469111846294254>
    Detected 2 objects: <20, 5.58269025105983><20, 2.206179869361222>
    Detected 2 objects: <20, 3.3494670931249857><20, 2.7936345124908257>
    Detected 6 objects: <3915040, 0.0><1060602865, 7.583211544235048E242><1061647676, 8.545232804520467E194><686489680, 7.16E-322><686489680, 0.0><542315392, 0.0>
    Detected 10 objects: <1060602865, 7.583211544235048E242><605159456, 0.0><3915040, 0.0><656474192, 0.0><288676234, 0.0><656474192, 0.0><714408016, 0.0><3915040, 0.0><196604, 0.0><0, 2.2250738585072014E-308>
    Detected 1 object: <1060602865, 7.583211544235048E242>
    Detected 1 object: <1061647676, 8.545232804520467E194>

The numbers start looking too weird when 6 objects are detected. In the next loop, the same classifiers are used and the same photo is provided:

    Detected 2 objects: <24, 2.43913334608078><24, 1.185618620365858>
    Detected 2 objects: <30, 2.773367229383439><30, 3.469111846294254>
    Detected 2 objects: <20, 5.58269025105983><20, 2.206179869361222>
    Detected 2 objects: <20, 3.3494670931249857><20, 2.7936345124908257>
    Detected 6 objects: <200, 3.349467093125006><1061917808, 9.646442850553718E227><1061865397, 2.513047756722869E180><2006328496, 0.0><288676318, 0.0><7798895, 0.0>
    Detected 10 objects: <2006328496, 0.0><565772320, 0.0><567079696, 1.5597531407224497E-4><288676300, 0.0><748617808, 0.0><3915040, 0.0><2006328496, 0.0><6553710, 0.0><3145776, 0.0><7274588, 0.0>
    Detected 1 objects: <2006328496, 0.0>
    Detected 1 objects: <2006328496, 0.0>

All values remain the same in the results of the first 4 classifiers but start to change too much in the next. This behavior remains in the rest of the iterations.
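
One rough way to flag output like the runs above automatically is to check each pair for plausibility. The sketch below, which uses only the JDK, treats a pair as suspicious when the rejection level is negative or above a caller-supplied bound, or when the weight is non-finite or very large in magnitude. The class name PairSanity and both bounds are assumptions chosen for illustration and are not values defined by OpenCV. As a heuristic it can miss garbage that happens to fall inside the bounds.

```java
// Hypothetical heuristic check, not part of JavaCV or OpenCV.
final class PairSanity {

    /**
     * Returns true when a (rejection level, weight) pair falls inside the
     * caller-supplied bounds. maxLevel and maxAbsWeight are assumed limits
     * picked by the caller, e.g. from values seen in known-good runs.
     */
    static boolean looksPlausible(int rejectLevel, double weight, int maxLevel, double maxAbsWeight) {
        boolean levelOk = rejectLevel >= 0 && rejectLevel <= maxLevel;
        boolean weightOk = Double.isFinite(weight) && Math.abs(weight) <= maxAbsWeight;
        return levelOk && weightOk;
    }
}
```

With bounds of 100 for the level and 1e6 for the weight, the pair <24, 2.43913334608078> from the first classifier passes, while <3915040, 0.0> and <1060602865, 7.583211544235048E242> from the fifth are flagged.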

saudet commented 6 years ago

Only the weights and rejections look weird? Not the ROIs? Does it help if you initialize them with IntPointer(null) and DoublePointer(null) instead?

csolorio commented 6 years ago

The ROIs are all consistent, it's just the rejection and weight values that start to diverge too much.

Supplying null to the Int/DoublePointer constructor doesn't compile for me... But providing no parameters seems to have worked so far. Doing the same with Int/DoubleBuffer didn't work. The values remain mostly consistent. Those that change vary only within an acceptably small range... And it hasn't crashed so far. Seems like this is the magic combination!

Edit: I just did 300 iterations with no crash! Seems like it's fixed!

saudet commented 6 years ago

I mean IntPointer((Pointer)null) and DoublePointer((Pointer)null)?

csolorio commented 6 years ago

Just tested it. Works so far too. Thanks!

saudet commented 6 years ago

Ok, thanks! So it could be a bug in either JavaCPP or OpenCV... In any case, we have a workaround for now.

saudet commented 6 years ago

I've just released version 1.4, which is based on OpenCV 3.4.0. Could you check if this issue still happens? Thanks for the feedback!