TadasBaltrusaitis / OpenFace

OpenFace – a state-of-the-art tool for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

AU detection Bug in FeatureExtraction.exe #823

Open · mayun123 opened this issue 4 years ago

mayun123 commented 4 years ago

Hello, I found that FeatureExtraction.exe produces negative values during visualization, even though AU intensities should be between 0 and 5, and negative values also appear in the saved CSV file (screenshots attached).

mayun123 commented 4 years ago

Also, I used FeatureExtraction.exe and OpenFaceOffline to process the same video but got different AU regression values. The OpenFaceOffline output does not contain negative AU regression values in its CSV, so the AU regression produced by FeatureExtraction.exe seems incorrect. I also tested FaceLandmarkImg.exe and got AU regression values between 0 and 5; I think FeatureExtraction.exe should do the same.

TadasBaltrusaitis commented 4 years ago

Hi,

Did you quit the application before it was done? As a final step of Action Unit extraction, OpenFace adjusts the AU predictions over the entire sequence: in essence it calibrates to the user, shifts the predictions slightly, and makes sure there are no predictions below 0 or above 5.
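
Roughly, the end of the FeatureExtraction flow looks like the sketch below (the variable and function names follow my recollection of the 2.x FeatureExtraction.cpp and may differ slightly between versions, so treat this as an outline rather than the literal source):

// Per-frame, uncalibrated AU predictions are accumulated while the sequence is read:
face_analyser.AddNextFrame(captured_image, face_model.detected_landmarks,
                           face_model.detection_success,
                           sequence_reader.time_stamp, sequence_reader.IsWebcam());

// ... after the last frame of the sequence has been processed ...

// The saved CSV is rewritten with predictions calibrated over the whole sequence
// (shifted per person and clipped to the [0, 5] range). If the application is
// closed before this step runs, the CSV keeps the raw per-frame values, which
// can be negative.
face_analyser.PostprocessOutputFile(open_face_rec.GetCSVFile());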

mitchelkappen commented 2 years ago

Hi,

I'm experiencing the same problem when using the .exe from the command line (Python with subprocess) with the following command:

subprocess.run([r'D:\UGent\Topics\Facial\OpenFace_2.2.0_win_x64_new\FeatureExtraction.exe', '-f', file, '-out_dir', processed_dir, "-2Dfp", "-3Dfp", "-pdmparams", "-pose", "-aus", "-gaze", "-hogalign"])

I visualized the output, and about 25% of my files have values lower than zero; the rest seem to have been processed correctly. The command is run in a loop, so everything should be similar up to a certain extent. The only thing I can think of is that the files are .webm files with different codecs, but I don't see a clear pattern right away.

Any clue what this could be, or how to best tackle this issue?

Thanks in advance!

mitchelkappen commented 2 years ago

I wanted to follow up on this because I have additional information that is confusing me further. I selected one of the files that gave negative outputs and re-ran the exact same command to compare the outputs, to make sure nothing fishy had happened during the first run. It turns out my output is now different; here is a short snippet (screenshot attached), with the new file on the left and the initial file on the right.

As you can see, the two outputs are fairly different. It made me wonder whether the program simply 'forgot' to adjust the scores, as @TadasBaltrusaitis described earlier; if it had only skipped the rescaling, the two runs should correlate perfectly. For the same columns, the following correlations are found (screenshot attached):

This is a pretty strange result to me, considering some columns are close to perfectly correlated while others are not correlated at all. In addition, when comparing the actual values, only about 1 in every 50 is an exact match.

This inconsistency made me doubt the stability of the program, so I re-ran the exact same command once more to see what it would yield. That run gave identical results, so at least it was consistent there. I have also attached the output CSVs for the first execution (video.csv) and the latest one (video_new.csv). This is just one of the files, but it happens more often. I feel it is safest to simply re-run all the files where I encountered negative values; however, it makes me doubt all the output files.

Any explanation would be welcome, as I would prefer not to re-run all the files: there are close to 1,500 videos of 6-10 minutes each.

Thanks again for all the great work!

Files:

TeddyAlbina commented 1 year ago

Did you quit the application before it was done? As a final step of Action Unit extraction, OpenFace adjusts the AU predictions over the entire sequence: in essence it calibrates to the user, shifts the predictions slightly, and makes sure there are no predictions below 0 or above 5.

What is the name of the function to call to do this? I'm integrating OpenFace into AWS Lambda, and our last remaining problem is that it emits negative AUs. The only functions that appear to normalize AUs are for online mode, but as you can see, we use a sequence of images and static AUs.

Below is our code:

void ProcessVideo(const char frameDirectory[], const char modelLocation[], bool multiView, VideoExResult result[]) {
    std::vector<std::string> arguments = {
        "/mnt/models/dummy.dll",
        "-mloc",
        std::string(modelLocation),
        "-pose",
        "-aus",
        "-gaze",
        "-au_static",
        "-multi_view",
        multiView ? "1" : "0"
    };

    std::vector<std::string> sequenceArgument = {
        "-fdir",
        std::string(frameDirectory)
    }; 

    std::vector<std::string> arguments2 = {
        "/mnt/models/dummy.dll",
        "-mloc",
        std::string(modelLocation),
        "-pose",
        "-aus",
        "-gaze",
        "-au_static",
        "-multi_view",
        multiView ? "1" : "0"
    };

    LandmarkDetector::FaceModelParameters parameters(arguments);
    LandmarkDetector::CLNF landmarkDetector(parameters.model_location); 
    FaceAnalysis::FaceAnalyserParameters face_analysis_params(arguments2);
    FaceAnalysis::FaceAnalyser faceAnalyzer(face_analysis_params); 

    Utilities::SequenceCapture sequence_reader;

    if (!sequence_reader.Open(sequenceArgument)) {
        throw std::runtime_error("Unable to access images directory.");
    } 

    std::vector<VideoExResult> openfaceResults;
    std::vector<double> neg;

    int position = 0;

    auto captured_image = sequence_reader.GetNextFrame();

    while (!captured_image.empty()) {
        std::cerr << "Processing image : " << std::to_string(position) << std::endl;

        // Converting to grayscale
        cv::Mat_<uchar> grayscale_image = sequence_reader.GetGrayFrame();

        // The actual facial landmark detection / tracking
        bool detection_success = LandmarkDetector::DetectLandmarksInVideo(captured_image, landmarkDetector, parameters, grayscale_image);

        // Gaze tracking, absolute gaze direction
        cv::Point3f gazeDirection0(0, 0, 0);
        cv::Point3f gazeDirection1(0, 0, 0);
        cv::Vec2d gazeAngle(0, 0);

        if (detection_success) {
            std::cerr << "Land mark detected" << std::endl;
        }

        if (detection_success && landmarkDetector.eye_model)
        {
            std::cerr << "Computing gaze" << std::endl;

            GazeAnalysis::EstimateGaze(landmarkDetector, gazeDirection0, sequence_reader.fx, sequence_reader.fy, sequence_reader.cx, sequence_reader.cy, true);
            GazeAnalysis::EstimateGaze(landmarkDetector, gazeDirection1, sequence_reader.fx, sequence_reader.fy, sequence_reader.cx, sequence_reader.cy, false);
            gazeAngle = GazeAnalysis::GetGazeAngle(gazeDirection0, gazeDirection1);
        }

        std::cerr << "Add next frame" << std::endl;
        // Feed the frame to the AU models; timestamp 0 and online=false are used here
        // because the frames come from an image sequence and AUs run in static mode
        faceAnalyzer.AddNextFrame(captured_image, landmarkDetector.detected_landmarks, landmarkDetector.detection_success, 0, false);

        std::cerr << "Computing pose" << std::endl;

        // Work out the pose of the head from the tracked model
        cv::Vec6d pose_estimate = LandmarkDetector::GetPose(landmarkDetector, sequence_reader.fx, sequence_reader.fy, sequence_reader.cx, sequence_reader.cy);

        std::cerr << "Computing action units" << std::endl;

        // GetCurrentAUsReg returns (AU name, intensity) pairs; a fixed ordering is assumed below
        auto actionUnits = faceAnalyzer.GetCurrentAUsReg();

        result[position] = VideoExResult();
        result[position].frame = position;
        result[position].faceDetected = PersonDetectionState::One;
        result[position].pose1 = pose_estimate[0];
        result[position].pose2 = pose_estimate[1];
        result[position].pose3 = pose_estimate[2];
        result[position].pose4 = pose_estimate[3];
        result[position].pose5 = pose_estimate[4];
        result[position].pose6 = pose_estimate[5];
        result[position].gaze1 = gazeAngle[0];
        result[position].gaze2 = gazeAngle[1];

        if (!actionUnits.empty()) {

            if (actionUnits[0].second < 0) {
                neg.push_back(actionUnits[0].second);
            }

            result[position].aU1 = actionUnits[0].second;
            result[position].aU2 = actionUnits[1].second;
            result[position].aU4 = actionUnits[2].second;
            result[position].aU5 = actionUnits[3].second;
            result[position].aU6 = actionUnits[4].second;
            result[position].aU7 = actionUnits[5].second;
            result[position].aU9 = actionUnits[6].second;
            result[position].aU10 = actionUnits[7].second;
            result[position].aU12 = actionUnits[8].second;
            result[position].aU14 = actionUnits[9].second;
            result[position].aU15 = actionUnits[10].second;
            result[position].aU17 = actionUnits[11].second;
            result[position].aU20 = actionUnits[12].second;
            result[position].aU23 = actionUnits[13].second;
            result[position].aU25 = actionUnits[14].second;
            result[position].aU26 = actionUnits[15].second;
            result[position].aU45 = actionUnits[16].second;
        }
        else {
            std::cerr << "Action unites is empty" << std::endl;
        }

        position++;

        // Grabbing the next frame in the sequence
        captured_image = sequence_reader.GetNextFrame();
    }

    sequence_reader.Close();
}
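
In the meantime, the only stopgap I can see is clamping the collected values ourselves. A minimal sketch is below; note that this is only a hard clip to the [0, 5] range applied after the loop, not the per-person calibration that FeatureExtraction performs at the end of a sequence, so the values stay uncalibrated (ClampActionUnits is a hypothetical helper and needs <algorithm> for std::min/std::max):

// Hypothetical helper: hard-clip the regression AUs of every processed frame
// to the valid [0, 5] range. This does NOT reproduce OpenFace's end-of-sequence
// per-person calibration; it only removes out-of-range values.
void ClampActionUnits(VideoExResult result[], int frameCount) {
    auto clamp = [](double v) { return std::min(5.0, std::max(0.0, v)); };
    for (int i = 0; i < frameCount; ++i) {
        result[i].aU1  = clamp(result[i].aU1);
        result[i].aU2  = clamp(result[i].aU2);
        result[i].aU4  = clamp(result[i].aU4);
        // ... repeat for the remaining AU fields ...
        result[i].aU45 = clamp(result[i].aU45);
    }
}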

Thanks for your help