AlejandroSilvestri / ORB_SLAM2-documented

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Enormous scale error on mono_kitti #1

Open ThibaudMZN opened 7 years ago

ThibaudMZN commented 7 years ago

Hi,

I'm trying to use ORB-SLAM2 to create a GPS alternative for a moving car.

When running mono_kitti on KITTI sequence 08, I should get a scale that is roughly right (without taking drift into account), according to this paper -> https://arxiv.org/pdf/1610.06475v2.pdf

But my scale is really off (the westernmost points are at -20). Does this mean that running ORB-SLAM2 with a monocular camera is no longer as good as the first ORB-SLAM algorithm?

AlejandroSilvestri commented 7 years ago

ORB-SLAM2 outperforms ORB-SLAM, without drawbacks.

Monocular visual SLAM never recovers real scale. You must convert from monocular SLAM's arbitrary reference system to a real-world reference system of your choice. Monocular SLAM can't do this for you.

Please point me to the exact passage in the paper addressing this subject; I'll be glad to study it.

ThibaudMZN commented 7 years ago

This was my thought as well: why would a new version be less accurate than a previous one...

In figure 5 of the paper, they compare ORB-SLAM mono and ORB-SLAM2 stereo, but if I run this specific dataset with ORB-SLAM2 mono, I get a scale approximately 200 times smaller than the one in the left figure.

AlejandroSilvestri commented 7 years ago

Well, monocular SLAM doesn't recover the right scale. Each time ORB-SLAM and ORB-SLAM2 initialize, they pick an arbitrary scale based on the set of map points they were able to triangulate.

Each time you run ORB-SLAM2 on the same dataset, you "can" have a different scale.

When comparing maps, the researchers must manually adjust the scale.

This is valid only for monocular. It doesn't happen with stereo or RGB-D, which recover the proper world scale. Figure 5 compares ORB-SLAM, which is monocular only, with ORB-SLAM2 stereo.

ThibaudMZN commented 7 years ago

Thank you for this clear explanation. I'll try to use other sensors to establish the world scale at the beginning of the program!

AlejandroSilvestri commented 7 years ago

Let me know how you do that when you succeed!

ThibaudMZN commented 7 years ago

I have another question for you...

Now that I have set up different types of sensors, I'm able to correct the scale issue pretty easily. I'm doing it as a post-process once the program is over, when I can access the KeyFrameTrajectory.txt file.

The thing is, I want to implement that directly in the code. Do you know where the best place would be to apply transformations to all the keyframes? (Basic transformations such as a rotation in space and a magnitude correction.)

Thank you!

AlejandroSilvestri commented 7 years ago

@ThibaudMZN ,

I'm glad to hear about this. I myself am working on something like that, but with a different approach.

I assume you want to apply a 7-DoF transform (translation, rotation and scale).

You should access the set of keyframes map.mspKeyFrames and the set of map points map.mspMapPoints, and rewrite the keyframe poses with keyframe.SetPose (passing a 4x4 Mat) and the map point positions with mappoint.SetWorldPos (passing a 3x1 Mat).

You should first pause all threads (LoopClosing and LocalMapping), wait for them to be paused, and only then perform the transformation. Localization-only mode pauses LocalMapping; that should be enough.

After that you should update mCurrentFrame.mTcw; if you don't, ORB-SLAM2 will lose track and perform a relocalization.
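
For concreteness, here is a minimal sketch of that procedure, not part of the library itself: it assumes you have pointers to the Map and LocalMapping instances, uses only accessors that exist in ORB-SLAM2 (GetAllKeyFrames, GetAllMapPoints, GetPose/SetPose, GetWorldPos/SetWorldPos, RequestStop/isStopped/Release), and applies the similarity x_real = s*R*x_slam + t to the whole map.

    // A sketch only. Requires the usual ORB-SLAM2 headers (Map.h, KeyFrame.h,
    // MapPoint.h, LocalMapping.h) and <unistd.h> for usleep.
    // R is 3x3 CV_32F, t is 3x1 CV_32F, s is the scale factor.
    void ApplySimilarityToMap(ORB_SLAM2::Map* pMap,
                              ORB_SLAM2::LocalMapping* pLocalMapper,
                              const cv::Mat& R, const cv::Mat& t, float s)
    {
        // Pause local mapping and wait until it has actually stopped
        pLocalMapper->RequestStop();
        while (!pLocalMapper->isStopped())
            usleep(1000);

        // Rewrite every map point position: x' = s*R*x + t
        for (ORB_SLAM2::MapPoint* pMP : pMap->GetAllMapPoints())
        {
            if (!pMP || pMP->isBad())
                continue;
            pMP->SetWorldPos(s * R * pMP->GetWorldPos() + t);
        }

        // Rewrite every keyframe pose. Poses are stored world-to-camera (Tcw),
        // so the corrected pose is Rcw' = Rcw*R^T, tcw' = s*tcw - Rcw'*t.
        for (ORB_SLAM2::KeyFrame* pKF : pMap->GetAllKeyFrames())
        {
            if (!pKF || pKF->isBad())
                continue;
            cv::Mat Tcw = pKF->GetPose();
            cv::Mat Rcw = Tcw.rowRange(0, 3).colRange(0, 3);
            cv::Mat tcw = Tcw.rowRange(0, 3).col(3);

            cv::Mat Tcw2 = cv::Mat::eye(4, 4, CV_32F);
            cv::Mat Rcw2 = Rcw * R.t();
            cv::Mat tcw2 = s * tcw - Rcw2 * t;
            Rcw2.copyTo(Tcw2.rowRange(0, 3).colRange(0, 3));
            tcw2.copyTo(Tcw2.rowRange(0, 3).col(3));
            pKF->SetPose(Tcw2);
        }

        pLocalMapper->Release();  // resume local mapping
    }

Note the pose update is not a plain 4x4 multiplication: multiplying Tcw by the inverse of the 4x4 similarity would leave a 1/s factor inside the rotation block, so the rotation and translation are corrected separately here.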

RameshKamath commented 4 years ago

Hi @AlejandroSilvestri ,

I am also trying to solve the scaling problem.

I have the 4x4 rotation-translation matrix from another sensor for every frame, which is in real-world scale.

I was trying to extract the scale factor by multiplying the inverse of the calculated matrix with the real-world matrix (I am doing this in Tracking::MonocularInitialization to set the scale), but the error is too large.

Please let me know your thoughts on this approach, and whether you have a different approach for setting the correct scale of the poses and map points using an RT matrix from the real world.

AlejandroSilvestri commented 4 years ago

Hi @RameshKamath,

Let me share some thoughts. The only place scale shows up in a Euclidean transformation matrix (a rototranslation matrix) is the magnitude of the translation vector (the right column of the matrix is the translation vector in homogeneous coordinates). So if you have the same rototranslation from two sensors and the only difference is the scale, only that vector's magnitude should differ, and from the ratio of the two magnitudes you can get a factor.
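
To illustrate with code (a tiny sketch, using OpenCV types as in the rest of the codebase; tReal and tSlam are the 3x1 translation columns of the two matrices):

    // Scale factor between the real-world and SLAM translations of the same
    // frame, taken as the ratio of the two vector magnitudes.
    float ScaleFactor(const cv::Mat& tReal, const cv::Mat& tSlam)
    {
        double nSlam = cv::norm(tSlam);
        if (nSlam < 1e-9)            // undefined until the camera has moved
            return 1.0f;
        return static_cast<float>(cv::norm(tReal) / nSlam);
    }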

Monocular systems suffer from scale drift, meaning the scale will change during mapping. So you will want to recalculate this factor periodically.

Monocular ORB-SLAM2 doesn't even try to recover the right world scale, because it can't. Monocular systems can't do it without some external information. You will always get a difference between the ORB-SLAM2 scale and the real-world scale, but this is not an error.

RameshKamath commented 4 years ago

Hi @AlejandroSilvestri,

Thanks for the reply. I understand that monocular systems are bad with scale and will drift over time.

But my hope was that if I initialize with the real scale and all map points are also in real scale, then even if it drifts, it won't drift too much, and with loop closing from time to time I can keep everything at real-world scale.

My later plan is to enhance the optimizer and initializer with real-world pose data (like a Kalman filtering system) to get accurate pose data all the time.

I am currently working on this approach. Any suggestions, or pointers to other code with this type of implementation, would help.

I am using your suggestion of using only the translation of the rototranslation matrix.

Do you have any suggestions on what I should modify so that ORB-SLAM takes the translation from the other sensor's rototranslation matrix rather than calculating it?

(P.S.: from looking at the code, my guess is that the optimizers and the loop closer have to be modified.)

AlejandroSilvestri commented 4 years ago

@RameshKamath

I think the best place to change the world scale is in global bundle adjustment, or directly during initialization. All subsequent frames will adopt this scale.
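
For reference, ORB-SLAM2 already rescales the freshly triangulated map at the end of Tracking::CreateInitialMapMonocular, using the inverse median scene depth. A hedged sketch of that spot, with the normalization replaced by an externally derived factor (externalScale is a hypothetical value you would compute from your sensor, e.g. the ratio of the real to the SLAM baseline between the two initialization frames):

    // In Tracking::CreateInitialMapMonocular, ORB-SLAM2 normalizes the map
    // with invMedianDepth = 1.0f / pKFini->ComputeSceneMedianDepth(2).
    // Substituting a metric factor here makes the map start in real scale.
    float scale = externalScale;  // hypothetical, from the external sensor

    // Scale the initial baseline (translation part of the current pose)
    cv::Mat Tc2w = pKFcur->GetPose();
    Tc2w.col(3).rowRange(0, 3) = Tc2w.col(3).rowRange(0, 3) * scale;
    pKFcur->SetPose(Tc2w);

    // Scale the triangulated points with the same single factor
    for (MapPoint* pMP : pKFini->GetMapPointMatches())
        if (pMP)
            pMP->SetWorldPos(pMP->GetWorldPos() * scale);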

ORB-SLAM3, still in beta, is an open implementation combining IMU readings with visual SLAM. It doesn't use a Kalman filter to merge measurements; it puts the IMU readings into the graph, in the map itself, so bundle adjustment automatically takes care of scale propagation. The paper ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM explains this method.

RameshKamath commented 4 years ago

@AlejandroSilvestri,

Thanks for recommending ORB-SLAM3; I am looking into it for inspiration.

I have modified the code so that I divide the real translation data by the calculated translation data to get a scale factor between the two (element-wise division with a zero check).

Then I set the initial frame and the current frame to the real translation, and apply the scale factor to every map point in pKFini->GetMapPointMatches().

But the optimizer is changing the frame pose back, so I am assuming there is some problem in the scaling of the map points or elsewhere.

I also tried giving an identity matrix as the edge information in BundleAdjustment, so that it is less aggressive, but it didn't work.

If you have any suggestions for me to try, please let me know.

AlejandroSilvestri commented 4 years ago

@RameshKamath

You need to apply new poses to keyframes too.

If that still doesn't work, you can always apply a conversion to the output current frame pose: let ORB-SLAM2 use its own scale and convert the output.
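
A small sketch of that conversion (s, R, t being the similarity you estimated offline that maps SLAM coordinates to real-world coordinates):

    // Convert one output pose without touching the map. Tcw is the frame pose
    // from ORB-SLAM2 (world-to-camera). Returns the camera position in
    // real-world coordinates.
    cv::Mat CameraCenterInRealWorld(const cv::Mat& Tcw, float s,
                                    const cv::Mat& R, const cv::Mat& t)
    {
        cv::Mat Rcw = Tcw.rowRange(0, 3).colRange(0, 3);
        cv::Mat tcw = Tcw.rowRange(0, 3).col(3);
        cv::Mat C = -Rcw.t() * tcw;  // camera center in SLAM world coordinates
        return s * R * C + t;        // same similarity as for the map points
    }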

RameshKamath commented 4 years ago

@AlejandroSilvestri

I am applying it to the keyframes too.

Thanks for the suggestion about scaling afterwards; I am trying it now.

As for progress: I am saving and loading the map and relocalizing against the saved map, but the pose is wrong when relocalizing.

My guess is that the map points are not scaled correctly, and PnP and re-initialization are giving the wrong pose.

Below is the code I use to scale. (P.S.: I am only scaling the translation.)

    // Set the initial keyframe translation to the RT from the external sensor
    cv::Mat initial = pKFini->GetPose();
    if (!mInitialFrame.mImu.empty()) {  // in Frame.mImu I save that frame's RT matrix
        initial.at<float>(0, 3) = mInitialFrame.mImu.at<float>(0, 3);
        initial.at<float>(1, 3) = mInitialFrame.mImu.at<float>(1, 3);
        initial.at<float>(2, 3) = mInitialFrame.mImu.at<float>(2, 3);
    }
    pKFini->SetPose(initial);

    // Extract the scale from the difference between the initial and current
    // frames, so it can be used to scale the map points.
    // If the saved map (i.e. the map points and keyframes in the database) is
    // in real-world scale, then relocalization will be in the same scale.
    cv::Mat Tcw2 = pKFcur->GetPose();
    //Tcw2.col(3).rowRange(0,3) = Tcw2.col(3).rowRange(0,3)*invMedianDepth;
    cv::Mat ndiff = cv::Mat::ones(3, 1, CV_32F);  // per-axis scale factors
    if (!mCurrentFrame.mImu.empty() && !mInitialFrame.mImu.empty()) {
        if (Tcw2.at<float>(0, 3) != 0)
            ndiff.at<float>(0, 0) = mCurrentFrame.mImu.at<float>(0, 3) / Tcw2.at<float>(0, 3);
        if (Tcw2.at<float>(1, 3) != 0)
            ndiff.at<float>(1, 0) = mCurrentFrame.mImu.at<float>(1, 3) / Tcw2.at<float>(1, 3);
        if (Tcw2.at<float>(2, 3) != 0)
            ndiff.at<float>(2, 0) = mCurrentFrame.mImu.at<float>(2, 3) / Tcw2.at<float>(2, 3);
    }

    Tcw2.at<float>(0, 3) = Tcw2.at<float>(0, 3) * ndiff.at<float>(0, 0);
    Tcw2.at<float>(1, 3) = Tcw2.at<float>(1, 3) * ndiff.at<float>(1, 0);
    Tcw2.at<float>(2, 3) = Tcw2.at<float>(2, 3) * ndiff.at<float>(2, 0);

    // Set the current keyframe to the RT from the other sensor
    pKFcur->SetPose(Tcw2);

    // To get all map points, try both: pKFini->GetMapPointMatches() or mpMap->GetAllMapPoints()
    // Scale the map points with the scale factors from the real-world RT matrix
    vector<MapPoint*> vpAllMapPoints = pKFini->GetMapPointMatches();
    for (size_t iMP = 0; iMP < vpAllMapPoints.size(); iMP++)
    {
        if (vpAllMapPoints[iMP])
        {
            MapPoint* pMP = vpAllMapPoints[iMP];
            cv::Mat wpos = pMP->GetWorldPos();
            //wpos = wpos*invMedianDepth;
            wpos.at<float>(0) = wpos.at<float>(0) * ndiff.at<float>(0);
            wpos.at<float>(1) = wpos.at<float>(1) * ndiff.at<float>(1);
            wpos.at<float>(2) = wpos.at<float>(2) * ndiff.at<float>(2);
            pMP->SetWorldPos(wpos);
        }
    }

If you have any suggestions, please let me know.

AlejandroSilvestri commented 4 years ago

@RameshKamath

You should derive only one scale factor from the translation, not three. You are calculating three independent scale factors, one per axis, and this stretches your space.

If the ORB-SLAM2 and IMU translations point in very different directions, then you have a different problem: it's not only about scale anymore, you need a rotation too.
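
To make that concrete, a sketch of the single-factor version of the snippet above (reusing your hypothetical mImu member, and still ignoring any rotation misalignment):

    // One uniform scale factor from the translation magnitudes, instead of one
    // factor per axis, so the map is scaled isotropically.
    cv::Mat Tcw2 = pKFcur->GetPose();
    float scale = 1.0f;
    double nSlam = cv::norm(Tcw2.rowRange(0, 3).col(3));
    if (!mCurrentFrame.mImu.empty() && nSlam > 1e-9)
        scale = static_cast<float>(
            cv::norm(mCurrentFrame.mImu.rowRange(0, 3).col(3)) / nSlam);

    // Apply the same factor to the current keyframe translation...
    Tcw2.col(3).rowRange(0, 3) = Tcw2.col(3).rowRange(0, 3) * scale;
    pKFcur->SetPose(Tcw2);

    // ...and to every initial map point
    for (MapPoint* pMP : pKFini->GetMapPointMatches())
        if (pMP)
            pMP->SetWorldPos(pMP->GetWorldPos() * scale);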

RameshKamath commented 4 years ago

@AlejandroSilvestri

Thanks for the suggestion.

Now I am calculating the Euclidean norm of the calculated and real translation vectors and deriving the scale factor from them.

I am also applying this method in the TrackReferenceKeyFrame function to fix scale drift.

Is there anywhere else I have to apply it to fix scale drift?

If you have any suggestions for me to try, please let me know.