UZ-SLAMLab / ORB_SLAM3

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
GNU General Public License v3.0

Find keypoints in source for a mask implementation #859

Closed Styazoua closed 4 months ago

Styazoua commented 10 months ago

Hello,

I am trying to implement the use of a mask where I want to drop detected keypoints that fall inside the defined mask. For testing I created a mask with the size of my image, filled half of it with 0 and the other half with values != 0.

The code below is my implementation in ORBextractor.cc inside ORBextractor::ComputeKeyPointsOctTree(). For the check I take the points from vKeysCell, which are the output of the FAST corner detector.

When the frame gets displayed, the whole image is full of keypoints. My expectation was keypoints in only half of the frame. I am not sure if I'm checking at the correct position. Can someone help here? Thank you

void ORBextractor::ComputeKeyPointsOctTree(vector<vector<KeyPoint> >& allKeypoints, const cv::Mat &mask)
    {
        allKeypoints.resize(nlevels);

        const float W = 35;

        for (int level = 0; level < nlevels; ++level)
        {
            const int minBorderX = EDGE_THRESHOLD-3;
            const int minBorderY = minBorderX;
            const int maxBorderX = mvImagePyramid[level].cols-EDGE_THRESHOLD+3;
            const int maxBorderY = mvImagePyramid[level].rows-EDGE_THRESHOLD+3;

            vector<cv::KeyPoint> vToDistributeKeys;
            vector<vector<KeyPoint> > tmpAllKeypoints;
            vector<KeyPoint> tmpKeypoint;
            vToDistributeKeys.reserve(nfeatures*10);

            const float width = (maxBorderX-minBorderX);
            const float height = (maxBorderY-minBorderY);

            const int nCols = width/W;
            const int nRows = height/W;
            const int wCell = ceil(width/nCols);
            const int hCell = ceil(height/nRows);

            for(int i=0; i<nRows; i++)
            {
                const float iniY =minBorderY+i*hCell;
                float maxY = iniY+hCell+6;

                if(iniY>=maxBorderY-3)
                    continue;
                if(maxY>maxBorderY)
                    maxY = maxBorderY;

                for(int j=0; j<nCols; j++)
                {
                    const float iniX =minBorderX+j*wCell;
                    float maxX = iniX+wCell+6;
                    if(iniX>=maxBorderX-6)
                        continue;
                    if(maxX>maxBorderX)
                        maxX = maxBorderX;

                    vector<cv::KeyPoint> vKeysCell;

                    FAST(mvImagePyramid[level].rowRange(iniY,maxY).colRange(iniX,maxX),
                         vKeysCell,iniThFAST,true);

                    if(vKeysCell.empty())
                    {
                        FAST(mvImagePyramid[level].rowRange(iniY,maxY).colRange(iniX,maxX),
                             vKeysCell,minThFAST,true);
                    }

//################################################################################

                    if(!vKeysCell.empty())
                    {
                        for(vector<cv::KeyPoint>::iterator vit=vKeysCell.begin(); vit!=vKeysCell.end();vit++)
                        {
                            (*vit).pt.x+=j*wCell;
                            (*vit).pt.y+=i*hCell;

                            if(mask.empty())
                            {
                                vToDistributeKeys.push_back(*vit);
                            }
                            else
                            {
                                // check value of mask cell with coords of keypoint
                                int value = mask.at<int>((*vit).pt.y, (*vit).pt.x);
                                if(value == 0)
                                {
                                    cout << "Point not inside mask. Value = " << value << endl;
                                    cout << "vKeysCell size: " << vKeysCell.size() << endl;
                                    cout << "Position of point: (" << (*vit).pt.x << "/" << (*vit).pt.y << ")" << endl;
                                    vToDistributeKeys.push_back(*vit);
                                }
                                else
                                {
                                    cout << "Removing point. Point inside mask. Value = " << value << endl;
                                    cout << "Position of point: (" << (*vit).pt.x << "/" << (*vit).pt.y << ")" << endl;
                                }
                            }
                        }
                    }

                }
            }

//################################################################################

            vector<KeyPoint> & keypoints = allKeypoints[level];
            keypoints.reserve(nfeatures);

            keypoints = DistributeOctTree(vToDistributeKeys, minBorderX, maxBorderX,
                                          minBorderY, maxBorderY,mnFeaturesPerLevel[level], level);

            const int scaledPatchSize = PATCH_SIZE*mvScaleFactor[level];

            // Add border to coordinates and scale information
            const int nkps = keypoints.size();
            for(int i=0; i<nkps ; i++)
            {
                keypoints[i].pt.x+=minBorderX;
                keypoints[i].pt.y+=minBorderY;
                keypoints[i].octave=level;
                keypoints[i].size = scaledPatchSize;
            }
        }

        // compute orientations
        for (int level = 0; level < nlevels; ++level)
            computeOrientation(mvImagePyramid[level], allKeypoints[level], umax);
    }
Styazoua commented 10 months ago

Here is also a part of the console output. I'm not sure if it's all correct, but at least some keypoints should be removed.

Point not inside mask. Value = 0
vKeysCell size: 16
Position of point: (962/339)
Removing point. Point inside mask. Value = -1
Position of point: (951/340)
Removing point. Point inside mask. Value = -1
Position of point: (944/341)
Removing point. Point inside mask. Value = -1
Position of point: (956/341)
Point not inside mask. Value = 0
vKeysCell size: 16
Position of point: (960/341)
Removing point. Point inside mask. Value = -1
Position of point: (947/342)
Point not inside mask. Value = 0
vKeysCell size: 16
Position of point: (973/344)
Point not inside mask. Value = 0
vKeysCell size: 16
Position of point: (960/350)
Point not inside mask. Value = 0
vKeysCell size: 16
Position of point: (963/361)
Styazoua commented 10 months ago

That's what I'm doing at the moment. I'm passing the mask via this argument. Inside operator() my modified ComputeKeyPointsOctTree() function gets called. But unfortunately it's not working.

Cheng-Nan-Lee commented 4 months ago

Hello Styazoua,

Unsure if you have since solved the problem.

I have been working on exactly this issue. Using the vector's erase method after the "Add border to coordinates and scale information" step removed the keypoints. However, this is not a complete solution, as removing some of the keypoints prevented ORB-SLAM3 from tracking the image at all (unsure why this is happening).

I welcome any further assistance.

Styazoua commented 4 months ago

Hello @Cheng-Nan-Lee Yes, I have found a solution for that. ORB_SLAM3 scales the original image to multiple resolutions to build an image pyramid. For each level of the pyramid it detects keypoints, which are then stored in allKeypoints. So what I did is iterate through that vector, take each point and multiply it by the scale factor of the corresponding level to bring it to the original image resolution.

After that I check whether the point is inside the mask or not. If it is, the keypoint gets removed from the vector. The mask I created looks like this:

It has the size of the image, and wherever a cell contains a value != 0, keypoints detected in that area will get removed.

    int ORBextractor::operator()( InputArray _image, const Mat &_mask, vector<KeyPoint>& _keypoints,
                                  OutputArray _descriptors, std::vector<int> &vLappingArea)
    {
        //cout << "[ORBextractor]: Max Features: " << nfeatures << endl;
        if(_image.empty())
            return -1;
        Mat image = _image.getMat();
        assert(image.type() == CV_8UC1 );
        // Pre-compute the scale pyramid
        ComputePyramid(image);
        vector < vector<KeyPoint> > allKeypoints;
        ComputeKeyPointsOctTree(allKeypoints);
        //ComputeKeyPointsOld(allKeypoints);

        for(int level = 0; level < nlevels; ++level) {
            if(!_mask.empty()) {
                vector<cv::KeyPoint> tmpkeypoints;
                for(const auto& keypoint : allKeypoints[level]) {
                    // Upscale coords to original image size
                    int scaledX = keypoint.pt.x * mvScaleFactor[level];
                    int scaledY = keypoint.pt.y * mvScaleFactor[level];

                    // Check if cell inside mask is 0 
                    int value = _mask.at<uint8_t>(scaledY, scaledX);
                    //cout << "Value in mask: " << value << endl;
                    if(value == 0) {
                        tmpkeypoints.push_back(keypoint);
                    }
                }
                allKeypoints[level] = tmpkeypoints;
            }
        }

One thing that I am not showing here is that you have to pass your mask to this function.

I have tested that with a sample mask where one half is 0 and the other half is 255, and that's the output:

(output screenshot)

The only thing I noticed is that the masked area should not be too big, otherwise it leads to tracking errors. It's like filming a white wall.

Here is a scientific paper which is similar to my implementation.

Cheng-Nan-Lee commented 4 months ago

Thank you very much. I noted that when I masked out the edges (up to 75% of the image), ORB-SLAM3 ran perfectly fine. But when I masked the central 25%, ORB-SLAM3 could not track the image. I will investigate this further, but I'm interested in possible solutions.

Styazoua commented 4 months ago

Have you tested in different environments? Maybe the other 75% does not contain enough information. I used to mask out persons in the frames and noticed this problem only when the person was too close to the camera, so that the bounding box covered almost the whole image. Let me know if you find a solution.

Cheng-Nan-Lee commented 4 months ago

The cause was a logic error in my code that masked out all features. As with most of my issues, if I stare hard enough at them I'll eventually realise they stemmed from my careless approach. Is it OK to close this issue?