dorian3d / DBoW2

Enhanced hierarchical bag-of-word library for C++

Too slow to load a larger vocabulary #65

Open Dylancer1998 opened 3 years ago

Dylancer1998 commented 3 years ago

It took me an hour to load a 50 MB vocabulary file. Is there a better way?

Ceopee commented 3 years ago

If you are still loading from a txt file, I recommend using binary files to save and load. Grab this pull request: #64
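
For reference, a minimal sketch of how a binary vocabulary would be used once such a patch is applied. The binary method names below (saveToBinaryFile / loadFromBinaryFile) are assumptions borrowed from common DBoW2 forks, not necessarily what PR #64 adds, so check the PR for the actual API:

// Hedged sketch: convert the vocabulary to a binary file once, then load the
// binary file at startup. The binary method names are assumptions; see PR #64.
#include "DBoW2/DBoW2.h"   // defines OrbVocabulary

int main()
{
    OrbVocabulary voc;

    // One-time conversion: load the slow text/YAML vocabulary...
    voc.load("ORBvoc.yml.gz");             // stock DBoW2 loader (cv::FileStorage)
    // ...and write it back out in a compact binary format.
    voc.saveToBinaryFile("ORBvoc.bin");    // assumed name, added by the PR

    // From then on, load the binary file directly (much faster).
    OrbVocabulary voc2;
    voc2.loadFromBinaryFile("ORBvoc.bin"); // assumed name, added by the PR
    return 0;
}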

ghost commented 3 years ago

The binary file is still slow...

IronSublimate commented 2 years ago

I also met this issue when reading a 200 MB YAML file. In OpenCV, cv::FileNode::operator[](int) and cv::FileNodeIterator::operator+=(int) have O(n) time complexity, so I changed fn[i] to an iterator, which brings the total time complexity down from O(n^2) to O(n). The pull request is here: https://github.com/dorian3d/DBoW2/pull/68
The relevant OpenCV FileNode source code is shown below. In OpenCV 4, FileNode::operator[](int) calls FileNodeIterator::operator+=(int):

//opencv/modules/core/src/persistence.cpp
FileNode FileNode::operator[](int i) const
{
    if(!fs)
        return FileNode();

    CV_Assert( isSeq() );

    int sz = (int)size();
    CV_Assert( 0 <= i && i < sz );

    FileNodeIterator it = begin();
    it += i; // Here OpenCV uses operator+=

    return *it;
}

But FileNodeIterator::operator+=(int) is not O(1): it advances one element at a time in a for loop, so stepping forward by i costs O(i).

//opencv/modules/core/src/persistence.cpp
FileNodeIterator& FileNodeIterator::operator += (int _ofs)
{
    CV_Assert( _ofs >= 0 );
    for( ; _ofs > 0; _ofs-- )
        this->operator ++();
    return *this;
}
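
For reference, a minimal sketch of the O(n^2) -> O(n) change described above (not the exact diff from PR #68): iterate the sequence node once with a FileNodeIterator instead of indexing it with fn[i]:

// Hedged sketch: replacing indexed access with a single linear iteration.
#include <opencv2/core.hpp>

void readNodes(const cv::FileNode& fn)
{
    // Slow: each fn[i] walks the list from the start, so the loop is O(n^2).
    // for (int i = 0; i < (int)fn.size(); ++i)
    //     /* read fn[i] */;

    // Fast: one linear pass over the children, O(n) in total.
    for (cv::FileNodeIterator it = fn.begin(); it != fn.end(); ++it)
    {
        const cv::FileNode& node = *it;
        // ... read fields from node, e.g. (double)node["weight"] ...
        (void)node;
    }
}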
macTracyHuang commented 1 year ago

In my case, running ORB-SLAM on iOS, the bottleneck in loading the vocabulary is the initialization of cv::Mat inside the loop. I optimized it by allocating the memory in one go outside the loop, as sketched below.
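
For reference, a minimal sketch of that idea (not the actual patch): allocate one large cv::Mat up front and hand out row views inside the loop, so no per-descriptor allocation happens:

// Hedged sketch: a single allocation for all descriptors, with row views.
#include <opencv2/core.hpp>
#include <vector>

std::vector<cv::Mat> preallocateDescriptors(int nDescriptors, int descBytes /* e.g. FORB::L = 32 */)
{
    // One allocation covering every descriptor.
    cv::Mat pool(nDescriptors, descBytes, CV_8U);

    std::vector<cv::Mat> descriptors;
    descriptors.reserve(nDescriptors);
    for (int i = 0; i < nDescriptors; ++i)
        descriptors.push_back(pool.row(i)); // view into 'pool', no new allocation

    return descriptors; // each element keeps 'pool' alive via reference counting
}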

ffff349 commented 3 months ago

(Quoting the comment above:) In my case, running ORB-SLAM on iOS, the bottleneck in loading the vocabulary is the initialization of cv::Mat inside the loop; I optimized it by allocating the memory in one go outside the loop.

Hello, following your method causes KITTI sequence 00 to keep detecting a loop closure with frame 0 after about frame 500.

int nn = 0;
int nMaxNumDes = 1082174; // total number of descriptors
std::vector<cv::Mat> vDes(nMaxNumDes, cv::Mat(1, FORB::L, CV_8U)); // allocate memory in one go

cv::Mat FORB::fromString(cv::Mat &a, const std::string &s)
{
    std::cout << nn << std::endl;
    a.create(1, FORB::L, CV_8U);
    auto b = vDes[nn++];
    unsigned char *p = a.ptr();
    unsigned char *q = b.ptr();

    stringstream ss(s);
    for(int i = 0; i < FORB::L; ++i, ++p, ++q)
    {
        int n;
        ss >> n;

        if(!ss.fail())
        {
            *p = (unsigned char)n;
            *q = (unsigned char)n;
        }
    }

    // check that every element is the same
    cv::Mat diff;
    cv::absdiff(a, b, diff);
    if(cv::countNonZero(diff) != 0)
    {
        std::cout << "false" << std::endl;
    }
    return b;
}

I compared the data from the two approaches and found no difference.
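
A hedged observation on the snippet above, not a confirmed diagnosis: std::vector's fill constructor copies its prototype element, and copying a cv::Mat copies only the header, so every entry of vDes shares one underlying data buffer. The in-function diff check still passes (a and b hold the same values at that moment), but each call overwrites the data seen by every earlier vDes entry, which could produce the kind of repeated loop closures reported. A minimal demonstration:

// Demonstrates cv::Mat's shallow-copy semantics inside a std::vector fill
// constructor: all elements end up sharing the same data buffer.
#include <opencv2/core.hpp>
#include <iostream>
#include <vector>

int main()
{
    std::vector<cv::Mat> v(3, cv::Mat(1, 4, CV_8U, cv::Scalar(0)));

    v[0].at<unsigned char>(0, 0) = 42;   // write through element 0...

    // ...and every other element sees the change, because the fill
    // constructor copied only the Mat header, not the pixel data.
    std::cout << (int)v[1].at<unsigned char>(0, 0) << " "
              << (int)v[2].at<unsigned char>(0, 0) << std::endl; // prints "42 42"

    // Independent buffers would require cv::Mat::clone() per element instead.
    return 0;
}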