Closed: lukes3315 closed this issue 3 years ago
Never mind, I think it is more of a system-level problem and a conflict with other processes using the GPU than a problem on the cuda-bundle-adjustment side.
I will post a fix when I find one.
Thanks, cheers.
Hi @lukes3315, I have run into exactly the same problem. Are there any updates on a solution? Thanks.
CUDA 9.2
Ubuntu 16.04
Sorry, I have figured it out. It is an out-of-memory (OOM) problem:
on my machine (GTX 1070 Ti), the sequence 07 demo runs fine, while sequence 00 does not.
I get the same errors. Is the library not suited for cases where there aren't any stereo edges? Too bad you've closed this.
Hi,
I could reproduce that error in the following environment.
The error occurred in thrust::sort in findHschureMulBlockIndices:
```cpp
void findHschureMulBlockIndices(const GpuHplBlockMat& Hpl, const GpuHscBlockMat& Hsc,
    GpuVec3i& mulBlockIds)
{
    const int block = 1024;
    const int grid = divUp(Hpl.cols(), block);

    DeviceBuffer<int> nindices(1);
    nindices.fillZero();

    findHschureMulBlockIndicesKernel<<<grid, block>>>(Hpl.cols(), Hpl.outerIndices(), Hpl.innerIndices(),
        Hsc.outerIndices(), Hsc.innerIndices(), mulBlockIds, nindices);
    CUDA_CHECK(cudaGetLastError());

    std::cout << "sort size: " << mulBlockIds.size() << std::endl;

    auto ptrSrc = thrust::device_pointer_cast(mulBlockIds.data());
    try
    {
        thrust::sort(ptrSrc, ptrSrc + mulBlockIds.size(), LessRowId());
    }
    catch (const thrust::system_error& e)
    {
        std::cout << e.what() << std::endl;
    }
}
```
When I run the above code, I get the following error:

```
sort size: 294114
merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
```
I wonder why this happens only in the monocular case. In the "with stereo" case, the sort size is 2308093, much larger than in the monocular-only case, yet the error did not occur.
I will continue to investigate.
Hi,
I finally found the cause.
The problem is not thrust::sort but the preceding DeviceBuffer initialization:

```cpp
DeviceBuffer<int> nindices(1);
nindices.fillZero();
```
There was an uninitialized-value bug: in the above code, nindices was allocated memory, but its size was never set and remained zero, so nindices.fillZero() had no effect. The bug was fixed in https://github.com/fixstars/cuda-bundle-adjustment/commit/29d4b87e5120afa9b429a4e8b119113f1f4e555a, and the error no longer occurs. Thank you for your cooperation.
Regards,
@atakagi-fixstars
I have encountered another bug. In the KITTI demo, when I set all landmark positions to fixed, I get a memory error.
i.e. change this line of code here:

```cpp
// auto v = obj.create<cuba::LandmarkVertex>(id, Xw, fixed);
auto v = obj.create<cuba::LandmarkVertex>(id, Xw, 1);
```

```
Reading Graph... terminate called after throwing an instance of 'thrust::system::system_error'
  what(): scan failed to get memory buffer: invalid configuration argument

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
```
Hi @carpdm,
Thank you for the report. We will track this issue in #7.
Regards,
Hi,
Great work on this GPU based pose graph optimization.
I have been encountering issues with a pose graph constructed from monocular observations only, and an error has been occurring:
I can reproduce it consistently when building my pose graph without stereo edges.
To reproduce it with the given examples, I just commented out the stereo edge loading in the readGraph method, which causes the aforementioned error. Interestingly, the crash isn't consistent; the memory error only occurs every n runs:
Any idea if I am doing anything wrong, or if cuda-bundle-adjustment was not designed for a pose graph without stereo edges?
Thanks, Cheers.