fixstars / cuda-bundle-adjustment

A CUDA implementation of Bundle Adjustment
Apache License 2.0

schur complement CUDA global memory random access #13

Open lanqing30 opened 2 years ago

lanqing30 commented 2 years ago

Hi, I have recently been working on CUDA acceleration of the Schur complement in Ceres, which computes the Hsc matrix from the Jacobian matrix, and I have run into poor performance caused by random accesses to the Hsc matrix in CUDA global memory. I read the part of this project that performs the sparse-sparse matrix multiplication (something like H_lp'Hpl in this code). It seems that you precompute [i, j, k] triplets, which are the addresses used by the matrix multiplication operations, sort them, and then perform the small matrix multiplications in a CUDA kernel. Does this method also suffer from poor memory-access performance on the Hsc matrix? If so, how do you tackle it?
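To make the question concrete, here is a minimal host-side C++ sketch of the triplet idea as I understand it. It is not this project's actual code. The names `Edge`, `Triplet`, `buildSortedTriplets`, and `accumulateSegments` are hypothetical, and scalars stand in for the small dense blocks. Each triplet holds two edge indices and an output slot in Hsc. Sorting by output slot makes all contributions to one Hsc block contiguous, so a worker (for example one GPU thread or warp per segment) could accumulate locally and write each output block once, instead of scattering atomic updates across global memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not taken from cuda-bundle-adjustment.
struct Edge { int pose; int landmark; };    // a pose observes a landmark
struct Triplet { int a; int b; int out; };  // edge a, edge b, output slot in Hsc

// Enumerate every product contributing to the upper triangle of Hsc.
// Two edges contribute to block (pose_a, pose_b) when they share a landmark.
// outBlocks receives the (row pose, col pose) pair of each output slot.
std::vector<Triplet> buildSortedTriplets(const std::vector<Edge>& edges,
                                         int numLandmarks,
                                         std::vector<std::pair<int, int>>& outBlocks)
{
    // Group edge indices by landmark.
    std::vector<std::vector<int>> byLandmark(numLandmarks);
    for (int e = 0; e < static_cast<int>(edges.size()); ++e) {
        assert(edges[e].landmark >= 0 && edges[e].landmark < numLandmarks);
        byLandmark[edges[e].landmark].push_back(e);
    }

    std::map<std::pair<int, int>, int> blockIndex;  // (row, col) -> output slot
    std::vector<Triplet> triplets;
    for (const auto& group : byLandmark) {
        for (int a : group) {
            for (int b : group) {
                if (edges[a].pose > edges[b].pose) continue;  // upper triangle only
                const auto key = std::make_pair(edges[a].pose, edges[b].pose);
                auto it = blockIndex.find(key);
                if (it == blockIndex.end()) {
                    it = blockIndex.emplace(key, static_cast<int>(outBlocks.size())).first;
                    outBlocks.push_back(key);
                }
                triplets.push_back({a, b, it->second});
            }
        }
    }

    // Sort by output slot so that contributions to the same Hsc block are contiguous.
    std::stable_sort(triplets.begin(), triplets.end(),
                     [](const Triplet& x, const Triplet& y) { return x.out < y.out; });
    return triplets;
}

// Walk the sorted triplets segment by segment. Each segment owns one output
// slot: accumulate in a local variable, then write the slot exactly once.
// value[e] stands in for the H_pl block on edge e, invHll[l] for the inverse
// of the landmark block l (scalars here, small dense blocks in practice).
std::vector<double> accumulateSegments(const std::vector<Triplet>& triplets,
                                       const std::vector<Edge>& edges,
                                       const std::vector<double>& value,
                                       const std::vector<double>& invHll,
                                       int numOutBlocks)
{
    std::vector<double> out(numOutBlocks, 0.0);
    std::size_t i = 0;
    while (i < triplets.size()) {
        const int slot = triplets[i].out;
        double sum = 0.0;
        for (; i < triplets.size() && triplets[i].out == slot; ++i) {
            const Triplet& t = triplets[i];
            sum += value[t.a] * invHll[edges[t.a].landmark] * value[t.b];
        }
        out[slot] = sum;  // single write per output block
    }
    return out;
}
```

In this sketch the reads of `value` are still indirect; only the writes to the output become sequential. Whether that is enough to hide the random-access cost on a real GPU is exactly what I am unsure about.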