TRAILab / CaDDN

Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021 Oral)
Apache License 2.0

Multiple camera fusion #78

Closed taylover-pei closed 2 years ago

taylover-pei commented 2 years ago

Thanks for your great work!

I want to feed four camera images (front, left, right, back) into the network at the same time to obtain full BEV features based on your code. How should I modify your code to implement a multi-camera fusion module? Can you give me some suggestions?

Thank you very much!

codyreading commented 2 years ago

Sure, this is technically feasible, but it will require a lot of compute for CaDDN in its current form. You will likely have to reduce the resolution of the frustum/voxel grids in order to fit this on a GPU.

You will need to run the frustum feature network on each camera view to generate independent frustum features per view. You will likely want a separate network for each view, but you can start by sharing weights across the frustum feature networks.
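A minimal sketch of the shared-weight variant of this step. `FrustumFeatureNet` here is a hypothetical stand-in for CaDDN's frustum feature network (which predicts image features and a categorical depth distribution); the point is only that one module is applied independently to every view:

```python
import torch
import torch.nn as nn

class FrustumFeatureNet(nn.Module):
    """Toy stand-in for the frustum feature network: maps an image to a
    frustum feature grid of shape (B, C, D, H, W)."""
    def __init__(self, channels=8, depth_bins=4):
        super().__init__()
        self.channels = channels
        self.depth_bins = depth_bins
        self.conv = nn.Conv2d(3, channels * depth_bins, kernel_size=1)

    def forward(self, image):
        b, _, h, w = image.shape
        feat = self.conv(image)
        # Unflatten channels into (feature channels, depth bins).
        return feat.view(b, self.channels, self.depth_bins, h, w)

# One shared-weight network applied to each camera view.
net = FrustumFeatureNet()
views = {k: torch.rand(1, 3, 16, 16) for k in ("front", "left", "right", "back")}
frustums = {k: net(img) for k, img in views.items()}
```

Swapping to separate per-view networks later is then just a matter of instantiating one `FrustumFeatureNet` per view instead of reusing `net`.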

There will only be one voxel grid; for each voxel, you project its center into all frustum views to extract the relevant frustum features. In most cases, each voxel will project into only one frustum view (within the FOV of that camera), so you can simply extract that feature. When a voxel projects into two different views, you can average the two features to populate the voxel feature.
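The projection-and-average step could look something like the sketch below. The `project_fns` interface is an assumption standing in for CaDDN's frustum-to-voxel projection (camera intrinsics/extrinsics plus depth-bin discretization); it is hypothetical, not the repo's actual API:

```python
import torch

def fuse_voxel_features(voxel_centers, frustum_feats, project_fns):
    """Populate each voxel by projecting its center into every camera
    frustum and averaging the features from all views that see it.

    voxel_centers: (N, 3) voxel centers in the world frame.
    frustum_feats: dict view -> (C, D, H, W) frustum features.
    project_fns:   dict view -> fn mapping (N, 3) points to (N, 3) long
                   frustum indices (d, v, u) and an (N,) bool validity
                   mask (hypothetical interface).
    """
    n = voxel_centers.shape[0]
    c = next(iter(frustum_feats.values())).shape[0]
    fused = torch.zeros(n, c)
    hits = torch.zeros(n)
    for view, feats in frustum_feats.items():
        idx, valid = project_fns[view](voxel_centers)
        d, v, u = idx[valid].unbind(dim=1)
        fused[valid] += feats[:, d, v, u].t()   # gather (K, C) features
        hits[valid] += 1
    # Average where multiple views see the voxel; voxels seen by no view
    # stay zero.
    fused[hits > 0] /= hits[hits > 0].unsqueeze(1)
    return fused
```

This accumulates features from every view whose FOV contains the voxel center, so the single-view case and the overlapping-view (averaged) case fall out of the same code path.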

Once you've constructed the voxel grid, the collapse to the BEV grid is unchanged.
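For completeness, the unchanged collapse step amounts to folding the vertical axis of the voxel grid into the channel dimension (CaDDN then runs convolutions over the result); the shapes here are illustrative:

```python
import torch

# Voxel grid of shape (B, C, Z, Y, X); Z is the vertical axis.
voxel_grid = torch.rand(1, 8, 4, 16, 16)
b, c, z, y, x = voxel_grid.shape

# Collapse to BEV by stacking the Z slices along channels: (B, C*Z, Y, X).
bev = voxel_grid.view(b, c * z, y, x)
```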