Yzichen closed this issue 1 year ago
In addition, this implementation only utilises the temporal information from the same camera to compose stereo vision, and does not utilise the stereo information across cameras. Do I understand you correctly?
@Yzichen It is just a private implementation. In my ablation, the L1 norm works better than the dot product.
@Yzichen Yes, I haven't used the stereo information across cameras, as it didn't contribute to the performance. Maybe my implementation in this direction has some mistakes; you can try it on your own.
@HuangJunJie2017 Thank you for your prompt reply and for your work.
When building the cost volume, BEVStereo uses the sum of the differences between corresponding features as the matching cost. Is this better than using the dot product between the features? Also, which paper did you reference for your implementation?
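
For readers comparing the two matching costs discussed in this thread, here is a minimal NumPy sketch. It is not the repository's code: the function names, tensor layout `(D, C, H, W)`, and the toy data are all assumptions made for illustration, and the source features are assumed to be already warped to each depth hypothesis.

```python
import numpy as np

def l1_cost(ref, src):
    """Sum of absolute feature differences over channels (lower = better match).

    ref: (C, H, W) reference-frame features.
    src: (D, C, H, W) source-frame features, assumed already warped
         to D depth hypotheses (hypothetical layout).
    Returns a (D, H, W) cost volume.
    """
    return np.abs(src - ref[None]).sum(axis=1)

def dot_similarity(ref, src):
    """Channel-wise dot product (higher = better match). Returns (D, H, W)."""
    return (src * ref[None]).sum(axis=1)

# Toy data: random features, with depth hypothesis 2 made a perfect match.
rng = np.random.default_rng(0)
C, D, H, W = 8, 4, 2, 3
ref = rng.standard_normal((C, H, W))
src = rng.standard_normal((D, C, H, W))
src[2] = ref

cost = l1_cost(ref, src)        # (D, H, W)
sim = dot_similarity(ref, src)  # (D, H, W)

best_l1 = cost.argmin(axis=0)   # per-pixel depth index with the lowest L1 cost
best_dot = sim.argmax(axis=0)   # per-pixel depth index with the highest similarity
```

In this toy setup the L1 cost is exactly zero at the matching hypothesis, so `best_l1` selects it at every pixel. An unnormalized dot product also depends on feature magnitude, so `best_dot` is not guaranteed to agree. This only illustrates how the two costs are computed; it says nothing about which one performs better in the actual model.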