Possibly insufficient preallocation in IoU3D algorithm implementation

Hi guys,

I used your 3D IoU algorithm to implement 3D NMS in CUDA. While experimenting, I occasionally got "Invalid __local__ write of size 16 bytes" errors, which compute-sanitizer traced to this line: https://github.com/facebookresearch/pytorch3d/blob/fe0b1bae49e7144021a9eb63169e855f51dd4dd3/pytorch3d/csrc/iou_box3d/iou_utils.cuh#L733. I initially thought the problem was in my modifications, but a quick worst-case estimate shows that the index can indeed exceed the limit of MAX_TRIS=100 (see the snippet at the bottom, where I simply assumed that ClipTriByPlane() always returns 2).

I just wanted to let you know, even though the limit appears to be sufficient for most practical use cases.

Kind regards, Enrico

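# Worst-case estimate of the largest triangle index written while clipping.
# Assumption: every ClipTriByPlane() call returns 2 triangles.
n_max = 0
num_tris = 12  # initial number of triangles
for p in range(6):  # one clipping pass per plane
    offset = 0
    for t in range(num_tris):
        count = 2  # assumed return value of ClipTriByPlane()
        for v in range(count):
            offset += 1  # one output slot per clipped triangle
    num_tris = offset
    for j in range(num_tris):
        n_max = max(n_max, j)  # highest index touched so far

print(n_max)  # 767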
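
As a rough cross-check of the estimate above, the same worst case can be written as a simple doubling per plane. This is only a sketch under the same assumption (every clip returns 2 triangles, starting from 12 triangles and 6 planes); the helper name worst_case_counts is hypothetical and not part of pytorch3d. It also shows after which plane the count would first exceed MAX_TRIS=100.

```python
MAX_TRIS = 100           # limit quoted above
NUM_PLANES = 6           # number of clipping passes, as in the snippet above
INITIAL_TRIS = 12        # starting triangle count, as in the snippet above
WORST_CASE_PER_CLIP = 2  # assumption: every ClipTriByPlane() call returns 2


def worst_case_counts(initial=INITIAL_TRIS, planes=NUM_PLANES,
                      factor=WORST_CASE_PER_CLIP):
    """Return the worst-case triangle count after each clipping pass."""
    counts = []
    n = initial
    for _ in range(planes):
        n *= factor  # each triangle is assumed to produce `factor` triangles
        counts.append(n)
    return counts


counts = worst_case_counts()
# 0-based index of the first pass whose count exceeds MAX_TRIS, or None
first_overflow = next((i for i, n in enumerate(counts) if n > MAX_TRIS), None)

print(counts)          # [24, 48, 96, 192, 384, 768]
print(first_overflow)  # 3, i.e. the fourth plane
```

Under this assumption the count stays within the limit for the first three planes (96 triangles) and would exceed it on the fourth (192), with the largest index reaching 12 * 2**6 - 1 = 767 after all six planes. Whether real box pairs can actually reach this worst case is not something the estimate can show.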
