POSTECH-CVLab / point-transformer

This is an unofficial implementation of the Point Transformer paper.

why use input shape as "pxo"? #36

Closed moshicaixi closed 2 years ago

moshicaixi commented 2 years ago

Hi, thanks very much for your excellent code. I am curious about the input shape of the Transformer layer, as depicted in the following picture. I am confused that there is no batch dimension, and what is o (offset)? In other papers, the point cloud shape is usually represented as (B C N) or (B N C), which is easy to understand, but I got confused when reading your code. Hoping for your guidance!

[Image: pt]

yifliu3 commented 2 years ago

Hi, I ran into the same question, and I finally realized that the author flattens data sequences with different lengths into one sequence. For example, suppose batch_size is set to 4, three of the samples load 8000 points each, and the remaining sample loads only 7000 points. The default collate_fn will try to build a (4, 8000, 3) tensor, which causes an error because one sample has only 7000 points. So it is better to flatten these points into (8000*3+7000, 3) and use an offset sequence to record the end of each sample.
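The flattening described in this comment can be sketched as follows. This is a minimal, dependency-free illustration using plain Python lists, not the repository's actual collate function; the name `flatten_with_offsets` is hypothetical, and real code would operate on tensors instead.

```python
from itertools import accumulate


def flatten_with_offsets(clouds):
    """Concatenate variable-length point clouds and record where each one ends.

    clouds: list of point clouds, each a list of (x, y, z) tuples.
    Returns (points, offsets), where points is the flat list of all points
    and offsets[i] is the index one past the last point of cloud i.
    """
    points = [p for cloud in clouds for p in cloud]
    offsets = list(accumulate(len(cloud) for cloud in clouds))
    return points, offsets


# Four clouds with the sizes from the example above (dummy coordinates).
sizes = [8000, 8000, 8000, 7000]
clouds = [[(0.0, 0.0, float(i))] * n for i, n in enumerate(sizes)]

points, offsets = flatten_with_offsets(clouds)
print(len(points), offsets)  # 31000 [8000, 16000, 24000, 31000]
```

No padding is needed because the clouds are never stacked along a batch dimension; the offsets alone are enough to tell them apart.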

moshicaixi commented 2 years ago

Hi, thanks for your reply. Is the flattening done during data preprocessing, when the dataloader loads the data? Maybe I need to read the code from scratch.

yuchenlichuck commented 2 years ago

I agree, so they use the offset to tell the length of each shape's point cloud.

chrockey commented 2 years ago

Hi @moshicaixi,

Sorry for the late reply 🙏 . The comment of @yifliu3 is exactly right. Using offsets makes it possible to construct a mini-batch of point clouds whose cardinalities are not the same. By the way, the nearest neighbor search on those offset-informed mini-batches is implemented by the first author of Point Transformer.

Hope this helps your understanding 😄 .
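To illustrate what a neighbor search on an offset-informed mini-batch has to do, here is a brute-force sketch in plain Python. It is not the CUDA implementation mentioned above; the function name `knn_within_clouds` is hypothetical, and including each point among its own neighbors is a choice made for this sketch only. The key idea is that candidates are restricted to the query point's own cloud, as delimited by the offsets.

```python
def knn_within_clouds(points, offsets, k):
    """For every point, return the indices of its k nearest points in the same cloud.

    points:  flat list of (x, y, z) tuples from all clouds.
    offsets: end boundary of each cloud in the flat list.
    Brute force, O(n^2) per cloud. Each point is its own nearest neighbor
    (distance 0). A cloud with fewer than k points yields shorter lists.
    """
    neighbours = []
    start = 0
    for end in offsets:
        for i in range(start, end):
            # Squared distances to candidates from the same cloud only.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
                for j in range(start, end)
            )
            neighbours.append([j for _, j in dists[:k]])
        start = end
    return neighbours


points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0),  # cloud 0
    (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),                    # cloud 1
]
offsets = [3, 5]
print(knn_within_clouds(points, offsets, k=2))
# [[0, 1], [1, 0], [2, 1], [3, 4], [4, 3]]
```

Point 3 lies closest to point 0 in space, but it belongs to a different cloud, so the search never pairs them.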

overnap commented 3 months ago

Note that the offset is the boundary, not the base. For the above example with (8000, 8000, 8000, 7000) points, set o = [8000, 16000, 24000, 31000]. If you pass base offsets (e.g. [0, 8000, ...]) instead, you will get an illegal memory access CUDA error or something similar in pointops_cuda.
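A cheap sanity check before handing offsets to the CUDA ops can catch this mistake early. The sketch below is an assumption-laden illustration in plain Python: the helper names `check_boundary_offsets` and `split_by_offsets` are hypothetical and not part of the repository, and the check assumes every cloud contains at least one point.

```python
def check_boundary_offsets(offsets, num_points):
    """Raise ValueError unless offsets are end boundaries for num_points points.

    Assumes every cloud is non-empty, so boundaries must be strictly increasing.
    """
    if not offsets:
        raise ValueError("offsets must not be empty")
    previous = 0
    for end in offsets:
        if end <= previous:
            raise ValueError(
                f"offset {end} is not greater than {previous}; "
                "were start offsets passed instead of end boundaries?"
            )
        previous = end
    if offsets[-1] != num_points:
        raise ValueError(
            f"last offset {offsets[-1]} must equal the point count {num_points}"
        )


def split_by_offsets(points, offsets):
    """Recover the per-cloud slices from a flat list and its end boundaries."""
    starts = [0] + list(offsets[:-1])
    return [points[s:e] for s, e in zip(starts, offsets)]


points = list(range(31000))  # stand-in for 31000 flattened points

good = [8000, 16000, 24000, 31000]
check_boundary_offsets(good, len(points))
print([len(c) for c in split_by_offsets(points, good)])  # [8000, 8000, 8000, 7000]

bad = [0, 8000, 16000, 24000]  # base (start) offsets: the mistake described above
try:
    check_boundary_offsets(bad, len(points))
except ValueError as err:
    print("rejected:", err)
```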