Julie-tang00 / Point-BERT

[CVPR 2022] Pre-Training 3D Point Cloud Transformers with Masked Point Modeling
MIT License

Confusion about 1024 points, but there are 64 patches and each one contains 32 points #34

Closed auniquesun closed 2 years ago

auniquesun commented 2 years ago

@yuxumin @lulutang0608 Thanks for sharing the paper and code.

One point that confuses me: in Section 4.1, Pre-training setups, you state "We sample 1024 points from each 3D model and divide them into 64 point patches (sub-clouds). Each sub-cloud contains 32 points".

It is confusing in three ways:

  1. 64 * 32 > 1024, which deviates from the description in the paper, so I have a second understanding, as follows.
  2. If the 1024 points are divided into 64 patches, there must be points shared across different patches; if not, I have a third understanding, as follows.
  3. In ShapeNet, each object is a CAD model that can be sampled to produce a holistic point set P, which contains at least 1024 points. FPS is applied to P to get 1024 points, denoted Q. Among Q, FPS is applied again to get 64 center points, denoted C. After that, within the search space P, the k (= 32) nearest neighbors of each center point in C are collected.

Am I right? If the third understanding is correct, why not run FPS on P directly to get 64 centers, then find their k nearest neighbors to produce the 64 local patches?

yuxumin commented 2 years ago

Hi, thank you for your interest in our paper. There are overlaps between the 64 patches. Point clouds are different from images, which can easily be split into patches with no overlapping or missing parts. We allow overlaps to guarantee that all points are included in some patch. As for the third understanding, it requires more input points, which seems unfair compared with other methods.
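To make the overlap concrete, here is a minimal numpy sketch of the grouping scheme described above (FPS to pick 64 centers from the 1024 sampled points, then the 32 nearest neighbors of each center). This is an illustrative reimplementation, not the repo's actual code, which uses GPU grouping ops; since 64 * 32 = 2048 neighbor slots are drawn from only 1024 points, some points necessarily appear in more than one patch:

```python
import numpy as np

def farthest_point_sample(points, n_samples):
    """Greedy FPS: repeatedly pick the point farthest from all points chosen so far."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=int)          # start from point 0
    dist = np.full(n, np.inf)                        # distance to nearest chosen point
    for i in range(1, n_samples):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[i - 1]], axis=1))
        chosen[i] = int(dist.argmax())               # farthest remaining point
    return chosen

def group_points(points, n_groups=64, group_size=32):
    """FPS centers + kNN neighborhoods; patches are allowed to overlap."""
    centers = points[farthest_point_sample(points, n_groups)]
    # pairwise distances: (n_groups, n_points)
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    # indices of the group_size nearest points per center: (n_groups, group_size)
    return np.argsort(d, axis=1)[:, :group_size]

rng = np.random.default_rng(0)
pts = rng.standard_normal((1024, 3))                 # stand-in for a sampled ShapeNet cloud
idx = group_points(pts)
print(idx.shape)                                     # (64, 32)
print(idx.size, len(np.unique(idx)))                 # 2048 slots, at most 1024 unique points
```

Because `idx.size` (2048) exceeds the number of unique point indices (at most 1024), overlap between patches is guaranteed by construction; that is exactly why 64 * 32 > 1024 is consistent with sampling only 1024 points.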

auniquesun commented 2 years ago

I got it, thank you.