Colin97 / OpenShape_code

Official code of "OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding"
https://colin97.github.io/OpenShape/
Apache License 2.0

Issues about channel of xyz in example.py #7

Closed Holmes-GU closed 1 year ago

Holmes-GU commented 1 year ago

Hi, thanks for your code. I tried to run `python3 src/example.py`. However, it returns 'RuntimeError: Given groups=1, weight of size [64, 9, 1, 1], expected input[1, 10, 64, 384] to have 9 channels, but got 10 channels instead', raised by self.mlp in 'PointNetSetAbstraction' in models/pointnet_util.py. The cause may lie in the channel count of xyz (4 in example.py). Is there any solution?

Thank you very much.

eliphatfs commented 1 year ago

Hi. I am a little confused why you are seeing 4 channels for xyz. Mine has 3.

Holmes-GU commented 1 year ago

> Hi. I am a little confused why you are seeing 4 channels for xyz. Mine has 3.

Hi, I followed 'python3 src/example.py' for the quick start. In example.py, the xyz is processed by ME.utils.batched_coordinates(), so the number of channels becomes 4, as shown in the screenshot below. Besides, 'demo/pc.ply' does not exist; it should be 'demo/owl.ply' instead. Also, this function does not seem to be used in the training file.

[screenshots]

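For context, ME.utils.batched_coordinates prepends a batch-index column to each point cloud's coordinates, which is why 3-channel xyz comes out with 4 channels. A rough illustration of that behavior in plain NumPy (the helper name here is made up; the real implementation lives in MinkowskiEngine):

```python
import numpy as np

def batched_coordinates_sketch(coords_list):
    """Sketch of what ME.utils.batched_coordinates does: prepend a
    batch-index column to each point cloud's xyz coordinates and
    stack all clouds into one array."""
    rows = []
    for batch_idx, coords in enumerate(coords_list):
        idx_col = np.full((coords.shape[0], 1), batch_idx)
        rows.append(np.hstack([idx_col, coords]))  # (N_i, 1 + 3)
    return np.vstack(rows)

xyz = np.random.rand(5, 3)                 # 5 points, 3 channels
batched = batched_coordinates_sketch([xyz])
print(batched.shape)                       # (5, 4): batch index + xyz
```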
Colin97 commented 1 year ago

Hi, which checkpoint are you using? The example.py is for spconv. If you are using a pointbert checkpoint, some modifications are needed. Sorry for the confusion.

Holmes-GU commented 1 year ago

> Hi, which checkpoint are you using? The example.py is for spconv. If you are using a pointbert checkpoint, some modifications are needed. Sorry for the confusion.

I have changed the backbone to pointbert with model.scaling=4, model.name=PointBert, and model.use_dense=True. Could you provide the exact modifications? Thanks.

[screenshot]

Colin97 commented 1 year ago

Hi, please refer to the code:

https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/train.py#L101C18-L101C18

https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/data.py#L238C10-L238C19

Basically, to use PointBert, you don't need to process the PC with MinkowskiEngine.
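Based on the explanation above, the PointBert path takes a dense tensor directly rather than a MinkowskiEngine sparse tensor. A hedged sketch of what that input preparation might look like (the function name is hypothetical; the exact shapes and feature layout are in the linked train.py and data.py lines):

```python
import numpy as np

def prepare_pointbert_input(xyz, rgb=None):
    """Hypothetical sketch: the PointBert backbone consumes a dense
    (B, N, C) tensor, so no MinkowskiEngine quantization is needed --
    just stack xyz (and colors, if used) and add a batch dimension."""
    feats = xyz if rgb is None else np.concatenate([xyz, rgb], axis=-1)
    return feats[None, ...]  # (1, N, C)

xyz = np.random.rand(10000, 3).astype(np.float32)
rgb = np.random.rand(10000, 3).astype(np.float32)
batch = prepare_pointbert_input(xyz, rgb)
print(batch.shape)  # (1, 10000, 6)
```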

Holmes-GU commented 1 year ago

> Hi, please refer to the code:
>
> https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/train.py#L101C18-L101C18
>
> https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/data.py#L238C10-L238C19
>
> Basically, to use PointBert, you don't need to process the PC with MinkowskiEngine.

Ok. Thanks for your instructions. I will try it tomorrow~

Holmes-GU commented 1 year ago

Hi. Following your instructions, I successfully ran the code.

Sorry to bother you again. I noticed some differences between how the data is processed in training and testing:

  1. In training, the rgb is multiplied by a factor of 0.4 in the 'if use_color' branch of 'class Four'. Why is this not applied in testing?
  2. In class ModelNet40Test, 'rgb = rgb / 255.0' is applied before 'if use_color', but I don't observe it in 'class ObjaverseLVIS' or 'class ScanObjectNNTest'.
Colin97 commented 1 year ago
  1. Do you mean this line? This is an augmentation, which randomly changes the colors of some shapes to a constant (0.4).

  2. This is due to some inconsistency when preparing the data files. RGB in ObjaverseLVIS and ScanObjectNN are in [0, 1]. ModelNet40 doesn't have colors, so we put 100 in all data files; 'rgb = rgb / 255.0' just normalizes them to roughly 0.4.
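The two behaviors described above can be sketched as follows (the helper name and the probability are hypothetical; only the constant 0.4 and the placeholder value 100 come from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)

def color_augment(rgb, p=0.5, constant=0.4):
    """Hypothetical sketch of the training-time augmentation described
    above: with some probability, replace a shape's colors with a
    constant value."""
    if rng.random() < p:
        return np.full_like(rgb, constant)
    return rgb

# ModelNet40 has no real colors; the data files store 100 per channel,
# so dividing by 255 maps them to roughly the same constant 0.4:
rgb_modelnet = np.full((1024, 3), 100.0)
normalized = rgb_modelnet / 255.0
print(normalized[0])  # about 0.39 per channel
```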

Holmes-GU commented 1 year ago

OK. What about this part? Does ScanObjectNNTest not have colors either, so they are directly set to a constant (0.4)?

[screenshot]

Colin97 commented 1 year ago

Yes.

Holmes-GU commented 1 year ago

> Yes.

OK, thank you very much.

Holmes-GU commented 1 year ago

Hello, one more question: why does this line take only x[:,0] rather than the whole x?

[screenshot]

Holmes-GU commented 1 year ago

Hello, one more question: why does this line take only x[:,0] rather than the whole x? Is x[:,0] the class token, and is the remaining 384 the number of points after aggregation?

[screenshot]

eliphatfs commented 1 year ago

Because we pick the first token, which is the CLS token, as the pooler output, as in any transformer encoder architecture (including but not limited to BERT, CLIP ViT, PointBERT).
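The CLS-token pooling described above can be illustrated with a small shape check (dimensions here are hypothetical; only the 384 group tokens come from the thread):

```python
import numpy as np

# Transformer encoder output: (batch, tokens, dim), where token 0 is
# the prepended CLS token and the remaining 384 tokens come from the
# aggregated point groups.
x = np.random.rand(2, 1 + 384, 512)

# x[:, 0] selects the CLS token as the pooled shape representation,
# as in BERT-style encoders.
pooled = x[:, 0]
print(pooled.shape)  # (2, 512)
```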