PJLab-ADG / OpenPCSeg

OpenPCSeg: Open Source Point Cloud Segmentation Toolbox and Benchmark

AttributeError: 'Tensor' object has no attribute 'feats' #16

Closed kennethleng closed 10 months ago

kennethleng commented 1 year ago

I ran train.py with the command line python train.py --cfg_file tools/cfgs/fusion/semantic_kitti/rpvnet_mk18_cr10.yaml and the following error occurs. Has anyone else encountered the same situation?

File "/home//anaconda3/envs/pcseg/lib/python3.7/site-packages/torchsparse/nn/utils/apply.py", line 12, in fapply
    feats = fn(input.feats, *args, **kwargs)
AttributeError: 'Tensor' object has no attribute 'feats'
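For context, fapply() in torchsparse/nn/utils/apply.py applies a feature-wise function (here, a norm layer) to the .feats of a SparseTensor and rewraps the result. A rough sketch of what the traceback points at, reconstructed from the traceback itself (the exact body varies between torchsparse versions):

```python
from typing import Callable

import torch
from torchsparse import SparseTensor

# Rough sketch of torchsparse's fapply, reconstructed from the traceback
# above; details may differ between torchsparse versions.
def fapply(input: SparseTensor, fn: Callable[..., torch.Tensor],
           *args, **kwargs) -> SparseTensor:
    # fn (e.g. BatchNorm1d.forward) is applied to the feature matrix only,
    # so passing a plain torch.Tensor (which has no .feats attribute)
    # raises exactly the AttributeError reported here.
    feats = fn(input.feats, *args, **kwargs)
    return SparseTensor(feats=feats, coords=input.coords, stride=input.stride)
```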

hahakid commented 10 months ago

I faced the same issue; rpvnet.py may raise this problem. I found that the BatchNorm() input should be a SparseTensor, yet sometimes only a plain Tensor is sent to fapply(). As I tested on SemanticKITTI data, no matter what, the third time train.py calls fapply(), it passes a Tensor instead of a SparseTensor.

Before closing this issue, can you offer a solution? Reinstalling torchsparse does not solve this problem, and this BatchNorm call is very hard to debug.

I think the problem might be in rpvnet.py's forward() (there are many forward() methods; I am talking about the one belonging to the RPVNet class).

When calculating z0 around line 668 (the line number may not be exact, since I added annotations in my local environment), z0.F is a Tensor and r_z0 is a Tensor, so z0_point should also be a Tensor.

However, self.point_transforms[0] calls nn/utils/apply.py and tries to return a SparseTensor().

I am not sure about the call logic here; does anyone have any clues? I also could not find a function that achieves this purpose. Should I use plain PyTorch instead of torchsparse?
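A minimal sketch of the type mismatch described above, assuming the point branch mixes a SparseTensor's feature matrix with a plain Tensor before feeding a torchsparse norm layer (the variable names mirror RPVNet's, the shapes are invented):

```python
import torch
import torchsparse.nn as spnn
from torchsparse import SparseTensor

point_transform = spnn.BatchNorm(64)  # torchsparse norm: expects a SparseTensor

coords = torch.zeros((10, 4), dtype=torch.int)  # dummy integer voxel coords
z0 = SparseTensor(feats=torch.randn(10, 64), coords=coords)
r_z0 = torch.randn(10, 64)

z0_point = z0.F + r_z0     # plain torch.Tensor, no longer a SparseTensor
point_transform(z0_point)  # AttributeError: 'Tensor' object has no attribute 'feats'
```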

hahakid commented 10 months ago

I think the problem is that the authors did not train on a single GPU, so they never triggered this BatchNorm path; multi-GPU training uses SyncBatchNorm instead.
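One plausible mechanism (my reading of the comment above, not verified against OpenPCSeg's launcher): DDP pipelines commonly convert the model with torch.nn.SyncBatchNorm.convert_sync_batchnorm(), which replaces every BatchNorm subclass, including a custom one, with a SyncBatchNorm whose forward() accepts a plain Tensor, so only single-GPU runs hit the SparseTensor-only forward:

```python
import torch
from torch import nn

# Illustration only: convert_sync_batchnorm swaps out any _BatchNorm
# subclass, so a Tensor-vs-SparseTensor bug in a custom BatchNorm
# forward() can be masked in multi-GPU training.
model = nn.Sequential(nn.Linear(64, 64), nn.BatchNorm1d(64))
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(model)  # the BatchNorm1d is now a SyncBatchNorm
```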

So you should add a forward function that also accepts a Tensor as input to the BatchNorm class in rpvnet.py. Good luck.
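A sketch of that workaround, assuming the BatchNorm in question subclasses nn.BatchNorm1d and wraps it with torchsparse's fapply (as torchsparse's own norm layers do):

```python
import torch
from torch import nn
from torchsparse import SparseTensor
from torchsparse.nn.utils.apply import fapply

class BatchNorm(nn.BatchNorm1d):
    # Dispatch on the input type: the voxel branches keep passing
    # SparseTensors, while the point branch may pass a plain Tensor.
    def forward(self, input):
        if isinstance(input, SparseTensor):
            # normalize the feature matrix, then rewrap as a SparseTensor
            return fapply(input, super().forward)
        # plain torch.Tensor: fall back to ordinary BatchNorm1d behaviour
        return super().forward(input)
```

With a dispatching forward like this, a plain Tensor such as z0_point no longer reaches fapply() unwrapped.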

My test isn't finished; I may report later. However, I can start training now.