emecercelik / ssl-3d-detection


Shape mismatch error (Line 499, voxel_encoder.py) #1

Open rahuja123 opened 2 years ago

rahuja123 commented 2 years ago

Hi, I am just trying to run the training code for scene flow using the command: `python ./tools/train.py ./configs/pointpillars/hv_pointpillars_fpn_sbn-all_4x8_2x_nus-3d_flow.py --work-dir ${WORK-DIR} --gpu-ids 0`

But I get a shape mismatch error at line 499 of voxel_encoder.py:

[image: error traceback]

Is there a bug in the code, or am I making a mistake somewhere?

MingyuLiu1 commented 2 years ago

Hi, thanks for your question. Did you change any other settings, such as samples_per_gpu in the config file configs/_base_/datasets/nus-3d_flow.py? Currently our model only supports samples_per_gpu = 2 for flow training, so if you changed it, please reset it to 2. If it still doesn't work, please let us know.
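For reference, the batch size lives in the `data` section of the dataset config; a minimal sketch following the usual mmdetection3d layout (the surrounding keys are illustrative, not copied from the repo):

```python
# configs/_base_/datasets/nus-3d_flow.py (sketch; only the relevant keys)
data = dict(
    samples_per_gpu=2,   # flow training currently requires exactly 2
    workers_per_gpu=4,   # illustrative value
    # train=..., val=..., test=... as in the standard nuScenes configs
)
```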

rahuja123 commented 2 years ago

Thanks for the quick reply.

No, I didn't change anything in the config, but I think the main problem is here:

[image: screenshot of the relevant code]

Here the expected shape for first_frame_pillar_features is (f1, 64, 64), but when I print it, its shape is torch.Size([22351, 64]). The voxel_feats shape itself is torch.Size([45740, 64]), so how can it be (f1, 64, 64)?

Meanwhile, first_frame_point_coors is expected to have shape (f1*64, 10), and it is indeed torch.Size([1430464, 10]).

So concatenating them raises the shape mismatch error.
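To make the mismatch concrete, here is a minimal repro with the printed sizes (the cat along dim=1 is my guess at what the flow head does; the exact call may differ):

```python
import torch

# Shapes copied from the prints above; contents are dummy.
first_frame_pillar_features = torch.zeros(22351, 64)    # actual: 2-D
first_frame_point_coors = torch.zeros(22351 * 64, 10)   # 1430464 rows

# If the features were 3-D (f1, 64, 64), flattening them to (f1*64, 64)
# would line up with the (f1*64, 10) coordinates for a dim=1 concat.
# With 2-D features the row counts differ, so torch.cat fails:
try:
    torch.cat([first_frame_pillar_features, first_frame_point_coors], dim=1)
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 1 ...
```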

MingyuLiu1 commented 2 years ago

Okay, then I will check the code again to see if it is correct. If there is an issue, I will fix it and let you know.

rahuja123 commented 2 years ago

I think I found the bug. In voxel_encoder_flow.py you set max_out=False, but it is set to True in voxel_encoder.py, which is the file that actually gets called. Setting max_out=False at line 380 of voxel_encoder.py makes it run fine.
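For anyone applying the fix, it is just flipping the flag where the VFE layer is built; a sketch of what the code around line 380 looks like (the constructor arguments follow my reading of mmdetection3d's HardVFE, so treat them as approximate):

```python
# voxel_encoder.py, around line 380 (sketch, not the verbatim line)
# max_out=True max-pools over the points in each pillar and returns
# (num_voxels, C); the flow head needs the un-pooled
# (num_voxels, num_points, C), hence max_out=False here.
vfe_layers.append(
    VFELayer(in_filters, out_filters, norm_cfg=norm_cfg, max_out=False))
```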

MingyuLiu1 commented 2 years ago

Thanks a lot. Alternatively, you can use voxel_encoder_flow.py in place of voxel_encoder.py when you do the flow training.

rahuja123 commented 2 years ago

Alright!

Also, I wanted to ask why only a batch size of 2 is used, and only on a single GPU? Is it possible to speed it up?

MingyuLiu1 commented 2 years ago

When we developed our model, our GPU memory did not allow a bigger batch size, so we could not try more. For now, the flow code can only be trained with batch_size = 2, but we will extend it to larger batch sizes, since we believe a larger batch size will make more sense.

rahuja123 commented 2 years ago

Thank you! A larger batch size would definitely be great.

Also, do you support inference for scene flow estimation?

MingyuLiu1 commented 2 years ago

We are organizing the evaluation files and will upload them soon. Before doing the evaluation, we suggest using the flow checkpoint from epoch 4 or 5 for the detection task; normally the model starts overfitting after 7 epochs.

rahuja123 commented 2 years ago

Alright! But I wanted to check how well the flow is working. I ran it on nuScenes, and it didn't seem to work that well. :/ I wanted to clarify one thing:

This is the visualisation of pointcloud1 (red) and pointcloud2 (blue) overlaid together. These point clouds are the successive frames that go into the FlowNet head before being sampled to 2048 points. Shouldn't these two point clouds overlap and align with each other?

[image: overlaid point clouds]

The final scene flow mapping comes out to be:

[image: predicted scene flow]

which looks wrong.

Can you help me figure out whether I am mistaken or whether the point clouds should be aligned?

MingyuLiu1 commented 2 years ago

Sorry for the late reply. Your figure shows a small shift between the first and second frames, but it is not very large. It could be caused by the ego motion of the car: there is a short time gap between the two sequential frames, during which the ego vehicle can move forward a bit and even change direction slightly.

rahuja123 commented 2 years ago

Yeah, but this is only one example. The time difference between some frames was large, and there were some transformations between point cloud t and point cloud t-1. See the image below for another input pair:

[image: another input pair of point clouds]

rahuja123 commented 2 years ago

Moreover, you mention in the paper that you load FlowNet3D pretrained weights for scene flow. Where does the loading happen, i.e. at which line?

MingyuLiu1 commented 2 years ago

> Yeah, but this is only one example. The time difference between some frames was large, and there were some transformations between point cloud t and point cloud t-1.

The original mmdetection3d shuffles the dataset when loading it, but our scene flow training needs two sequential frames as input. We tried disabling the shuffle, but that caused serious overfitting, so we took another approach and rewrote the load function. Specifically, when we load the dataset, we first take all frames with even ids (0, 2, 4, ...), then combine each with the nearest odd-id frame (1, 3, 5, ...), giving pairs of frames ((0, 1), (2, 3), (4, 5), ...). We then shuffle these frame pairs. This way each input still consists of two sequential frames, but the overall input order remains random.

But this approach causes a problem, since not all frame pairs come from the same scene (we only use the ids to generate the pairs and do not check whether they belong to the same scene). This is why some frames lie on top of each other or look like they are not from the same scene.
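A minimal sketch of the pairing logic described above (the names are illustrative, not the actual function in the repo):

```python
import random

def make_shuffled_frame_pairs(num_frames):
    # Pair every even-id frame with the next (odd-id) frame: (0,1), (2,3), ...
    pairs = [(i, i + 1) for i in range(0, num_frames - 1, 2)]
    # Shuffle the pairs, not the individual frames, so each sample still
    # consists of two sequential frames while the overall order is random.
    random.shuffle(pairs)
    return pairs

# Caveat from above: ids that cross a scene boundary still get paired,
# so some pairs contain frames from two different scenes.
print(make_shuffled_frame_pairs(10))
```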

I hope this answers your question.

MingyuLiu1 commented 2 years ago

> Moreover, you mention in the paper that you load FlowNet3D pretrained weights for scene flow. Where does the loading happen, i.e. at which line?

Sorry, this part of our paper is unclear; we will rewrite it. We used the pretrained weights only for the Point-GNN scene flow training; for the scene flow training on nuScenes, we did not.

rahuja123 commented 2 years ago

> > Moreover, you mention in the paper that you load FlowNet3D pretrained weights for scene flow. Where does the loading happen, i.e. at which line?
>
> Sorry, this part of our paper is unclear; we will rewrite it. We used the pretrained weights only for the Point-GNN scene flow training; for the scene flow training on nuScenes, we did not.

Oh! Is there a particular reason for that? Why isn't it used for the scene flow training on nuScenes?

MingyuLiu1 commented 2 years ago

> Oh! Is there a particular reason for that? Why isn't it used for the scene flow training on nuScenes?

This is our next work :D

MingyuLiu1 commented 2 years ago

We tried using the pretrained FlowNet3D weights for the scene flow training on nuScenes, but it did not seem to work well, so the next step is to run more experiments and tune the hyperparameters to find the reason.

richardkxu commented 1 year ago

> I think I found the bug. In voxel_encoder_flow.py you set max_out=False, but it is set to True in voxel_encoder.py, which is the file that actually gets called. Setting max_out=False at line 380 of voxel_encoder.py makes it run fine.

Hi there, I am facing the same shape mismatch error when training ssl-pointpillar on nuScenes. I tried setting max_out=False at line 380 in voxel_encoder.py as mentioned above, and it resolved that error. However, this fix causes another error when testing ssl-pointpillar on nuScenes:

```
Exception has occurred: RuntimeError
t() expects a tensor with <= 2 dimensions, but self is 3D
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/middle_encoders/pillar_scatter.py", line 86, in forward_batch
    voxels = voxels.t()
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/middle_encoders/pillar_scatter.py", line 33, in forward
    return self.forward_batch(voxel_features, coors, batch_size)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 224, in extract_pts_feat
    x = self.pts_middle_encoder(voxel_features, coors, batch_size)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 238, in extract_feat
    pts_feats = self.extract_pts_feat(points, img_feats, img_metas)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/detectors/mvx_two_stage.py", line 451, in simple_test
    points, img=img, img_metas=img_metas)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/detectors/base.py", line 41, in forward_test
    return self.simple_test(points[0], img_metas[0], img[0], **kwargs)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
    return self.forward_test(**kwargs)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/mmdet3d/apis/test.py", line 37, in single_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/tools/test.py", line 184, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/home/richardkxu/Documents/ssl-3d-detection/mmdetection3d/tools/test.py", line 214, in <module>
    main()
RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D
```

You also mentioned another fix:

> Thanks a lot. Alternatively, you can use voxel_encoder_flow.py in place of voxel_encoder.py when you do the flow training.

Where exactly should I make this change? And will it also cause the RuntimeError: t()... when testing ssl-pointpillar on nuScenes?

Thank you!

MingyuLiu1 commented 1 year ago

> Where exactly should I make this change? And will it also cause the RuntimeError: t()... when testing ssl-pointpillar on nuScenes?

Hi, you can use voxel_encoder_flow.py instead of the original voxel_encoder.py. For example, rename voxel_encoder.py to voxel_encoder_original.py, and rename voxel_encoder_flow.py to voxel_encoder.py.

richardkxu commented 1 year ago

Hi, renaming has the same effect as setting max_out=False: it fixes the shape mismatch error during training but introduces the same RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D at this line. The tensor voxels has shape [6070, 64, 64], so .t() does not apply. The next line, canvas[:, indices] = voxels, is also problematic: you cannot assign a tensor of size [6070, 64, 64] into a slice of size [64, 6070].

cracker-Li commented 1 year ago

I found a method that works for both training and testing in PointPillars, but I am not sure whether it is correct: set max_out=False when training flow, and set max_out=True when training and testing detection.

In this way, the 'RuntimeError: t() expects a tensor with <= 2 dimensions, but self is 3D' is avoided.
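For context, a minimal sketch of what max_out changes, modeled on my reading of VFELayer in mmdetection3d's voxel_encoders/utils.py (treat the details as approximate):

```python
import torch

def vfe_forward(inputs, linear, max_out=True):
    # inputs: (num_voxels, num_points, in_channels)
    x = torch.relu(linear(inputs))          # (num_voxels, num_points, C)
    if max_out:
        # Max-pool over the points in each pillar -> 2-D output, which is
        # what PointPillarsScatter's voxels.t() expects at test time.
        return x.max(dim=1)[0]              # (num_voxels, C)
    # Un-pooled 3-D output, which is what the flow head needs in training
    # but what breaks voxels.t() ("self is 3D") in pillar_scatter.py.
    return x                                # (num_voxels, num_points, C)

linear = torch.nn.Linear(10, 64)
feats = torch.randn(6070, 64, 10)
print(vfe_forward(feats, linear, max_out=True).shape)   # torch.Size([6070, 64])
print(vfe_forward(feats, linear, max_out=False).shape)  # torch.Size([6070, 64, 64])
```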