Hi, thank you for the interest.
1. The training sampler is in the end essentially the same as torch.utils.data.distributed.DistributedSampler, except that you don't need to call .set_epoch() (a sketch follows after point 3).
2. I think the OHEM loss was used from the start, as it is standard in the Cityscapes pipelines (see HRNet), so not many experiments were done with it (there is a sketch of the idea below as well). For KNN vs. KPConv, the extended abstract did include a comparison.
3. Augmentation has proven difficult. The pipeline as it is does horizontal flipping (see the flip sketch below). We did some experiments with random cropping at different scales, but this did not show any improvement. My guess is that the vertical resolution of the scans is too low. This might be a downside of projection-based methods in general: for the 3D methods scale is not an issue, and in the image domain the multiscale problem was solved by train- and test-time scale augmentation, but for lidar projections it seems hard.
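For point 1, here is a minimal sketch of what such a sampler looks like, in the spirit of detectron2's TrainingSampler; the class name and defaults here are illustrative, not the exact code of this repo:

```python
import itertools

import torch
import torch.distributed as dist
from torch.utils.data import Sampler


class InfiniteTrainingSampler(Sampler):
    """Illustrative detectron2-style training sampler: an infinite,
    reshuffled stream of dataset indices, sharded across processes,
    so there is no epoch boundary and no .set_epoch() call."""

    def __init__(self, size, shuffle=True, seed=0):
        self._size = size
        self._shuffle = shuffle
        self._seed = seed
        distributed = dist.is_available() and dist.is_initialized()
        self._rank = dist.get_rank() if distributed else 0
        self._world_size = dist.get_world_size() if distributed else 1

    def _infinite_indices(self):
        g = torch.Generator()
        g.manual_seed(self._seed)
        while True:
            if self._shuffle:
                yield from torch.randperm(self._size, generator=g).tolist()
            else:
                yield from range(self._size)

    def __iter__(self):
        # Each rank keeps every world_size-th index of the shared stream.
        yield from itertools.islice(
            self._infinite_indices(), self._rank, None, self._world_size
        )
```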
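For point 2, a minimal sketch of the OHEM cross-entropy idea; the threshold and min_kept values are illustrative, not necessarily what this repo uses:

```python
import torch
import torch.nn.functional as F


def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=100_000, ignore_index=255):
    """Illustrative OHEM cross-entropy: average the loss only over 'hard'
    pixels, i.e. those where the predicted probability of the true class
    is below `thresh`, but always keep at least `min_kept` pixels."""
    # Per-pixel loss, flattened to one value per pixel.
    pixel_losses = F.cross_entropy(
        logits, target, ignore_index=ignore_index, reduction="none"
    ).flatten()
    valid = target.flatten() != ignore_index

    # Probability the model assigns to the ground-truth class of each pixel.
    probs = F.softmax(logits, dim=1)
    gt = target.clone()
    gt[gt == ignore_index] = 0  # dummy class so gather() has valid indices
    gt_probs = probs.gather(1, gt.unsqueeze(1)).flatten()

    hard = valid & (gt_probs < thresh)
    if hard.sum() >= min_kept:
        return pixel_losses[hard].mean()
    # Too few hard pixels: fall back to the min_kept largest losses.
    losses = pixel_losses[valid]
    k = min(min_kept, losses.numel())
    return losses.topk(k).values.mean()
```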
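For point 3, horizontal flipping on a range projection can look roughly like this; the channel layout, in particular which channel holds y, is an assumption:

```python
import numpy as np


def random_hflip(proj, labels, p=0.5):
    """Illustrative horizontal flip for a (C, H, W) range projection with
    channels assumed to be (range, x, y, z, remission) and labels (H, W).
    Mirroring the azimuth axis also flips the sign of the y coordinate."""
    if np.random.rand() < p:
        proj = proj[:, :, ::-1].copy()
        labels = labels[:, ::-1].copy()
        proj[2] = -proj[2]  # assumed y channel; adjust to your layout
    return proj, labels
```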
Thank you very much for your quick reply!! :sparkles: Yes, your paper does show that replacing the KNN postprocessing with a KPConv layer "results in 2.4 mIoU higher"; that was my mistake. :see_no_evil: I think this is useful, since in real-time networks the KNN module harms performance.
Moreover, before reading your paper I had read "Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study". Its scan-unfolding method is also awesome. However, as shown in that paper, it doesn't fit the SemanticKITTI dataset perfectly and requires downloading the KITTI raw dataset, so in the end I gave up on that method. The repo is at https://github.com/risteon/kitti_scan_unfolding
In issue https://github.com/DeyvidKochanov-TomTom/kprnet/issues/8#issuecomment-762164303 it looks like you have approximately solved the problem, and the results of the code show it too. Brilliant! :clap: In the code you use [65, 2049] as [H, W]; in the scan-based paper it's [64, 2000], and in RangeNet++ it's [64, 2048] or [64, 1024]. As far as I know, H is the 64 beams of the KITTI HDL-64 sensor and W is determined by the angular resolution. Is there any particular reason for your setting?
Hi, the size doesn't matter for the projection so much, as long as it is big enough to fit all the projected points. The odd sizes are just sort of... tradition :smile: See here. I guess people do it because it prevents activations from 'drifting sideways'. Here is a snippet:
import torch
import torch.nn.functional as F

def s(t):
    # A 3x3 stride-2 conv with all weights fixed to 1, so each output value
    # is just the sum over its 3x3 window and any spatial bias is easy to see.
    down = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1, bias=False)
    torch.nn.init.constant_(down.weight, 1)
    down_t = down(t)
    # Bilinearly upsample back to the input resolution and print the result.
    up_down_t = F.interpolate(down_t, size=t.shape[-2:], mode="bilinear", align_corners=False)
    print(up_down_t)
If you run it with an even size you see the drift:
In [2]: s(torch.ones(1, 1, 8, 8))
tensor([[[[4.0000, 4.5000, 5.5000, 6.0000, 6.0000, 6.0000, 6.0000, 6.0000],
[4.5000, 5.0625, 6.1875, 6.7500, 6.7500, 6.7500, 6.7500, 6.7500],
[5.5000, 6.1875, 7.5625, 8.2500, 8.2500, 8.2500, 8.2500, 8.2500],
[6.0000, 6.7500, 8.2500, 9.0000, 9.0000, 9.0000, 9.0000, 9.0000],
[6.0000, 6.7500, 8.2500, 9.0000, 9.0000, 9.0000, 9.0000, 9.0000],
[6.0000, 6.7500, 8.2500, 9.0000, 9.0000, 9.0000, 9.0000, 9.0000],
[6.0000, 6.7500, 8.2500, 9.0000, 9.0000, 9.0000, 9.0000, 9.0000],
[6.0000, 6.7500, 8.2500, 9.0000, 9.0000, 9.0000, 9.0000, 9.0000]]]],
And the odd size magically prevents it from happening:
In [3]: s(torch.ones(1, 1, 9, 9))
tensor([[[[4.0000, 4.6667, 5.7778, 6.0000, 6.0000, 6.0000, 5.7778, 4.6667, 4.0000],
[4.6667, 5.4444, 6.7407, 7.0000, 7.0000, 7.0000, 6.7407, 5.4444, 4.6667],
[5.7778, 6.7407, 8.3457, 8.6667, 8.6667, 8.6667, 8.3457, 6.7407, 5.7778],
[6.0000, 7.0000, 8.6667, 9.0000, 9.0000, 9.0000, 8.6667, 7.0000, 6.0000],
[6.0000, 7.0000, 8.6667, 9.0000, 9.0000, 9.0000, 8.6667, 7.0000, 6.0000],
[6.0000, 7.0000, 8.6667, 9.0000, 9.0000, 9.0000, 8.6667, 7.0000, 6.0000],
[5.7778, 6.7407, 8.3457, 8.6667, 8.6667, 8.6667, 8.3457, 6.7407, 5.7778],
[4.6667, 5.4444, 6.7407, 7.0000, 7.0000, 7.0000, 6.7407, 5.4444, 4.6667],
[4.0000, 4.6667, 5.7778, 6.0000, 6.0000, 6.0000, 5.7778, 4.6667, 4.0000]]]]
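To make the size point concrete, here is a generic RangeNet++-style spherical projection (a sketch, not necessarily the exact projection used in this repo); H and W just set the grid that points get binned into, so any size that keeps collisions low works:

```python
import numpy as np


def spherical_projection(points, H=65, W=2049, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Generic RangeNet++-style spherical projection of an (N, 3+) point
    array into an H x W grid. The field-of-view values roughly match the
    HDL-64 sensor; H and W are free parameters of the grid."""
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)

    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = -np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))

    # Map azimuth to [0, W) and elevation to [0, H), then discretize.
    u = 0.5 * (yaw / np.pi + 1.0) * W
    v = (1.0 - (pitch + abs(fov_down)) / fov) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)
    return u, v, depth
```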
Wow! Thank you for such a detailed explanation!
Nice work, and thanks for the open-source code. :thumbsup: I have some questions and need help:
1. What is the meaning of TrainingSampler in the dataloader? Could you give me an intuitive introduction? Is it better for training than a shuffled dataloader?
2. Have you done ablation experiments between OHEM and plain cross-entropy loss, and between the KNN model and the KPConv model?
3. Could you give me some advice on data augmentation?