Pointcept / PointTransformerV3

[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)
MIT License

Results on SemanticKITTI #32

Javion11 opened this issue 2 months ago

Javion11 commented 2 months ago

Thanks for your work! I have reproduced PTv3 with your codebase, but the mIoU on SemanticKITTI only reaches about 65. The config is the same as the nuScenes one shown in your paper. I would like to ask what causes such a big performance gap (70.8 in the PTv3 paper)?

Gofinge commented 2 months ago

Hi, we do not use any trick for validation results. Also, please don't involve these tricks in validation benchmarks.

Sylva-Lin commented 2 months ago

> Hi, we do not use any trick for validation results. Also, please don't involve these tricks in validation benchmarks.

Hi, I also met this problem. Could you share your SemanticKITTI config? [screenshot]

Gofinge commented 2 months ago

Hi, I have a response here (https://github.com/Pointcept/Pointcept/issues/205). Another thing is that I have been short of computing resources recently, so I have to wait until some become available. But it won't take long; I should have some time and free GPUs within one month.

Sylva-Lin commented 2 months ago

> Hi, I have a response here (Pointcept/Pointcept#205). Another thing is that I have been short of computing resources recently, so I have to wait until some become available. But it won't take long; I should have some time and free GPUs within one month.

OK, thank you for your reply. I will try again after changing the config.

Javion11 commented 2 months ago

> Hi, I have a response here (Pointcept/Pointcept#205). Another thing is that I have been short of computing resources recently, so I have to wait until some become available. But it won't take long; I should have some time and free GPUs within one month.

Thank you for your kind reply! Anyway, the work is excellent and inspiring. I'm looking forward to the release of the config adapted to the SemanticKITTI data, because I'm very curious about which settings can cause such a big performance difference. Thanks again for your excellent work, and best regards!

Gofinge commented 2 months ago

Typically, it shouldn't have such low performance. It would be great if you could check whether you adapted the correct base config (such as using the nuScenes config as a base and then aligning the data augmentation pipeline with the other SemanticKITTI configs). You know, our PTv3 can already reach a mIoU over 70%.
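A minimal sketch of that kind of adaptation, assuming Pointcept's `_base_` config-inheritance style (the `_base_` path here is illustrative, not an official file), would override only the dataset-specific fields:

```python
# Hypothetical derived config; the _base_ path is an assumption
# based on Pointcept's config layout, not a file from the repo.
_base_ = ["../nuscenes/semseg-pt-v3m1-0-base.py"]

# Override only the dataset-specific parts for SemanticKITTI;
# model and scheduler settings are inherited from the base config.
dataset_type = "SemanticKITTIDataset"
data_root = "data/semantickitti"

# The augmentation pipeline (data = dict(...)) should then be aligned
# with the existing SemanticKITTI configs rather than kept from nuScenes.
```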

Javion11 commented 2 months ago

The config I use is https://github.com/Pointcept/Pointcept/blob/main/configs/nuscenes/semseg-pt-v3m1-0-base.py. The only difference is that I changed the dataset to SemanticKITTI. The result I get is:

[screenshot]

Similar to @Sylva-Lin

> Hi, I also met this problem. Could you share your SemanticKITTI config? [screenshot]

Sylva-Lin commented 2 months ago

> Hi, I have a response here (Pointcept/Pointcept#205). Another thing is that I have been short of computing resources recently, so I have to wait until some become available. But it won't take long; I should have some time and free GPUs within one month.

I changed the config to remove PointClip, but the result was still bad, and I found the val results are poor on sparse points. Why is this result worse than the paper's, and what else can I change? [screenshot]

[screenshot]

Gofinge commented 2 months ago

Okay, I will increase the priority of this issue and check the config for SemanticKITTI soon.

Gofinge commented 2 months ago

Also, note that eval during training is not precise; only after a precise testing process can the numbers be used as final results. As shown in the following images, the best eval mIoU during training is 69.3%, while after precise testing the mIoU is 72.25%.

[screenshots: eval vs. precise-test mIoU]

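Part of that gap comes from test-time rotation voting. As a rough illustration of the idea (not Pointcept's actual tester; `predict_fn` is a stand-in for the trained model), per-point class probabilities can be averaged over the four z-axis rotations that a typical `aug_transform` lists:

```python
import numpy as np

def rotate_z(points: np.ndarray, angle_rad: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud around the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

def tta_predict(points: np.ndarray, predict_fn, num_classes: int = 19) -> np.ndarray:
    """Average per-point class probabilities over rotations of
    0, pi/2, pi, and 3*pi/2 around z (rotation voting)."""
    probs = np.zeros((points.shape[0], num_classes))
    for k in range(4):
        probs += predict_fn(rotate_z(points, k * np.pi / 2))
    return probs / 4.0
```

The averaged probabilities are then argmaxed per point; in practice the vote is also accumulated back onto the original full-resolution points after voxelization, which is why eval on grid-sampled points during training underestimates the final score.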
Javion11 commented 2 months ago

Thanks for your time. I noticed that the closed issue #38 has mentioned this problem. Maybe this is a common problem, not an isolated case.

Sylva-Lin commented 2 months ago

> Okay, I will increase the priority of this issue and check the config for SemanticKITTI soon.

Hi, I tried a variety of configs and did precise testing, but the best result was only 68.27%, which is still 2.53% short of the paper's 70.80%. My optimal config is as follows; could you help me see what needs to be changed?

# misc custom setting
batch_size = 12  # bs: total bs in all gpus
mix_prob = 0.8
empty_cache = False
enable_amp = True

# model settings
model = dict(
    type="DefaultSegmentorV2",
    num_classes=19,
    backbone_out_channels=64,
    backbone=dict(
        type="PT-v3m1",
        in_channels=4,
        order=["z", "z-trans", "hilbert", "hilbert-trans"],
        stride=(2, 2, 2, 2),
        enc_depths=(2, 2, 2, 6, 2),
        enc_channels=(32, 64, 128, 256, 512),
        enc_num_head=(2, 4, 8, 16, 32),
        enc_patch_size=(1024, 1024, 1024, 1024, 1024),
        dec_depths=(2, 2, 2, 2),
        dec_channels=(64, 64, 128, 256),
        dec_num_head=(4, 4, 8, 16),
        dec_patch_size=(1024, 1024, 1024, 1024),
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        attn_drop=0.0,
        proj_drop=0.0,
        drop_path=0.3,
        shuffle_orders=True,
        pre_norm=True,
        enable_rpe=False,
        enable_flash=True,
        upcast_attention=False,
        upcast_softmax=False,
        cls_mode=False,
        pdnorm_bn=False,
        pdnorm_ln=False,
        pdnorm_decouple=True,
        pdnorm_adaptive=False,
        pdnorm_affine=True,
        pdnorm_conditions=("nuScenes", "SemanticKITTI", "Waymo"),
    ),
    criteria=[
        dict(type="CrossEntropyLoss", loss_weight=1.0, ignore_index=-1),
        dict(type="LovaszLoss", mode="multiclass", loss_weight=1.0, ignore_index=-1),
    ],
)

# scheduler settings
epoch = 50
eval_epoch = 50
optimizer = dict(type="AdamW", lr=0.002, weight_decay=0.005)
scheduler = dict(
    type="OneCycleLR",
    max_lr=[0.002, 0.0002],
    pct_start=0.04,
    anneal_strategy="cos",
    div_factor=10.0,
    final_div_factor=100.0,
)
param_dicts = [dict(keyword="block", lr=0.0002)]

# dataset settings
dataset_type = "SemanticKITTIDataset"
data_root = "/data/zclin/hl/PolarSeg/"
ignore_index = -1
names = [
    "car", "bicycle", "motorcycle", "truck", "other-vehicle",
    "person", "bicyclist", "motorcyclist", "road", "parking",
    "sidewalk", "other-ground", "building", "fence", "vegetation",
    "trunk", "terrain", "pole", "traffic-sign",
]

data = dict(
    num_classes=19,
    ignore_index=ignore_index,
    names=names,
    train=dict(
        type=dataset_type,
        split="train",
        data_root=data_root,
        transform=[
            dict(type="RandomDropout", dropout_ratio=0.2, dropout_application_ratio=0.2),
            # dict(type="RandomRotateTargetAngle", angle=(1/2, 1, 3/2), center=[0, 0, 0], axis="z", p=0.75),
            dict(type="RandomRotate", angle=[-1, 1], axis="z", center=[0, 0, 0], p=0.5),
            # dict(type="RandomRotate", angle=[-1/6, 1/6], axis="x", p=0.5),
            # dict(type="RandomRotate", angle=[-1/6, 1/6], axis="y", p=0.5),
            dict(type="RandomScale", scale=[0.9, 1.1]),
            # dict(type="RandomShift", shift=[0.2, 0.2, 0.2]),
            dict(type="RandomFlip", p=0.5),
            dict(type="RandomJitter", sigma=0.005, clip=0.02),
            # dict(type="ElasticDistortion", distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
            dict(
                type="GridSample",
                grid_size=0.05,
                hash_type="fnv",
                mode="train",
                keys=("coord", "strength", "segment"),
                return_grid_coord=True,
            ),
            # dict(type="PointClip", point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2)),
            dict(type="SphereCrop", sample_rate=0.8, mode="random"),
            dict(type="SphereCrop", point_max=120000, mode="random"),
            # dict(type="CenterShift", apply_z=False),
            dict(type="ToTensor"),
            dict(
                type="Collect",
                keys=("coord", "grid_coord", "segment"),
                feat_keys=("coord", "strength"),
            ),
        ],
        test_mode=False,
        ignore_index=ignore_index,
    ),
    val=dict(
        type=dataset_type,
        split="val",
        data_root=data_root,
        transform=[
            dict(
                type="GridSample",
                grid_size=0.05,
                hash_type="fnv",
                mode="train",
                keys=("coord", "strength", "segment"),
                return_grid_coord=True,
            ),
            # dict(type="PointClip", point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2)),
            dict(type="ToTensor"),
            dict(
                type="Collect",
                keys=("coord", "grid_coord", "segment"),
                feat_keys=("coord", "strength"),
            ),
        ],
        test_mode=False,
        ignore_index=ignore_index,
    ),
    test=dict(
        type=dataset_type,
        split="val",
        data_root=data_root,
        transform=[],
        test_mode=True,
        test_cfg=dict(
            voxelize=dict(
                type="GridSample",
                grid_size=0.05,
                hash_type="fnv",
                mode="test",
                return_grid_coord=True,
                keys=("coord", "strength"),
            ),
            crop=None,
            post_transform=[
                # dict(type="PointClip", point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2)),
                dict(type="ToTensor"),
                dict(
                    type="Collect",
                    keys=("coord", "grid_coord", "index"),
                    feat_keys=("coord", "strength"),
                ),
            ],
            aug_transform=[
                [dict(type="RandomRotateTargetAngle", angle=[0], axis="z", center=[0, 0, 0], p=1)],
                [dict(type="RandomRotateTargetAngle", angle=[1 / 2], axis="z", center=[0, 0, 0], p=1)],
                [dict(type="RandomRotateTargetAngle", angle=[1], axis="z", center=[0, 0, 0], p=1)],
                [dict(type="RandomRotateTargetAngle", angle=[3 / 2], axis="z", center=[0, 0, 0], p=1)],
            ],
        ),
        ignore_index=ignore_index,
    ),
)