Open Javion11 opened 2 months ago
Hi, we do not use any trick for validation results. Also, please don't involve these tricks in validation benchmarks.
Hi, I also ran into this problem. Could you share your SemanticKITTI config?
Hi, I have a response here (https://github.com/Pointcept/Pointcept/issues/205). Also, I have been short on computing resources recently, so I have to wait until some become available. It shouldn't take long; I should have some free time and a free GPU within a month.
OK, thank you for your reply. I will try again after changing the config.
Thank you for your kind reply! Anyway, the work is excellent and inspiring. I'm looking forward to the release of the config that adapts to the SemanticKITTI data, because I'm very curious about what settings can cause such a big performance difference. Thanks again for your excellent work, and best regards!
Typically, it shouldn't perform this poorly. It would be great if you could check whether you adapted the correct base config (e.g., using the nuScenes config as a base and then aligning the data augmentation pipeline with the other SemanticKITTI configs). Our PTv3 can already reach a mIoU over 70%.
The config I use is https://github.com/Pointcept/Pointcept/blob/main/configs/nuscenes/semseg-pt-v3m1-0-base.py. The only difference is that I changed the dataset to SemanticKITTI. The result I get is
Similar to @Sylva-Lin
I changed the config to remove PointClip, but the result was still bad, and I found the val results are poor on sparse points. The result is worse than the paper's; what else should I change?
Okay, I will raise the priority of this issue and check the SemanticKITTI config soon.
Also, note that eval during training is not precise; only the result of a precise testing process should be used as the final number. As shown in the images below, the best eval mIoU during training is 69.3%, while after precise testing the mIoU is 72.25%.
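For readers unfamiliar with the "precise testing" step: it corresponds to the `aug_transform` in the test config, where the cloud is predicted under several z-axis rotations (the config's angle values are fractions of π) and per-point logits are summed before taking the argmax. A minimal sketch of that averaging step, with a toy stand-in model and hand-made points (not PTv3 and not real data):

```python
import math

def rotate_z(points, angle):
    """Rotate an (N, 3) point list around the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def tta_predict(points, model_fn, num_classes, angles):
    """Sum per-point class logits over rotated copies, then take the argmax.

    `model_fn` maps a point list to a list of per-point logit rows.
    """
    summed = [[0.0] * num_classes for _ in points]
    for angle in angles:
        for row_sum, row in zip(summed, model_fn(rotate_z(points, angle))):
            for k in range(num_classes):
                row_sum[k] += row[k]
    return [max(range(num_classes), key=lambda k: row[k]) for row in summed]

# Toy stand-in "model": class 0 if the point is near the z axis, else class 1.
def toy_model(points):
    return [[-math.hypot(x, y), -1.0] for x, y, _ in points]

points = [(0.1, 0.0, 0.0), (5.0, 0.0, 1.0), (0.0, 0.2, -1.0)]
angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # as in aug_transform
labels = tta_predict(points, toy_model, 2, angles)
print(labels)  # [0, 1, 0]
```

The real pipeline additionally maps voxel-level predictions back to the original points via the `index` key collected in the test config, which this sketch omits.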
Thanks for your time. I noticed that the closed issue #38 mentioned this problem. Maybe this is a common problem, not an isolated case.
Hi, I tried a variety of configs and ran precise testing, but the best result was only 68.27%, still 2.53% short of the paper's 70.80%. My best config is below; could you help me see what needs to change?
batch_size = 12  # bs: total bs in all gpus
mix_prob = 0.8
empty_cache = False
enable_amp = True

model = dict(
    type="DefaultSegmentorV2",
    num_classes=19,
    backbone_out_channels=64,
    backbone=dict(
        type="PT-v3m1",
        in_channels=4,
        order=["z", "z-trans", "hilbert", "hilbert-trans"],
        stride=(2, 2, 2, 2),
        enc_depths=(2, 2, 2, 6, 2),
        enc_channels=(32, 64, 128, 256, 512),
        enc_num_head=(2, 4, 8, 16, 32),
        enc_patch_size=(1024, 1024, 1024, 1024, 1024),
        dec_depths=(2, 2, 2, 2),
        dec_channels=(64, 64, 128, 256),
        dec_num_head=(4, 4, 8, 16),
        dec_patch_size=(1024, 1024, 1024, 1024),
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        attn_drop=0.0,
        proj_drop=0.0,
        drop_path=0.3,
        shuffle_orders=True,
        pre_norm=True,
        enable_rpe=False,
        enable_flash=True,
        upcast_attention=False,
        upcast_softmax=False,
        cls_mode=False,
        pdnorm_bn=False,
        pdnorm_ln=False,
        pdnorm_decouple=True,
        pdnorm_adaptive=False,
        pdnorm_affine=True,
        pdnorm_conditions=("nuScenes", "SemanticKITTI", "Waymo"),
    ),
    criteria=[
        dict(type="CrossEntropyLoss", loss_weight=1.0, ignore_index=-1),
        dict(type="LovaszLoss", mode="multiclass", loss_weight=1.0, ignore_index=-1),
    ],
)

epoch = 50
eval_epoch = 50
optimizer = dict(type="AdamW", lr=0.002, weight_decay=0.005)
scheduler = dict(
    type="OneCycleLR",
    max_lr=[0.002, 0.0002],
    pct_start=0.04,
    anneal_strategy="cos",
    div_factor=10.0,
    final_div_factor=100.0,
)
param_dicts = [dict(keyword="block", lr=0.0002)]

dataset_type = "SemanticKITTIDataset"
data_root = "/data/zclin/hl/PolarSeg/"
ignore_index = -1
names = [
    "car", "bicycle", "motorcycle", "truck", "other-vehicle",
    "person", "bicyclist", "motorcyclist", "road", "parking",
    "sidewalk", "other-ground", "building", "fence", "vegetation",
    "trunk", "terrain", "pole", "traffic-sign",
]

data = dict(
    num_classes=19,
    ignore_index=ignore_index,
    names=names,
    train=dict(
        type=dataset_type,
        split="train",
        data_root=data_root,
        transform=[
# dict(type="RandomRotateTargetAngle", angle=(1/2, 1, 3/2), center=[0, 0, 0], axis="z", p=0.75),
dict(type="RandomRotate", angle=[-1, 1], axis="z", center=[0, 0, 0], p=0.5),
# dict(type="RandomRotate", angle=[-1/6, 1/6], axis="x", p=0.5),
# dict(type="RandomRotate", angle=[-1/6, 1/6], axis="y", p=0.5),
dict(type="RandomScale", scale=[0.9, 1.1]),
# dict(type="RandomShift", shift=[0.2, 0.2, 0.2]),
dict(type="RandomFlip", p=0.5),
dict(type="RandomJitter", sigma=0.005, clip=0.02),
# dict(type="ElasticDistortion", distortion_params=[[0.2, 0.4], [0.8, 1.6]]),
dict(
type="GridSample",
grid_size=0.05,
hash_type="fnv",
mode="train",
keys=("coord", "strength", "segment"),
return_grid_coord=True,
),
# dict(type="PointClip", point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2)),
dict(type="SphereCrop", sample_rate=0.8, mode="random"),
dict(type="SphereCrop", point_max=120000, mode="random"),
# dict(type="CenterShift", apply_z=False),
dict(type="ToTensor"),
dict(
type="Collect",
keys=("coord", "grid_coord", "segment"),
feat_keys=("coord", "strength"),
),
],
test_mode=False,
ignore_index=ignore_index,
),
val=dict(
type=dataset_type,
split="val",
data_root=data_root,
transform=[
dict(
type="GridSample",
grid_size=0.05,
hash_type="fnv",
mode="train",
keys=("coord", "strength", "segment"),
return_grid_coord=True,
),
# dict(type="PointClip", point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2)),
dict(type="ToTensor"),
dict(
type="Collect",
keys=("coord", "grid_coord", "segment"),
feat_keys=("coord", "strength"),
),
],
test_mode=False,
ignore_index=ignore_index,
),
test=dict(
type=dataset_type,
split="val",
data_root=data_root,
transform=[],
test_mode=True,
test_cfg=dict(
voxelize=dict(
type="GridSample",
grid_size=0.05,
hash_type="fnv",
mode="test",
return_grid_coord=True,
keys=("coord", "strength"),
),
crop=None,
post_transform=[
# dict(
# type="PointClip",
# point_cloud_range=(-35.2, -35.2, -4, 35.2, 35.2, 2),
# ),
dict(type="ToTensor"),
dict(
type="Collect",
keys=("coord", "grid_coord", "index"),
feat_keys=("coord", "strength"),
),
],
aug_transform=[
[
dict(
type="RandomRotateTargetAngle",
angle=[0],
axis="z",
center=[0, 0, 0],
p=1,
)
],
[
dict(
type="RandomRotateTargetAngle",
angle=[1 / 2],
axis="z",
center=[0, 0, 0],
p=1,
)
],
[
dict(
type="RandomRotateTargetAngle",
angle=[1],
axis="z",
center=[0, 0, 0],
p=1,
)
],
[
dict(
type="RandomRotateTargetAngle",
angle=[3 / 2],
axis="z",
center=[0, 0, 0],
p=1,
)
],
],
),
ignore_index=ignore_index,
),
)
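As a sanity check on the scheduler settings above: `max_lr=[0.002, 0.0002]` pairs with the two parameter groups (the default group and the `param_dicts` "block" group), warming each up from `max_lr / div_factor` and annealing down to `max_lr / (div_factor * final_div_factor)`. A rough re-implementation of the one-cycle cosine shape, for intuition only (this is not PyTorch's `OneCycleLR`, and the 1000-step total is a made-up number):

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.04, div_factor=10.0, final_div_factor=100.0):
    """Cosine one-cycle schedule: warm up to max_lr, then anneal to min_lr."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = pct_start * total_steps

    def cos_interp(start, end, pct):
        # pct = 0 -> start, pct = 1 -> end, along a half-cosine curve.
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step < warmup_steps:
        return cos_interp(initial_lr, max_lr, step / warmup_steps)
    pct = (step - warmup_steps) / (total_steps - warmup_steps)
    return cos_interp(max_lr, min_lr, pct)

total = 1000  # hypothetical total step count
for group_max_lr in (0.002, 0.0002):  # default group vs. "block" group
    start = one_cycle_lr(0, total, group_max_lr)    # max_lr / div_factor
    peak = one_cycle_lr(40, total, group_max_lr)    # pct_start * total = 40
    end = one_cycle_lr(total, total, group_max_lr)  # max_lr / 1000
    print(f"{group_max_lr:g}: {start:.2e} -> {peak:.2e} -> {end:.2e}")
```

So the "block" (attention backbone) parameters ride the same curve at one tenth of the base learning rate, which is what `param_dicts = [dict(keyword="block", lr=0.0002)]` encodes.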
Thanks for your work! I have reproduced PTv3 with your codebase, but the mIoU on SemanticKITTI only reaches about 65. The config is the same as the nuScenes one shown in your paper. I would like to ask what causes such a big performance gap (70.8 in the PTv3 paper)?