
Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

I have a question about the PTv3 + PPT training config. #174

Open jth5566 opened 4 months ago

jth5566 commented 4 months ago

Thank you for providing the code necessary for my research.

I have a question about PTv3 + PPT training.

I checked the config file you provided in the Model Zoo (ScanNet validation of PTv3 + PPT).

semseg-pt-v3m1-1-ppt-extreme.py:

model = dict(
    type="PPT-v1m1",
    backbone=dict(
        type="PT-v3m1",
        in_channels=6,
        order=("z", "z-trans", "hilbert", "hilbert-trans"),
        stride=(2, 2, 2, 2),
        enc_depths=(3, 3, 3, 6, 3),
        enc_channels=(48, 96, 192, 384, 512),
        enc_num_head=(3, 6, 12, 24, 32),
        enc_patch_size=(1024, 1024, 1024, 1024, 1024),
        dec_depths=(3, 3, 3, 3),
        dec_channels=(64, 96, 192, 384),
        dec_num_head=(4, 6, 12, 24),
        dec_patch_size=(1024, 1024, 1024, 1024),
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        attn_drop=0.0,
        proj_drop=0.0,
        drop_path=0.3,
        shuffle_orders=True,
        pre_norm=True,
        enable_rpe=False,
        enable_flash=True,
        upcast_attention=False,
        upcast_softmax=False,
        cls_mode=False,
        pdnorm_bn=True,
        pdnorm_ln=True,
        pdnorm_decouple=True,
        pdnorm_adaptive=False,
        pdnorm_affine=True,
        pdnorm_conditions=("ScanNet", "S3DIS", "Structured3D"),
    ),

The default setting is pdnorm_adaptive=False, and I checked prompt_driven_normalization.py:


def forward(self, point):
    assert {"feat", "condition"}.issubset(point.keys())
    if isinstance(point.condition, str):
        condition = point.condition
    else:
        condition = point.condition[0]
    if self.decouple:
        # decoupled mode: each dataset condition keeps its own norm layer (own running mean/std)
        assert condition in self.conditions
        norm = self.norm[self.conditions.index(condition)]
    else:
        norm = self.norm
    point.feat = norm(point.feat)
    if self.adaptive:  # <-- False with the default config
        # adaptive mode: scale/shift modulation predicted from the context (prompt) embedding
        assert "context" in point.keys()
        shift, scale = self.modulation(point.context).chunk(2, dim=1)
        point.feat = point.feat * (1.0 + scale) + shift
    return point

When I ran the code with the default config, I observed that the context (prompt) value is never used, because self.adaptive = False.

This means that the prompt-driven part of the normalization is not applied. In other words, the context (prompt) seems to go unused, and joint training proceeds without it. However, doesn't applying PPT imply that prompt learning should be applied? Could I have missed something? Or was that performance achieved solely through categorical alignment, without any prompt learning?
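For reference, I assume the prompt-driven branch could be switched back on by overriding the backbone settings; the snippet below is only my own illustrative sketch of such an override (not a config shipped with the repo, and assuming the config system merges nested dicts):

```python
# Hypothetical override of semseg-pt-v3m1-1-ppt-extreme.py (not an official config).
# Enabling pdnorm_adaptive makes PDNorm expect a "context" (prompt embedding) in the
# Point dict and apply the shift/scale modulation shown in forward() above.
model = dict(
    backbone=dict(
        pdnorm_decouple=True,   # keep per-dataset running statistics
        pdnorm_adaptive=True,   # enable prompt-driven scale/shift modulation
        pdnorm_conditions=("ScanNet", "S3DIS", "Structured3D"),
    ),
)
```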

Gofinge commented 4 months ago

Good point! Setting pdnorm_adaptive=False doesn't mean naive joint training; naive joint training causes negative transfer, as discussed in the PPT paper. PDNorm consists of two components: 1. adaptive adjustment of scale and bias from the task-condition prompt (controlled by pdnorm_adaptive); 2. separate running mean and running std statistics per task condition (controlled by pdnorm_decouple). In PTv3, we found that enabling pdnorm_decouple alone recovers about 90% of PDNorm's gain, so we only enable pdnorm_decouple for PTv3 + PPT, for a more efficient and simpler setup.
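To make the two components concrete, here is a minimal, self-contained PyTorch sketch of a PDNorm-style layer, simplified from the forward() quoted above (names such as PDNormSketch and context_dim are illustrative, not the exact Pointcept implementation):

```python
import torch
import torch.nn as nn


class PDNormSketch(nn.Module):
    """Simplified prompt-driven normalization: decoupled per-dataset statistics
    plus optional prompt-conditioned scale/shift modulation."""

    def __init__(self, channels, conditions=("ScanNet", "S3DIS", "Structured3D"),
                 context_dim=256, decouple=True, adaptive=False):
        super().__init__()
        self.conditions = conditions
        self.decouple = decouple
        self.adaptive = adaptive
        if decouple:
            # component 2: one norm layer (its own running mean/std) per dataset condition
            self.norm = nn.ModuleList([nn.BatchNorm1d(channels) for _ in conditions])
        else:
            self.norm = nn.BatchNorm1d(channels)
        if adaptive:
            # component 1: predict per-channel shift/scale from the prompt embedding
            self.modulation = nn.Linear(context_dim, 2 * channels)

    def forward(self, feat, condition, context=None):
        # feat: (N, C) point features; condition: dataset name; context: (N, context_dim) prompt
        norm = self.norm[self.conditions.index(condition)] if self.decouple else self.norm
        feat = norm(feat)
        if self.adaptive:
            assert context is not None
            shift, scale = self.modulation(context).chunk(2, dim=1)
            feat = feat * (1.0 + scale) + shift
        return feat


# usage: with adaptive=False (the released PTv3 + PPT setting) the prompt is never consumed
layer = PDNormSketch(channels=48, adaptive=False)
out = layer(torch.randn(1024, 48), condition="ScanNet")
```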

That said, we believe the full potential of adaptive scale and bias driven by a text-embedding prompt is still not fully unleashed, because the multi-dataset joint training introduced in our PPT is not yet very complex (though already effective): only three datasets, without modeling the characteristics of each dataset. We plan to conduct further research, introducing all-data learning to the community with an enhanced text-embedding prompt that captures dataset characteristics.

jth5566 commented 4 months ago

Thank you for your quick response!

jth5566 commented 3 months ago

Hi, I have another question related to this issue. Have you trained with pdnorm_adaptive=True and pdnorm_decouple=False? Can using just one shared normalization layer, with only the adaptive prompt modulation, achieve good performance? In other words, something like the backbone settings sketched below (my own sketch, not a config provided in the repo):
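```python
# Hypothetical variant of the PDNorm settings (not an official Pointcept config):
# a single norm layer shared across datasets, with only the prompt-driven modulation active.
backbone = dict(
    pdnorm_bn=True,
    pdnorm_ln=True,
    pdnorm_decouple=False,   # one norm layer shared by all dataset conditions
    pdnorm_adaptive=True,    # scale/shift modulation from the context (prompt) embedding
    pdnorm_affine=True,
    pdnorm_conditions=("ScanNet", "S3DIS", "Structured3D"),
)
```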