Pointcept / PointTransformerV3

[CVPR'24 Oral] Official repository of Point Transformer V3 (PTv3)

GPU Memory usage #31

Closed: thuliu-yt16 closed this issue 2 months ago

thuliu-yt16 commented 2 months ago

I am training PTv3 with the default configuration on point clouds with 50,000 points each. I hit a CUDA OOM on an 80 GB GPU even with batch size 1. Is this normal?

Details: each point cloud is normalized to [0,1]^3, the grid size is set to 0.01, and flash attention is enabled.

Gofinge commented 2 months ago

Hi, I think this is abnormal.

thuliu-yt16 commented 2 months ago

I checked the GPU memory usage for different numbers of points per sample:

- 50k points: 78 GB to 80+ GB (hits OOM on the 80 GB GPU)
- 20k points: 17 GB to 18 GB
- 5k points: 2 GB

The growth looks superlinear, even though with flash attention and a fixed patch size of 1024 I would expect memory to grow roughly linearly with the number of points.
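For reference, this is roughly how I measured the numbers above (a minimal sketch; model and data_dict are placeholders for my actual model and batch):

import torch

# Reset the peak-memory counter, run one training step, then read the peak.
torch.cuda.reset_peak_memory_stats()

loss = model(data_dict).mean()  # placeholder forward; my real loss differs
loss.backward()                 # the backward pass dominates peak memory

peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory: {peak_gb:.1f} GB")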

And this is my model's configuration, which is basically PTv3 followed by a per-point feature regression linear layer (64 -> out_channels):

in_channels = 6   # per-point input features (e.g. coordinates + color)
out_channels = 8  # per-point regression targets
ps = 1024         # attention patch size, shared across all stages
model = models.CustomPTv3(
    in_channels=in_channels,
    out_channels=out_channels,
    # space-filling-curve orders used to serialize the point cloud
    order=("z", "z-trans", "hilbert", "hilbert-trans"),
    stride=(2, 2, 2, 2),
    # encoder: five stages
    enc_depths=(2, 2, 2, 6, 2),
    enc_channels=(32, 64, 128, 256, 512),
    enc_num_head=(2, 4, 8, 16, 32),
    enc_patch_size=(ps, ps, ps, ps, ps),
    # decoder: four stages
    dec_depths=(2, 2, 2, 2),
    dec_channels=(64, 64, 128, 256),
    dec_num_head=(4, 4, 8, 16),
    dec_patch_size=(ps, ps, ps, ps),
    mlp_ratio=4,
    qkv_bias=True,
    qk_scale=None,
    attn_drop=0.0,
    proj_drop=0.0,
    drop_path=0.3,
    pre_norm=True,
    shuffle_orders=True,
    enable_rpe=False,
    enable_flash=True,  # flash attention on
    upcast_attention=False,
    upcast_softmax=False,
    cls_mode=False,  # dense per-point prediction, not classification
    pdnorm_bn=False,
    pdnorm_ln=False,
    pdnorm_decouple=True,
    pdnorm_adaptive=False,
    pdnorm_affine=True,
    pdnorm_conditions=("ScanNet", "S3DIS", "Structured3D"),
)
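For clarity, CustomPTv3 is essentially this kind of wrapper (a minimal sketch, not my exact code; the PointTransformerV3 import path and the .feat attribute follow Pointcept as I understand it, so treat both as assumptions):

import torch.nn as nn
from pointcept.models import PointTransformerV3  # import path is an assumption

class CustomPTv3(nn.Module):
    # PTv3 backbone plus a per-point feature regression head.
    def __init__(self, in_channels, out_channels, **ptv3_kwargs):
        super().__init__()
        self.backbone = PointTransformerV3(in_channels=in_channels, **ptv3_kwargs)
        # 64 = first decoder channel width (dec_channels[0] above)
        self.head = nn.Linear(64, out_channels)

    def forward(self, data_dict):
        point = self.backbone(data_dict)  # per-point features
        return self.head(point.feat)      # per-point regression output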

Do you have any suggestions for debugging? One sanity check I already ran is below. Thank you!
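With a grid size of 0.01 on [0,1]^3 there are up to 100^3 voxels, so grid sampling may barely reduce a 50k-point cloud, and the model really sees close to 50k points. A minimal, framework-agnostic check (the random coord tensor is a placeholder for my data):

import torch

coord = torch.rand(50_000, 3)                  # placeholder point cloud in [0,1]^3
grid_coord = torch.floor(coord / 0.01).long()  # voxel index per point at grid size 0.01
n_unique = torch.unique(grid_coord, dim=0).shape[0]
print(f"{coord.shape[0]} points -> {n_unique} occupied voxels")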

thuliu-yt16 commented 2 months ago

Problem solved. It turned out to be a non-PTv3-related issue.