CVMI-Lab / PLA

(CVPR 2023) PLA: Language-Driven Open-Vocabulary 3D Scene Understanding & (CVPR2024) RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Apache License 2.0
262 stars, 11 forks

Question about Zero-shot Domain Transfer. #53

Open wu39848 opened 3 months ago

wu39848 commented 3 months ago

Hi, thank you for your great work! When I test the model you provided, which was pretrained on ScanNet without labels, on S3DIS, the results are worse than those in Table 14 of the supplementary material. [image]

jihanyang commented 3 months ago

Hello, we do not include the background categories (i.e., ceiling, floor, wall) in Table 14. This is noted in the caption of Table 14.
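To illustrate what excluding categories from the metric means, here is a minimal pure-Python sketch. It is not the PLA evaluation code: the function name `miou_excluding`, the toy labels, and the convention of dropping points whose ground-truth label is ignored are all assumptions made for illustration.

```python
def miou_excluding(pred, gt, num_classes, ignore_idx=()):
    """Mean IoU over the classes not listed in ignore_idx (hypothetical helper)."""
    ignored = set(ignore_idx)
    inter = [0] * num_classes       # correctly predicted points per class
    pred_count = [0] * num_classes  # points predicted as each class
    gt_count = [0] * num_classes    # points labelled as each class
    for p, g in zip(pred, gt):
        if g in ignored:
            continue  # assumption: points with an ignored label are dropped
        gt_count[g] += 1
        pred_count[p] += 1
        if p == g:
            inter[g] += 1
    ious = []
    for c in range(num_classes):
        if c in ignored:
            continue  # ignored classes do not enter the average
        union = gt_count[c] + pred_count[c] - inter[c]
        if union > 0:
            ious.append(inter[c] / union)
    return sum(ious) / len(ious)


# Toy data with three classes; class 0 plays the role of a background category.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(miou_excluding(pred, gt, num_classes=3))
print(miou_excluding(pred, gt, num_classes=3, ignore_idx=(0,)))
```

On this toy data the score is about 0.5 over all three classes and 0.75 once class 0 is ignored, so numbers computed with and without background categories are not directly comparable.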

wu39848 commented 3 months ago

This is my YAML file. After I ignored the background categories, I got the following result: [image] [image] Ignoring the background categories doesn't seem to help, and I don't know where the error is.

jihanyang commented 3 months ago

Can you show the whole YAML file?

wu39848 commented 3 months ago

This is the whole yaml file:

CLASS_NAMES: [ceiling, floor, wall, beam, column, window, door, table, chair, sofa, bookcase, board, clutter]

DATA_CONFIG:
  _BASE_CONFIG_: cfgs/dataset_configs/s3dis_dataset.yaml
  ignore_class_idx: [0,1,2,12]

MODEL:
  NAME: SparseUNetTextSeg
  REMAP_FROM_3DLANG: False
  REMAP_FROM_NOADAPTER: False

  VFE:
    NAME: IndoorVFE
    USE_XYZ: True

  BACKBONE_3D:
    NAME: SparseUNetIndoor
    IN_CHANNEL: 6
    MID_CHANNEL: 16
    BLOCK_RESIDUAL: True
    BLOCK_REPS: 2
    NUM_BLOCKS: 7
    CUSTOM_SP1X1: True

  ADAPTER:
    NAME: VLAdapter
    EVAL_ONLY: False
    NUM_ADAPTER_LAYERS: 2
    TEXT_DIM: -1
    LAST_NORM: False
    FEAT_NORM: False

  TASK_HEAD:
    NAME: TextSegHead

    TEXT_EMBED:
      NAME: CLIP
      NORM: True
      PATH: text_embed/s3dis_clip-ViT-B16_id.pth

    LOGIT_SCALE:
      value: 1.0
      learnable: False

TEXT_ENCODER:
  NAME: CLIP
  BACKBONE: ViT-B/16  # ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14']
  TEMPLATE: identity
  EXTRACT_EMBED: False  # Online extract text embeding from class or not

OPTIMIZATION:
  TEST_BATCH_SIZE_PER_GPU: 1
  BATCH_SIZE_PER_GPU: 4
  NUM_EPOCHS: 32
  LR: 0.004  # 4e-3
  SCHEDULER: cos_after_step
  OPTIMIZER: adamw
  WEIGHT_DECAY: 0.0001
  MOMENTUM: 0.9
  STEP_EPOCH: 20
  MULTIPLIER: 0.1
  CLIP_GRAD: False
  PCT_START: 0.39
  DIV_FACTOR: 1
  MOMS: [0.95, 0.85]
  LR_CLIP: 0.000001

OTHERS:
  PRINT_FREQ: 20
  EVAL_FREQ: 5
  SYNC_BN: False
  USE_AMP: True
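
A quick way to check which categories a given `ignore_class_idx` removes is to map the indices back to `CLASS_NAMES`. The sketch below copies both values from the config in this comment; the variable names `ignored` and `evaluated` are only illustrative. It says nothing about how the PLA code consumes the setting.

```python
# Values copied from the posted config.
CLASS_NAMES = ["ceiling", "floor", "wall", "beam", "column", "window", "door",
               "table", "chair", "sofa", "bookcase", "board", "clutter"]
ignore_class_idx = [0, 1, 2, 12]

# Names removed from evaluation and names that remain.
ignored = [CLASS_NAMES[i] for i in ignore_class_idx]
evaluated = [name for i, name in enumerate(CLASS_NAMES) if i not in ignore_class_idx]

print("ignored:", ignored)
print("evaluated:", evaluated)
```

With these values, index 12 (clutter) is ignored in addition to ceiling, floor, and wall. The thread does not say whether Table 14 excludes clutter as well, so this may be worth confirming when comparing numbers.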