BolinLai / GLC

[BMVC2022, IJCV2023, Best Student Paper, Spotlight] Official codes for the paper "In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation".

Reproduction issues #6

Closed. ZhaofengSHI closed this issue 6 months ago.

ZhaofengSHI commented 6 months ago

Hi! Thanks for your excellent work. I tried to reproduce the training following the process in README.md and found that the loss does not converge. May I ask what the reason might be?

The running script is:

CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/run_net.py \
  --init_method tcp://localhost:9878 \
  --cfg configs/Egtea/MVIT_B_16x4_CONV.yaml \
  TRAIN.BATCH_SIZE 16 \
  TEST.BATCH_SIZE 128 \
  NUM_GPUS 4 \
  TRAIN.CHECKPOINT_FILE_PATH /data/zhaofeng/Gaze/GLC-main/pretrained/K400_MVIT_B_16x4_CONV.pyth \
  OUTPUT_DIR checkpoints/GLC_egtea \
  DATA.PATH_PREFIX /data/zhaofeng/Gaze/Datasets/EGTEA_Gaze+

The devices are: 4 NVIDIA TITAN Xp GPUs.

Thank you very much! Looking forward to your reply.

BolinLai commented 6 months ago

Can you show me what your loss looks like?

ZhaofengSHI commented 6 months ago

I have already addressed this issue. It seems the weight decay factor is too high for the loss to converge. Thank you!
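
For reference, a minimal sketch of how a lower weight decay could be tried without editing the YAML, assuming the repo accepts the same KEY VALUE command-line overrides used in the script above (SOLVER.WEIGHT_DECAY is the key shown in the config posted later in this thread; the value 0.01 and the output directory name are purely illustrative):

# Hypothetical override run: only SOLVER.WEIGHT_DECAY is changed from the
# original command; 0.01 is an illustrative value, not a recommended setting.
CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/run_net.py \
  --init_method tcp://localhost:9878 \
  --cfg configs/Egtea/MVIT_B_16x4_CONV.yaml \
  TRAIN.BATCH_SIZE 16 \
  NUM_GPUS 4 \
  SOLVER.WEIGHT_DECAY 0.01 \
  OUTPUT_DIR checkpoints/GLC_egtea_wd001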

ZhaofengSHI commented 6 months ago

I apologize for bothering you again. I ended up not solving the problem after further attempts, and the loss curve looks like this. What could be the reason? Thank you! [attachment: loss curve]

BolinLai commented 5 months ago

What are the hyperparameters you used for training?

ZhaofengSHI commented 5 months ago

Hello! Thanks for your reply!

My running script is:

CUDA_VISIBLE_DEVICES=0,1,2,3 python tools/run_net.py \
  --init_method tcp://localhost:9668 \
  --cfg configs/Egtea/MVIT_B_16x4_CONV.yaml \
  TRAIN.BATCH_SIZE 16 \
  TEST.BATCH_SIZE 64 \
  NUM_GPUS 4 \
  TRAIN.CHECKPOINT_FILE_PATH /data1/zhaofeng/Ego_Gaze/GLC-main/pretrained/K400_MVIT_B_16x4_CONV.pyth \
  OUTPUT_DIR checkpoints/GLC_egtea \
  DATA.PATH_PREFIX /data1/zhaofeng/Ego_Gaze/Datasets/EGTEA_Gaze+

The hyperparameter setting is:

TRAIN:
  ENABLE: True
  DATASET: egteagaze
  BATCH_SIZE: 12
  EVAL_PERIOD: 10  ##
  CHECKPOINT_PERIOD: 10  ##
  AUTO_RESUME: False
  CHECKPOINT_EPOCH_RESET: True
DATA:
  PATH_PREFIX: '/data/egtea_gp'
  NUM_FRAMES: 8
  SAMPLING_RATE: 8
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 256
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3]
  TARGET_FPS: 24
  USE_OFFSET_SAMPLING: False
  GAUSSIAN_KERNEL: 19
MVIT:
  ZERO_DECAY_POS_CLS: False
  SEP_POS_EMBED: True
  DEPTH: 16
  NUM_HEADS: 1
  EMBED_DIM: 96
  PATCH_KERNEL: (3, 7, 7)
  PATCH_STRIDE: (2, 4, 4)
  PATCH_PADDING: (1, 3, 3)
  MLP_RATIO: 4.0
  QKV_BIAS: True
  DROPPATH_RATE: 0.2
  NORM: "layernorm"
  MODE: "conv"
  CLS_EMBED_ON: False
  GLOBAL_EMBED_ON: True
  DIM_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  HEAD_MUL: [[1, 2.0], [3, 2.0], [14, 2.0]]
  POOL_KVQ_KERNEL: [3, 3, 3]
  POOL_KV_STRIDE_ADAPTIVE: [1, 8, 8]
  POOL_Q_STRIDE: [[1, 1, 2, 2], [3, 1, 2, 2], [14, 1, 2, 2]]
  DROPOUT_RATE: 0.0
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  ZERO_WD_1D_PARAM: True
  CLIP_GRAD_L2NORM: 1.0
  BASE_LR_SCALE_NUM_SHARDS: True
  BASE_LR: 0.0001  # 0.0001
  COSINE_AFTER_WARMUP: True
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 5.0  # 5.0
  LR_POLICY: cosine
  MAX_EPOCH: 25  # 25
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.05  # 0.05
  OPTIMIZING_METHOD: adamw
MODEL:
  NUM_CLASSES: 400
  ARCH: mvit
  MODEL_NAME: GLC_Gaze
  LOSS_FUNC: kldiv
  DROPOUT_RATE: 0.5  # 0.5
TEST:
  ENABLE: True
  DATASET: egteagaze
  BATCH_SIZE: 12
  NUM_SPATIAL_CROPS: 1
  NUM_ENSEMBLE_VIEWS: 1
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: False
TENSORBOARD:
  ENABLE: True
NUM_GPUS: 4
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .

Then I got the above result.

BolinLai commented 5 months ago

That's a weird phenomenon. The training F1 should go much higher, to more than 0.5. Did you try running inference with the released weights? Do you get the same numbers?
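
For reference, an evaluation-only run with the released weights might look like the sketch below. It assumes training can be disabled via TRAIN.ENABLE False and that a TEST.CHECKPOINT_FILE_PATH key is available, as in PySlowFast-style codebases; the exact key name and the checkpoint path are placeholders, not confirmed in this thread.

# Hypothetical inference-only run; TEST.CHECKPOINT_FILE_PATH and the
# checkpoint path below are assumptions, not verified against this repo.
CUDA_VISIBLE_DEVICES=0 python tools/run_net.py \
  --cfg configs/Egtea/MVIT_B_16x4_CONV.yaml \
  TRAIN.ENABLE False \
  TEST.ENABLE True \
  TEST.BATCH_SIZE 64 \
  NUM_GPUS 1 \
  TEST.CHECKPOINT_FILE_PATH /path/to/released/GLC_egtea.pyth \
  DATA.PATH_PREFIX /data1/zhaofeng/Ego_Gaze/Datasets/EGTEA_Gaze+ \
  OUTPUT_DIR checkpoints/GLC_egtea_eval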