OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

question about performance on Visual Grounding task when using prefix tuning #361

Closed. h-ccc closed this issue 1 year ago

h-ccc commented 1 year ago

Thank you very much for your outstanding work. I was very inspired by OFA. When I tried to reproduce the results of prefix tuning on Visual Grounding, I ran into some performance problems. For example, on RefCOCO+, my reproduced result is 75.17/80.61/65.94 (vs. 76.34/81.44/67.68 reported in Prompt Tuning for Generative Multimodal Pretrained Models). The following are the hyperparameters I set according to train_refcoco_prefix.sh. Could you provide the corresponding checkpoints or help me find my mistakes? Thank you so much!

../../train.py
--local_rank=0
../../dataset/refcoco_data/refcocoplus_train.tsv,../../dataset/refcoco_data/refcocoplus_val.tsv
--selected-cols=0,4,2,3,1
--bpe-dir=../../utils/BPE
--user-dir=../../ofa_module
--restore-file=../../checkpoints/ofa_base.pt
--reset-optimizer
--reset-dataloader
--reset-meters
--save-dir=${save_dir}
--task=refcoco
--arch=ofa_base
--criterion=adjust_label_smoothed_cross_entropy
--label-smoothing=0.1
--batch-size=8
--update-freq=8
--encoder-normalize-before
--decoder-normalize-before
--share-decoder-input-output-embed
--share-all-embeddings
--layernorm-embedding
--patch-layernorm-embedding
--code-layernorm-embedding
--resnet-drop-path-rate=0.0
--encoder-drop-path-rate=0.2
--decoder-drop-path-rate=0.2
--dropout=0.1
--attention-dropout=0.0
--weight-decay=0.01
--optimizer=adam
--adam-betas=(0.9,0.999)
--adam-eps=1e-08
--clip-norm=1.0
--lr-scheduler=polynomial_decay
--lr=0.03
--max-epoch=100
--warmup-ratio=0.06
--log-format=simple
--log-interval=10
--fixed-validation-seed=7
--no-epoch-checkpoints
--keep-best-checkpoints=1
--save-interval=1
--validate-interval=1
--save-interval-updates=500
--validate-interval-updates=500
--eval-acc
--eval-args={"beam":5,"min_len":4,"max_len_a":0,"max_len_b":4}
--best-checkpoint-metric=score
--maximize-best-checkpoint-metric
--max-src-length=80
--max-tgt-length=20
--find-unused-parameters
--add-type-embedding
--scale-attn
--scale-fc
--encoder-prompt
--decoder-prompt
--encoder-prompt-type=prefix
--decoder-prompt-type=prefix
--encoder-prompt-length=64
--decoder-prompt-length=64
--scale-heads
--disable-entangle
--num-bins=1000
--patch-image-size=480
--fp16
--fp16-scale-window=512
--num-workers=0
--tensorboard-logdir=${tblog}

JustinLin610 commented 1 year ago

OK, we'll have a check. This is possible, as prefix tuning is sensitive to hyperparameters.

h-ccc commented 1 year ago

Thanks a lot in advance!!

yh351016 commented 1 year ago

Hi @h-ccc, thank you for your attention to and support of our work. If you are training on a single machine with a single card, you can expand the effective batch size to 256/512 through update-freq, aligning it with the batch size used in the GitHub script.

h-ccc commented 1 year ago

Thank you for your reply! Should I set batch-size to 256/512? That would lead to an out-of-memory issue. For batch-size and update-freq, I followed the settings in train_refcoco_prefix.sh. Concretely, for the base model I tried a batch-size of 8 on two 2080 Ti GPUs, or a batch-size of 16 on a single A100; update-freq is always set to 8.

yh351016 commented 1 year ago

Hi @h-ccc, you can expand the effective batch size by using update-freq (gradient accumulation: several forward/backward passes per optimizer update).
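
For reference, a rough sketch of the arithmetic (assuming fairseq's usual effective-batch-size formula; the per-GPU batch size and GPU count below only mirror the 2x2080 Ti setup mentioned above, and 256 is just an example target):

```python
# Effective batch size with gradient accumulation in fairseq-style training:
#   effective = --batch-size (per GPU) * number of GPUs * --update-freq
batch_size = 8      # --batch-size per GPU, as in the config above
num_gpus = 2        # e.g. two 2080 Ti cards
target = 256        # example target effective batch size

update_freq = target // (batch_size * num_gpus)    # -> 16
effective = batch_size * num_gpus * update_freq    # -> 256
print(f"--update-freq={update_freq} gives an effective batch size of {effective}")
```

With a batch-size of 16 on a single A100, update-freq=16 would likewise give 256.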

h-ccc commented 1 year ago

I will try it. Thanks a lot for answering!

TungWg commented 1 year ago

Hi @h-ccc, I have a similar problem. I tried to train OFA with prefix tuning on a single 4090 GPU, but my training did not converge: the loss dropped to about 8.4 and would not decline any further, and the grounding accuracy was only about 6%. I also tried your parameter settings, but it still doesn't work. Could you please send me your complete training configuration .sh file? zarath_xuany@163.com. Thank you very much!

h-ccc commented 1 year ago

It took a long time to train OFA with prefix tuning (several days on a single A100). The hyperparameters I used are shown above, copied directly from my .sh file. You could check your checkpoint file (ofa_base.pt) and your dataset; I don't think the 6% accuracy can be attributed solely to the hyperparameters.
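
If it helps, here is a minimal sanity-check sketch (assuming a standard fairseq checkpoint layout with a "model" state dict and a tab-separated .tsv file; the paths just match the config above, adjust them to your setup):

```python
# Quick sanity check of the pretrained checkpoint and the refcoco .tsv file.
# Assumes a standard fairseq checkpoint (a dict containing a "model" state dict)
# and a tab-separated dataset file; adjust paths to your own layout.
import torch

ckpt = torch.load("../../checkpoints/ofa_base.pt", map_location="cpu")
state = ckpt.get("model", ckpt)
num_params = sum(t.numel() for t in state.values() if torch.is_tensor(t))
print(f"{len(state)} entries, ~{num_params / 1e6:.1f}M parameters")

with open("../../dataset/refcoco_data/refcocoplus_train.tsv", encoding="utf-8") as f:
    first_row = f.readline().rstrip("\n").split("\t")
print(f"first row has {len(first_row)} columns")  # should cover the indices in --selected-cols
```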

TungWg commented 1 year ago

@h-ccc, thank you for your reply. I downloaded ofa_base.pt and the dataset files (.tsv) from the links provided by OFA, so I think there should be no mistake there. I must have gotten some detail wrong, which is why training didn't converge. Could you please send me your .sh file so I can try it on a single GPU and narrow down the problem? Thanks again!