Can you double check you used exactly the same parameters as in the runs from the README? https://tensorboard.dev/experiment/vNVL9RFmTBKJ4uK81CbGMQ/#scalars&_smoothingWeight=0&regexInput=imagenet21k%2FViT-B_16%2Fimagenet2012%2F
Those fine-tunings over 20k steps on an 8x V100 took ~18 hours and ended with 84.61% and 84.62% final accuracy.
Hi, I can find the learning rate and warmup settings in the README. I would like to ask about the hyper-parameters for this experiment (https://tensorboard.dev/experiment/vNVL9RFmTBKJ4uK81CbGMQ/#scalars&_smoothingWeight=0&regexInput=imagenet21k%2FViT-B_16%2Fimagenet2012%2F), such as the parameters in flags.py.
Currently I am using the default parameters (in flags.py) to train the model, and I think those parameters are designed for CIFAR-10.
So I would like to ask for the parameters used for this experiment result (https://tensorboard.dev/experiment/vNVL9RFmTBKJ4uK81CbGMQ/#scalars&_smoothingWeight=0&regexInput=imagenet21k%2FViT-B_16%2Fimagenet2012%2F).
Thank you!
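To make the question concrete, this is roughly the kind of settings I mean. The flag names and values below are placeholders I made up for illustration, not the repo's actual flags.py; the real defaults are defined there and shown in the hparams tab of the linked run.

```python
# Illustrative only: these flag names/values are placeholders, not the repo's
# actual flags.py. I just want to know which settings of this kind need to
# change for the imagenet2012 fine-tuning run.
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string('dataset', 'imagenet2012', 'Fine-tuning dataset.')
flags.DEFINE_integer('batch', 512, 'Global batch size.')
flags.DEFINE_integer('total_steps', 20_000, 'Number of fine-tuning steps.')
flags.DEFINE_integer('warmup_steps', 500, 'Linear learning-rate warmup steps.')
flags.DEFINE_float('base_lr', 0.03, 'Peak learning rate (placeholder value).')
flags.DEFINE_string('decay_type', 'cosine', 'Learning-rate decay schedule.')


def main(_):
  # Print the effective settings, the way the hparams tab would report them.
  names = ('dataset', 'batch', 'total_steps', 'warmup_steps', 'base_lr', 'decay_type')
  print({name: getattr(FLAGS, name) for name in names})


if __name__ == '__main__':
  app.run(main)
```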
It was trained with the default parameters; you can verify this in the hparams tab of the tensorboard.dev link above.
So training this for 20k steps on an 8x TPUv2 should give you identical results. Can you share the full training metrics for comparison?
I think maybe that is because I am using a TPU, not a GPU?
Hi, I just changed the dataset from cifar-10 to imagenet2012 and haven't changed anything else in the code. My training log is as follows: https://docs.google.com/document/d/1uWwylLuNi_aQsYaovNCuM3fKVuPRVIuMXFShM5foDRg/edit?usp=sharing
Comparing with the results at https://tensorboard.dev/experiment/vNVL9RFmTBKJ4uK81CbGMQ/#scalars&_smoothingWeight=0&regexInput=imagenet21k%2FViT-B_16%2Fimagenet2012%2F
I find that the test accuracy shows a gap starting from step 2000.
That's unexpected.
We produced our original results on TPUs, but then I only tested the open-sourced code on GPUs (links from the README). Let me rerun the code on TPUs to see if I can reproduce your results first.
Thanks so much!
Hi, I would like to ask about the updated results from your run on TPU. I find there is also a gap in the CIFAR-10 results.
Thank you!
Yong
I can confirm I also got a similar 84.13% final accuracy, as you already reported, on an 8x TPUv2. I am now rerunning with some changed configs on both TPU and GPU to verify these results and try to understand what could cause the difference. I will update here when the results are available.
Thank you. Have a nice day!
By accident, the original runs used dropout=0.0, which resulted in an improvement of the results reported in the README over the results reported in the paper (where we have 83.97% top-1 accuracy for B/16). I added a comment to the top of the table, but that got removed when the README was later updated with additional results. Fixed in dab0a5c.
I also checked that you get 84.63% when running on TPU with dropout=0.0 (and that GPU gets to 84.12% when running with dropout=0.1).
We're working on an updated release using the newer Flax Linen API and will regenerate the entire table for that purpose.
Dropout is set in the config file here:
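For illustration, the knob in question looks roughly like the sketch below; the field names here are only an example (the authoritative setting is the line in the linked config file, not this snippet).

```python
# Rough illustration only -- the exact field names live in the repo's config
# file linked above; this sketch is not copied from the repo.
import ml_collections


def get_finetune_config() -> ml_collections.ConfigDict:
  config = ml_collections.ConfigDict()
  config.transformer = ml_collections.ConfigDict()
  # dropout=0.0 corresponds to the README runs (~84.6% top-1 on TPU),
  # dropout=0.1 to the paper-style runs (~84.1%), as discussed above.
  config.transformer.dropout_rate = 0.0
  return config
```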
Thanks so much! I'll try it immediately. By the way, did you try to use LARS as the optimizer?
Thank you!
Yong
No, we didn't try the LARS optimizer, but it might be worthwhile.
Thanks for your reply! It works now. I'll try to implement the LARS optimizer and post an update when I have finished it.
Yong
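Here is the rough LARS sketch I plan to start from. This is my own simplification of the update rule from "Large Batch Training of Convolutional Networks" (You et al., 2017), not code from this repo, and names such as `trust_coefficient` are my own choices.

```python
# My own rough sketch of the LARS update rule (You et al., 2017), not code
# from this repo; hyper-parameter names and defaults below are my own.
import jax
import jax.numpy as jnp


def lars_update(params, grads, momenta, lr,
                momentum=0.9, weight_decay=1e-4,
                trust_coefficient=1e-3, eps=1e-8):
  """One LARS step. `params`, `grads`, `momenta` are matching pytrees."""

  def update_leaf(w, g, m):
    w_norm = jnp.linalg.norm(w)
    g_norm = jnp.linalg.norm(g)
    # Layer-wise trust ratio: scale each layer's step by ||w|| / ||g + wd*w||.
    trust = jnp.where((w_norm > 0.) & (g_norm > 0.),
                      trust_coefficient * w_norm
                      / (g_norm + weight_decay * w_norm + eps),
                      1.0)
    new_m = momentum * m + trust * (g + weight_decay * w)
    return w - lr * new_m, new_m

  out = jax.tree_util.tree_map(update_leaf, params, grads, momenta)
  # Each leaf of `out` is a (new_param, new_momentum) pair; split them apart.
  is_pair = lambda x: isinstance(x, tuple)
  new_params = jax.tree_util.tree_map(lambda p: p[0], out, is_leaf=is_pair)
  new_momenta = jax.tree_util.tree_map(lambda p: p[1], out, is_leaf=is_pair)
  return new_params, new_momenta
```

I believe optax also ships a LARS optimizer nowadays, so I may compare this against that as well.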
@lucasliunju Can you please share your fine-tuning code for ImageNet-1K? I am trying to fine-tune ViT-B/16 on ImageNet-1K from ImageNet-21k pretraining with an image size of 224 and can't reproduce the results (reaching an accuracy of 83.7% while the reported result is 84.4%). Specifically, can you mention the data augmentations used and perhaps any additional methods used (EMA? weight averaging? head initialization?)
Thanks in advance.
Hi,
I am trying to fine-tune the ViT-B/16 model on imagenet2012 on a TPU v3-8. The top-1 accuracy is 84.1% (different from 84.6%). I would like to ask whether I need to change the default hyper-parameters for this experiment.
Thank you!