Hi, thanks for your interest in our work. I just uploaded all the logs and configs to google_drive. To clarify, the 512M samples are selected randomly; we provide a detailed ablation of the number of seen samples (Table 10) in our paper. Our high-resolution (336) model is fine-tuned after the 224 fine-tuning, with 128M seen samples.
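For anyone trying to reproduce this, here is an illustrative summary of the schedule described above (the key names are placeholders, not the actual CLIPA config fields):

```python
# Illustrative summary of the fine-tuning schedule; key names are placeholders.
finetune_schedule = [
    # Stage 1: fine-tune at 224x224 on 512M randomly selected samples.
    {"resolution": 224, "seen_samples": 512_000_000},
    # Stage 2: continue from the 224 checkpoint at 336x336 for 128M seen samples.
    {"resolution": 336, "seen_samples": 128_000_000},
]
```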
For those interested in this thread, I've shared my plots based on Xianhang's training logs.
@xhl-video Can you please also share the training logs of the BigG models?
Hello,
I've been working to replicate the fine-tuned model results presented in your CLIPA repository, specifically for ViT-H/14. However, my model's zero-shot top-1 accuracy is approximately 1.5% lower than the reported figures. Since I'm training on AWS P4 instances (8xA100), I'm unsure whether this discrepancy falls within an expected margin.
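For reference, this is a minimal sketch of the zero-shot evaluation I am running, assuming the fine-tuned checkpoint loads through open_clip; the checkpoint path, ImageNet path, class-name list, and prompt template below are placeholders rather than the exact setup from your repository.

```python
import torch
import open_clip
from torch.utils.data import DataLoader
from torchvision.datasets import ImageNet

# Load the fine-tuned checkpoint (path is hypothetical).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="/path/to/clipa_vit_h14_ft_224.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")
model = model.cuda().eval()

# Placeholder: the 1000 ImageNet class names, ordered to match torchvision's
# label indices (e.g. copied from open_clip's zero-shot metadata).
imagenet_classnames = [...]

with torch.no_grad():
    # Build a single-template zero-shot classifier from the class-name prompts.
    text = tokenizer([f"a photo of a {c}" for c in imagenet_classnames]).cuda()
    classifier = model.encode_text(text)
    classifier = classifier / classifier.norm(dim=-1, keepdim=True)

    loader = DataLoader(
        ImageNet("/path/to/imagenet", split="val", transform=preprocess),
        batch_size=256, num_workers=8,
    )

    correct, total = 0, 0
    for images, labels in loader:
        feats = model.encode_image(images.cuda())
        feats = feats / feats.norm(dim=-1, keepdim=True)
        preds = (feats @ classifier.T).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"zero-shot top-1: {correct / total:.4f}")
```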
Could you kindly share the training logs for the fine-tuned models listed? Access to these logs would greatly assist me in conducting a thorough comparison to possibly pinpoint any underlying issues.
Additionally, I have a few questions about the fine-tuning process:
The current setting uses 512M samples at a resolution of 224x224. Was the choice of 512M samples driven by budgetary constraints, or was there another rationale?
Any insights you could provide would be immensely appreciated.
Thank you for your time and consideration.