TencentARC / ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
https://ailab-cvc.github.io/seed/vitlens/

Training Time and GPU usages #7

Closed haoranD closed 4 months ago

haoranD commented 4 months ago

Hi,

Thanks for the exceptional work you have presented. It is truly remarkable and contributes significantly to the field.

After reviewing your paper, I noted that the experiments were conducted on clusters of 32GB V100 GPUs. Could you share more details about the compute resources used for this project, specifically the total training time and the exact number of GPUs employed?

Thanks a lot.

StanLei52 commented 4 months ago

Hi there,

Thank you for your interest. Please refer to TRAIN_INFERENCE.md and click the training items for details. Note that you may use fewer GPUs by setting --accum-freq, though this may lead to a different training time compared to our experiments. If you require further details on other model variants, feel free to reach out.
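For readers unfamiliar with the trade-off: `--accum-freq` enables gradient accumulation, so fewer GPUs can reproduce the same effective batch size at the cost of more wall-clock time. A minimal sketch of the arithmetic (the function name and example numbers are illustrative, not taken from the repo's configs):

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int, accum_freq: int) -> int:
    """Total samples contributing to each optimizer step.

    Gradients from `accum_freq` consecutive micro-batches are summed
    before the optimizer step, so the effective batch size scales
    linearly with the accumulation frequency.
    """
    return per_gpu_batch * num_gpus * accum_freq


# Hypothetical example: 32 GPUs with no accumulation vs. 8 GPUs with
# --accum-freq 4 yield the same effective batch size, so training
# dynamics match while per-epoch wall-clock time roughly quadruples.
full = effective_batch_size(per_gpu_batch=64, num_gpus=32, accum_freq=1)
reduced = effective_batch_size(per_gpu_batch=64, num_gpus=8, accum_freq=4)
print(full, reduced)  # both 2048
```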

haoranD commented 4 months ago

Thanks for your kind reply, it is quite helpful.

May I ask when the code for integration with InstructBLIP and SEED will be released?

StanLei52 commented 4 months ago

Due to limited bandwidth, I aim to release these parts within a month.

haoranD commented 4 months ago

Thanks