TencentARC / ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
https://ailab-cvc.github.io/seed/vitlens/

What kind of textual prompts do you use during the training period? #1

Closed · yifliu3 closed this issue 10 months ago

yifliu3 commented 10 months ago

Hi,

Thanks for your great work! I'm trying to adapt ViT-Lens to a customized dataset, and I would like to align the textual prompts used at inference with those used during training. Could you please share the prompt templates? That may help improve performance.

StanLei52 commented 10 months ago

Hi there,

Thank you for asking. We used the training data from ULIP, ULIP2 and OpenShape and primarily followed their data preprocessing. You may refer to the original papers for more details.

IIRC, for ULIP-ShapeNet Triplets, we randomly filled the class name into a template during training; the templates are the same as those used in testing. For ULIP2-Objaverse Triplets, we used the captioning data during training and applied template-based prompts for testing. For OpenShape data, we directly used the released features from the official repo.
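For the ULIP-ShapeNet case, the training-time text is simply a template randomly filled with the class name. Here is a minimal sketch of that step; the template strings below are placeholders for illustration, not the actual list (which lives in the ULIP repo):

```python
import random

# Placeholder templates for illustration only; use the actual CLIP-style list from the ULIP repo.
TEMPLATES = [
    "a point cloud model of {}.",
    "a 3D model of {}.",
    "there is a {} in the scene.",
]

def make_training_caption(class_name: str) -> str:
    # Randomly fill the class name into one of the templates,
    # matching the training-time prompt construction described above.
    return random.choice(TEMPLATES).format(class_name)

print(make_training_caption("chair"))  # e.g. "a 3D model of chair."
```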

The templates used for prompt engineering can be found in the ULIP repo or here.
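For template-based prompting at test time, the usual recipe (as in CLIP/ULIP zero-shot evaluation) is to fill every template with each class name, encode all prompts with the text encoder, and average the normalized embeddings into one classifier weight per class. Below is a minimal sketch assuming an open_clip text encoder; the model name, pretrained tag, and template list are placeholders, not the exact ViT-Lens configuration:

```python
import torch
import open_clip

# Placeholder backbone and templates; swap in the encoder and templates used in your setup.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

TEMPLATES = ["a point cloud model of {}.", "a 3D model of {}."]
CLASSNAMES = ["chair", "airplane", "lamp"]

with torch.no_grad():
    class_weights = []
    for name in CLASSNAMES:
        prompts = [t.format(name) for t in TEMPLATES]       # fill every template
        feats = model.encode_text(tokenizer(prompts))        # (num_templates, dim)
        feats = feats / feats.norm(dim=-1, keepdim=True)     # L2-normalize each prompt embedding
        mean = feats.mean(dim=0)                              # average over templates
        class_weights.append(mean / mean.norm())              # renormalize the ensemble
    text_classifier = torch.stack(class_weights, dim=0)       # (num_classes, dim)

# Zero-shot logits would then be `point_features @ text_classifier.T`
# for 3D features aligned to the same embedding space.
```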

yifliu3 commented 10 months ago

Got it! Thanks a lot for your prompt reply.