Show-han / Zeroshot_REC

Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024)
Apache License 2.0

Can you release the Caption Triplets? #1

Closed: zetaodu closed this issue 6 months ago

Show-han commented 8 months ago

I'll upload them this week.

Show-han commented 8 months ago

The caption triplets are available here: https://drive.google.com/file/d/1iU7RrObmEa_I3hzjxBJxjhEwyGzGhW9n/view?usp=sharing

zetaodu commented 8 months ago

Thanks for your quick reply. How much compute does training take? I can't run it on my single GPU. Thanks~

Show-han commented 8 months ago

What do you mean by "train"? Do you mean fine-tuning the CLIP model?

zetaodu commented 8 months ago

Yes!

Show-han commented 8 months ago

I fine-tuned them on 8x A6000 GPUs (48 GB each). You can try reducing the batch size, or use gradient accumulation to lower the per-step memory requirement while keeping the same effective batch size.
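
For readers unfamiliar with gradient accumulation, here is a minimal sketch of the idea. It is a toy, dependency-free illustration and not this repository's training code: the one-parameter model and the helper names (`mse_grad`, `step_full_batch`, `step_accumulated`) are all hypothetical. It shows that summing scaled gradients from several small micro-batches and then applying a single update matches one update on the full batch, assuming equally sized micro-batches and a mean-reduced loss.

```python
# Gradient accumulation sketch (hypothetical toy example, not the repo's code).
# Model: y_hat = w * x, loss = mean squared error over the batch.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def step_full_batch(w, batch, lr):
    """One SGD step using the whole batch at once (needs the most memory)."""
    return w - lr * mse_grad(w, batch)

def step_accumulated(w, batch, lr, accum_steps):
    """One SGD step built from `accum_steps` equally sized micro-batches."""
    micro_size = len(batch) // accum_steps
    assert micro_size * accum_steps == len(batch), "batch must split evenly"
    grad_sum = 0.0
    for i in range(accum_steps):
        micro = batch[i * micro_size:(i + 1) * micro_size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        grad_sum += mse_grad(w, micro) / accum_steps
    # A single parameter update after all micro-batches have been processed.
    return w - lr * grad_sum

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples, true w = 3
w_full = step_full_batch(0.0, data, lr=0.01)
w_accum = step_accumulated(0.0, data, lr=0.01, accum_steps=4)
print(w_full, w_accum)  # the two updates should agree up to rounding
```

In a PyTorch training loop the same pattern usually means dividing each micro-batch loss by the number of accumulation steps, calling `backward()` on every micro-batch, and calling `optimizer.step()` and `optimizer.zero_grad()` only once per accumulation cycle. Note that this lowers peak memory, not total compute, and that contrastive losses such as CLIP's depend on in-batch negatives, so accumulated micro-batches may not behave identically to one large batch.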

zetaodu commented 8 months ago

Got it, thanks again.