facebookresearch / CiT

Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".

a consideration about unfreezing the image tower #4

Open fabiozappo opened 1 year ago

fabiozappo commented 1 year ago

Hi,

I really liked the idea of selecting training data online, thank you for publishing the code! I would like to apply this idea to my training code using a non-frozen image tower, and I am here to ask for a hint.

I saw all your experiments are done with an approach similar to Google's LiT paper. My intuition for why you're doing this is that it keeps the text model more stable over time, and as a consequence the number of newly curated training samples slowly decreases. Do you think unfreezing the image tower could lead to a collapse, resulting in the inclusion of the whole pool of image-text pairs? Have you tried running side experiments with all parameters trainable? What behavior would you expect from that?
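For context, the LiT-style setup discussed above (frozen image tower, trainable text tower) can be sketched roughly as follows. This is a minimal PyTorch illustration with placeholder linear layers standing in for the actual encoders; it is not the repo's real training code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pretrained image encoder and a text encoder.
image_tower = nn.Linear(8, 4)
text_tower = nn.Linear(8, 4)

# LiT-style: freeze the image tower so its features act as fixed targets.
for p in image_tower.parameters():
    p.requires_grad = False
image_tower.eval()

# Only the text tower's parameters are optimized.
optimizer = torch.optim.SGD(text_tower.parameters(), lr=0.1)

images = torch.randn(2, 8)
texts = torch.randn(2, 8)

with torch.no_grad():  # no gradient bookkeeping for the frozen tower
    img_feat = image_tower(images)
txt_feat = text_tower(texts)

# Toy alignment loss pulling text features toward the frozen image features.
loss = (1 - nn.functional.cosine_similarity(img_feat, txt_feat)).mean()
loss.backward()
optimizer.step()
```

After `backward()`, only the text tower accumulates gradients, which is what keeps the image representation stable while the text side adapts.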

howardhsu commented 1 year ago

Good question.

One reason for using a frozen image encoder in CiT is to provide a high-quality soft target for the text encoder. If you check the SLIP / LiT papers, you may see how noisy text supervision (e.g., text irrelevant to the image) can corrupt image self-supervision (fine-grained image details). This high-quality target yields better text representations at the semantic level (see Table 8), which in turn improves curation and training efficiency for this chicken-and-egg problem.
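The curation step that depends on those text representations can be sketched as a simple similarity filter: score each candidate pair's text embedding against a metadata embedding and keep pairs above a threshold. This is a loose, dependency-free illustration of the idea; the function names, embeddings, and threshold are made up for the example, and the real repo scores raw text with the current text encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def curate(pair_embeddings, metadata_embedding, threshold=0.5):
    """Return indices of pairs whose text embedding is similar enough
    to the task metadata embedding (a toy stand-in for CiT's curation)."""
    return [i for i, emb in enumerate(pair_embeddings)
            if cosine(emb, metadata_embedding) >= threshold]

# Toy example: two of three candidate texts align with the metadata direction.
meta = [1.0, 0.0]
pairs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
selected = curate(pairs, meta, threshold=0.6)
print(selected)  # -> [0, 2]
```

As the text encoder improves during training, fewer borderline pairs clear the threshold, which matches the shrinking curation behavior mentioned in the question above.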

Hope this answered your question.