Open capricixhk opened 4 months ago
Hi,
here's an example of freezing the image encoder:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")
# Freeze every parameter of the image (vision) encoder
for p in model[0].model.vision_model.parameters():
    p.requires_grad = False
Training this model would then only update the text encoder, which will probably yield lower test set scores than fine-tuning both encoders.
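As a quick sanity check (a minimal sketch, assuming the model was loaded and frozen as above), you can count which parameters still require gradients; none of the vision tower's tensors should show up as trainable:

# After freezing, the vision tower's parameters should no longer require gradients
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print(f"{len(trainable)} trainable tensors, {len(frozen)} frozen tensors")
print("example trainable tensor:", trainable[0])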
Similarly, here's an example of freezing the first 4 layers of the text encoder:
# Freeze only the first 4 transformer layers of the text encoder
for p in model[0].model.text_model.encoder.layers[0:4].parameters():
    p.requires_grad = False
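Once the relevant parameters are frozen, training works as usual and the optimizer simply never updates them. Below is a rough sketch of how a fine-tuning run could look with the fit() API; the (image, caption) pairs, file names, batch size, and the choice of MultipleNegativesRankingLoss are illustrative assumptions, not something prescribed in this thread:

from PIL import Image
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, losses

# Hypothetical (image, caption) pairs -- replace with your own data
train_examples = [
    InputExample(texts=[Image.open("cat.jpg"), "a photo of a cat"]),
    InputExample(texts=[Image.open("dog.jpg"), "a photo of a dog"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# Frozen parameters receive no gradient updates during fit()
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)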
Hope this helps.
Is anyone using CLIP for searching a large batch of PDF documents (legal docs)? Is it good for this use case?
@km5ar
Hi,
I'm not sure how many images of text/documents are present in the datasets used to train CLIP, but I don't think it's a lot. My best bet would be to try something like Nougat/Donut for text extraction, followed by ColBERT or a sentence transformer with paragraph chunking.
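To make that concrete, here is a minimal sketch of paragraph-chunked retrieval over already-extracted PDF text with a sentence transformer; the model name (all-MiniLM-L6-v2), the placeholder text, and the example query are assumptions for illustration, not recommendations from this thread:

from sentence_transformers import SentenceTransformer, util

# Text assumed to be already extracted from the PDF (e.g. by Nougat/Donut)
document_text = "..."  # placeholder: extracted text of one legal document
paragraphs = [p.strip() for p in document_text.split("\n\n") if p.strip()]

text_model = SentenceTransformer("all-MiniLM-L6-v2")  # example text model, not CLIP
corpus_embeddings = text_model.encode(paragraphs, convert_to_tensor=True)

query = "termination clause notice period"  # example query
query_embedding = text_model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=5)[0]
for hit in hits:
    print(round(hit["score"], 3), paragraphs[hit["corpus_id"]][:80])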
I am trying to fine-tune the CLIP model (clip-ViT-B-32-multilingual-v1). Is there an example of training it with some layers frozen? Also, can I train only the text encoder without modifying the image encoder? Thanks!