Manually changing ChineseCLIPVisionModel to ChineseCLIPVisionModelWithProjection fails.

OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.


Open ranck626 opened 5 months ago

ranck626 commented 5 months ago

import torch
import torch.nn as nn

visual_projection = nn.Linear(768, 512, bias=False)
embeds = visual_projection(pooled_output)

I manually added a projection layer, but the resulting embeddings differ from the ones ChineseCLIPModel produces.

ranck626 commented 5 months ago

This is probably a pretrained-weights problem, but why is only the version without the projection provided?
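
A minimal sketch of why the hand-added layer gives different embeddings, assuming the cause is the one guessed above: a new nn.Linear starts with random weights, so it only reproduces the full model's output once the trained projection weights are copied into it. Here `pretrained_projection` and the random `pooled_output` are hypothetical stand-ins, not the real checkpoint or real vision features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the trained projection inside the full model.
# In practice these weights would come from the pretrained checkpoint.
pretrained_projection = nn.Linear(768, 512, bias=False)

# Hypothetical stand-in for the vision tower's pooled output (batch of 2).
pooled_output = torch.randn(2, 768)
reference = pretrained_projection(pooled_output)

# A freshly constructed layer has its own random weights, so it disagrees.
visual_projection = nn.Linear(768, 512, bias=False)
fresh_matches = torch.allclose(visual_projection(pooled_output), reference)

# Copying the trained weights makes the two layers equivalent.
visual_projection.load_state_dict(pretrained_projection.state_dict())
embeds = visual_projection(pooled_output)
copied_matches = torch.allclose(embeds, reference)

print(fresh_matches, copied_matches)
```

If the full ChineseCLIPModel in Hugging Face transformers exposes its image projection as a `visual_projection` attribute (my understanding, but worth checking against the installed version), its state dict would be the source for the `load_state_dict` call instead of the stand-in used here.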