OFA-Sys / Chinese-CLIP

A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.
MIT License

Why do the same labels and image give inconsistent results? #295

Open learning233 opened 5 months ago

learning233 commented 5 months ago

Labels: 天空 (sky), 栏杆 (railing), 女人 (woman), 火车站 (train station), 火车 (train), 人们 (people)
Image URL: https://images.pexels.com/photos/20147042/pexels-photo-20147042.jpeg?cs=srgb&dl=pexels-mateus-castro-20147042.jpg&fm=jpg

Test API: https://huggingface.co/spaces/OFA-Sys/chinese-clip-zero-shot-image-classification, which uses the base model https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16

Running locally and calling the two APIs above gives three different results. (Two screenshots of the web demo outputs omitted.) Local result: [('栏杆', 0.507383406162262), ('女人', 0.44152918457984924), ('人们', 0.02036505565047264), ('天空', 0.019294271245598793), ('火车站', 0.010599039494991302), ('火车', 0.0008290574769489467)]
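
For anyone trying to reproduce this, here is a minimal local sketch of the zero-shot classification described above. It assumes the Hugging Face transformers ChineseCLIPModel / ChineseCLIPProcessor API with the same OFA-Sys/chinese-clip-vit-base-patch16 checkpoint as the demo; the original poster's actual local script and preprocessing were not shared, so any differences there could account for part of the gap.

import torch
import requests
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

# Assumed setup: same checkpoint as the web demo, default transformers preprocessing.
model_name = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(model_name)
processor = ChineseCLIPProcessor.from_pretrained(model_name)

url = ("https://images.pexels.com/photos/20147042/pexels-photo-20147042.jpeg"
       "?cs=srgb&dl=pexels-mateus-castro-20147042.jpg&fm=jpg")
image = Image.open(requests.get(url, stream=True).raw)
labels = ["天空", "栏杆", "女人", "火车站", "火车", "人们"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

model.eval()                  # inference mode: deterministic Dropout/BatchNorm behavior
with torch.no_grad():         # no gradient tracking needed for prediction
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)[0]
print(sorted(zip(labels, probs.tolist()), key=lambda x: x[1], reverse=True))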

ingale726 commented 3 months ago

@learning233 @JianxinMa @yangapku @jxst539246 @manymuch

ChesonHuang commented 3 months ago

@learning233 @JianxinMa @yangapku @jxst539246 @manymuch

When running inference locally, do not compute gradients; wrap the forward pass in with torch.no_grad().

import torch

model.eval()  # put the model in inference mode so repeated runs give the same result
with torch.no_grad():  # disable gradient tracking for inference
    output = model(inputs)  # run the forward pass on your preprocessed inputs
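
For context: model.eval() switches layers such as Dropout and BatchNorm into inference behavior, which is what makes repeated predictions on the same input return identical scores, while torch.no_grad() turns off gradient bookkeeping to save memory and compute; it does not by itself change the outputs.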