Lednik7 / CLIP-ONNX

A simple library to speed up CLIP inference by up to 3x (K80 GPU)
MIT License

Textual .forward() override bug. #12

Open · Great-Frosty opened this issue 1 year ago

Great-Frosty commented 1 year ago

Hi! First of all, thanks a lot for a great example project! However, I'm pretty sure I have found a bug: onnx_model.encode_text() produces results that are not consistent with the regular model's output. There is a statement in .utils: "take features from the eot embedding (eot_token is the highest number in each sequence)".
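For context, here is a minimal sketch of the argmax-based EOT pooling that the quoted comment describes; the exact line in the repo's utils may differ slightly, and the function name here is illustrative:

```python
import torch

def pool_eot_by_argmax(x, text, text_projection):
    # x: (batch, seq_len, width) transformer output; text: (batch, seq_len) token ids.
    # This only selects the EOT position correctly when the EOT token has the
    # highest id in every sequence, as it does with OpenAI CLIP's tokenizer.
    return x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ text_projection
```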

But ruclip uses eos_id = 3, which is clearly not the highest token id in the sequence. This change was made after you released your repo, but there was no version bump. So I tried tracing the model with a hardcoded eos id and the original `.where` line from ruclip's encode_text, `x = x[torch.arange(x.shape[0]), torch.where(text == 3)[1]] @ self.text_projection`, and it worked. I will submit a pull request with a fix if I have free time later, but for now I just wanted you to know that running the default version of your notebook produces incorrect text encoding results.
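A minimal sketch of the workaround described above, selecting the hidden state at the position of a fixed eos id instead of the argmax position; the hardcoded id 3 is ruclip-specific and the function name is illustrative:

```python
import torch

def pool_eot_by_eos_id(x, text, text_projection, eos_id=3):
    # Assumes exactly one eos token per sequence, so the column indices returned
    # by torch.where line up with the batch indices, row by row.
    batch_idx = torch.arange(x.shape[0])
    eos_pos = torch.where(text == eos_id)[1]
    return x[batch_idx, eos_pos] @ text_projection
```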

yiluzhou1 commented 1 year ago

@Great-Frosty I also ran into this bug when trying the OpenCLIP model (https://github.com/mlfoundations/open_clip). The output from .encode_text() differs from the original OpenCLIP model's output.
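A hedged sketch of the consistency check that surfaces this mismatch; `model`, `onnx_model`, and `text_tokens` are assumed to be set up as in the repo's notebook, and the exact input dtype expected by the ONNX wrapper may differ:

```python
import numpy as np
import torch

def check_text_consistency(model, onnx_model, text_tokens, atol=1e-3):
    # Reference features from the PyTorch model.
    with torch.no_grad():
        ref = model.encode_text(text_tokens).cpu().numpy()
    # Features from the exported ONNX model (input format is an assumption here).
    onnx_out = onnx_model.encode_text(text_tokens.cpu().numpy())
    # With the default argmax-based export, this returns False whenever the
    # tokenizer's eos id is not the largest id in the sequence (e.g. ruclip's 3).
    return np.allclose(ref, onnx_out, atol=atol)
```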

GodTamIt commented 8 months ago

@Great-Frosty were you ever able to take a look at this?

I'm unsure whether this affects only ruclip or other CLIP variants as well.

Great-Frosty commented 8 months ago

@GodTamIt I'm not sure either. It got fixed in ruclip, though: https://github.com/ai-forever/ru-clip/pull/19