cls_token problem with image.

evertonipx commented 9 months ago

When I use only a text prompt, mPLUG-Owl2 works fine. But when I include an image, I get this error:

File "C:\py projects\IPXCopilot_OWLVersion\mplug_owl2\model\visual_encoder.py", line 117, in forward
    if self.cls_token :
RuntimeError: Boolean value of Tensor with more than one value is ambiguous

If I change that line to "if self.cls_token is not None:", I get this error instead:

File "C:\py projects\IPXCopilot_OWLVersion\mplug_owl2\model\visual_encoder.py", line 123, in forward
    embeddings = embeddings + get_abs_pos(self.position_embedding,embeddings.size(1))
RuntimeError: The size of tensor a (1024) must match the size of tensor b (1049600) at non-singleton dimension 2

Does anyone else have the same problem? It worked fine before the update.
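For reference on the first error: a plain "if self.cls_token" asks Python for the truth value of the object, which a multi-element tensor refuses to give, while "is not None" only checks identity. Below is a minimal sketch of that difference. `FakeTensor` and the two helper functions are hypothetical stand-ins written for illustration so the snippet runs without PyTorch; they are not code from mPLUG-Owl.

```python
class FakeTensor:
    """Hypothetical stand-in that mimics how a multi-element tensor rejects truth tests."""

    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        # Only a single-element container has an unambiguous truth value.
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous"
            )
        return bool(self.values[0])


def has_cls_token_truthy(cls_token):
    # Mirrors `if self.cls_token:`; this calls __bool__ and can raise.
    return True if cls_token else False


def has_cls_token_identity(cls_token):
    # Mirrors `if self.cls_token is not None:`; this never calls __bool__.
    return cls_token is not None


cls_token = FakeTensor([0.1, 0.2, 0.3])

try:
    has_cls_token_truthy(cls_token)
    truthy_error = None
except RuntimeError as exc:
    truthy_error = str(exc)

print(truthy_error)                       # the "ambiguous" RuntimeError message
print(has_cls_token_identity(cls_token))  # True
print(has_cls_token_identity(None))       # False
```

The identity check only removes the first exception. It says nothing about whether the code after it handles the class token correctly, which is consistent with a second, different error appearing once the condition is changed.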

findalexli commented 9 months ago

Following as well, getting this exact issue

jiaqixuac commented 9 months ago

It seems that the updated code does not handle cls_token well. See https://github.com/X-PLUG/mPLUG-Owl/commit/54b508a7254621977c8d662d203bd0d3c8a7e428

If you change

if embeddings.shape[1] != self.num_patches:

to

if self.cls_token is None and embeddings.shape[1] != self.num_patches:

it works.
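A rough sketch of why the extra check could matter, assuming the class token is prepended before the length comparison so that the sequence holds num_patches + 1 entries. The function names and the 1024-patch figure are illustrative assumptions, not the repository's code. The embedding width of 1024 comes from "tensor a (1024)" in the traceback, and 1025 × 1024 happens to equal the 1049600 reported for tensor b, which would fit a position embedding of that shape being flattened on the wrong branch.

```python
def old_guard(seq_len, num_patches):
    # Condition before the suggested change.
    return seq_len != num_patches


def new_guard(has_cls_token, seq_len, num_patches):
    # Condition suggested above: skip the branch whenever a cls_token exists.
    return (not has_cls_token) and seq_len != num_patches


num_patches = 1024          # assumed: for example a 32 x 32 patch grid
embed_dim = 1024            # from "tensor a (1024)" in the traceback
seq_len = num_patches + 1   # assumed: one class token prepended

print(old_guard(seq_len, num_patches))        # True: branch taken for every image
print(new_guard(True, seq_len, num_patches))  # False: branch skipped
print(seq_len * embed_dim)                    # 1049600
```

Under these assumptions the old condition is true for every input that carries a class token, so the branch runs even when no resizing is needed.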

vateye commented 9 months ago

Fixed.

X-PLUG / mPLUG-Owl

cls_token problem with image. #207