[QUESTION] Using this on blip captioner

Tenpi commented 1 year ago

❔ Any questions

Hello, I'm trying to use torchattacks on the BLIP captioner: https://github.com/salesforce/BLIP

The first problem is that BLIP's forward method takes an additional "caption" parameter, which causes an error. I managed to get around it by giving the parameter a default value of an empty string:

def forward(self, image, caption = ""):

But after that, I still get an error when generating the adversarial images. With this code:

adv_image = atk(img, torch.tensor([1.0]))

I get the error IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1). With the following code instead (passing a 0-dim tensor as the label), I get a different error: IndexError: Dimension specified as 1 but tensor has no dimensions

adv_image = atk(img, torch.tensor(1.0))

This is the full code I have. The BLIP model is imported from the repo I linked.

import torch
import torchattacks
from torchvision import transforms
from PIL import Image
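from models.blip.blip import blip_decoder

# Run on the GPU when available; `device` is used by the functions below.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def load_blip_image(image, dim):
    raw_image = Image.open(image).convert("RGB")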
    transform = transforms.Compose([
        transforms.Resize((dim, dim), interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.ToTensor()
    ])
    return transform(raw_image).unsqueeze(0).to(device)

def blip(input, output):
    model = blip_decoder(pretrained="models/blip/blip.pt", image_size=384, vit="base")
    model.eval().to(device)
    img = load_blip_image(input, 384)
    caption = model.generate(img)
    print(caption)
    atk = torchattacks.FGSM(model, eps=10/255)
    adv_image = atk(img, torch.tensor(1.0))

I'm also not sure how to use the caption as the "label", which is why I just pass tensor(1.0). I tested that on other models and it seems to work.

rikonaka commented 1 year ago

Hi @Tenpi, first, torchattacks does not have a caption parameter in its own forward function, so I don't know what that parameter means (you may want to contact the BLIP developers for more information).

https://github.com/Harry24k/adversarial-attacks-pytorch/blob/c4da6a95546283992a3d1816ae76a0cd4dfc2d8b/torchattacks/attacks/fgsm.py#L33

Second, the labels should have shape [n], where n is the number of input images. 😜
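
For reference, here is a minimal sketch of the label shape described above. It uses a hypothetical stand-in classifier (`TinyClassifier` is made up for illustration and is not part of torchattacks or BLIP). For an ordinary classifier, the loss compares logits of shape [n, num_classes] against integer class labels of shape [n]:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in classifier, used only to illustrate tensor shapes.
class TinyClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(3 * 8 * 8, num_classes)

    def forward(self, x):
        # Flatten each image and return logits of shape [n, num_classes].
        return self.fc(x.flatten(1))

images = torch.rand(4, 3, 8, 8)        # batch of n = 4 images with values in [0, 1)
labels = torch.tensor([1, 0, 3, 7])    # one class index per image, shape [n]

logits = TinyClassifier()(images)
loss = nn.CrossEntropyLoss()(logits, labels)

print(tuple(images.shape), tuple(labels.shape), tuple(logits.shape), loss.dim())
```

For a single image, n is 1, so the label tensor has shape [1].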

Tenpi commented 1 year ago

Yes, since I couldn't pass any other parameters to the forward function I just edited it to use an empty string for caption. I am using it on one image so I guess tensor([1.0]) was the right one. However, I still don't know why I get this error:

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

occurring on this line in fgsm.py:

cost = loss(outputs, labels)

I think I have to change some other code in BLIP to make it work.

rikonaka commented 1 year ago

Well @Tenpi, can you print the shapes of the images and labels in your code? I think you should call it like this:

adv_image = atk(img, torch.tensor([1.0]))
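
A shape check along these lines might look like the sketch below. `ScalarLossModel` and `describe` are hypothetical names for illustration only. The stand-in model returns a single scalar from its forward pass rather than per-image logits, which is an assumption about the kind of model that can trigger this error:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a model whose forward() returns one scalar
# value instead of per-image logits.
class ScalarLossModel(nn.Module):
    def forward(self, image):
        return (image ** 2).mean()  # 0-dim tensor

def describe(name, t):
    """Print a tensor's shape and its number of dimensions."""
    print(f"{name}: shape={tuple(t.shape)}, dim={t.dim()}")

img = torch.rand(1, 3, 384, 384)    # one RGB image, same size as in the question
labels = torch.tensor([1.0])        # shape [1], one entry per image
output = ScalarLossModel()(img)

describe("images", img)
describe("labels", labels)
describe("model output", output)
```

Printing the model output alongside the images and labels helps because the loss inside the attack is computed from the output, not from the images directly.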

Tenpi commented 1 year ago

Ok, I figured it out. The forward function was returning a tensor of dimension 0, which caused the error. Here is the full solution by modifying blip.py:

Set caption to an empty string:

def forward(self, image, caption = ""):

Expand the returned tensor to 1 dimension:

loss_lm = decoder_output.loss
loss_lm = loss_lm.expand(1)
return loss_lm
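
The effect of `expand(1)` can be seen in isolation with the sketch below. The value 2.5 is an arbitrary stand-in for `decoder_output.loss`. Attributing the original IndexError to a log-softmax over dimension 1 inside the cross-entropy loss is an assumption based on the error message, not something confirmed in this thread:

```python
import torch

# Arbitrary stand-in for decoder_output.loss, which is a 0-dim tensor.
loss_lm = torch.tensor(2.5)
print(loss_lm.dim(), tuple(loss_lm.shape))

# Dimension 1 does not exist on a 0-dim tensor, so this raises IndexError.
try:
    torch.log_softmax(loss_lm, dim=1)
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)

# expand(1) returns a view of the same value with shape [1].
expanded = loss_lm.expand(1)
print(expanded.dim(), tuple(expanded.shape))
```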

However, the FGSM attack didn't work for me; it just gives me the same image as the input. I changed to the PGD attack, which did work.

rikonaka commented 1 year ago

> However the fgsm attack didn't work for me, it just gives me same image as input. I changed it to the PGD attack which did work.

You can try increasing the perturbation size (eps) of FGSM to get a visible perturbation. 🥰
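
Another possible explanation, offered as an assumption and not confirmed in this thread: when the model output has shape [1] and the label is `tensor([1.0])`, cross-entropy sees a single "class", the softmax over one element is always 1, and the loss is constant at zero. The gradient with respect to the image is then zero, so the standard FGSM update `img + eps * sign(grad)` leaves the image unchanged for any eps. The sketch below uses a made-up scalar loss in place of BLIP's:

```python
import torch
import torch.nn as nn

img = torch.rand(1, 3, 8, 8, requires_grad=True)

# Hypothetical stand-in for the patched forward(): a scalar expanded to shape [1].
output = (img ** 2).mean().expand(1)

labels = torch.tensor([1.0])
cost = nn.CrossEntropyLoss()(output, labels)

# Gradient of the cost with respect to the input image.
grad = torch.autograd.grad(cost, img)[0]

# Standard FGSM step: move each pixel by eps in the direction of the gradient sign.
eps = 10 / 255
adv = torch.clamp(img + eps * grad.sign(), 0, 1).detach()

print(cost.item(), grad.abs().max().item(), torch.equal(adv, img.detach()))
```

If this is what happened here, it may also explain why PGD appeared to work: torchattacks' PGD starts from a randomly perturbed image by default (`random_start=True`), so its output can differ from the input even when the gradient carries no information.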

Harry24k / adversarial-attacks-pytorch

[QUESTION] Using this on blip captioner #154