Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models

May I kindly ask why the mean and std are set to these numbers? Is there a specific reason? #18

Closed ghLcd9dG closed 6 months ago

ghLcd9dG commented 6 months ago
import torch

def normalize(images):
    # Per-channel (RGB) mean and std, one value per channel
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).cuda()
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).cuda()
    # Broadcast over (batch, channel, height, width)
    images = images - mean[None, :, None, None]
    images = images / std[None, :, None, None]
    return images

May I kindly ask why the mean and std are set to these numbers? Is there a specific reason?

Unispac commented 6 months ago

Hi, VLMs are often implemented by making use of off-the-shelf visual encoders (e.g., CLIP). These visual encoders usually apply this kind of input normalization to their inputs.
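
To make the reply concrete: the numbers in the question match the per-channel RGB mean and std used by OpenAI CLIP's image preprocessing. The snippet below is a minimal sketch, not the repository's code. It assumes images are float tensors in [0, 1] with shape (N, 3, H, W); the names `CLIP_MEAN`, `CLIP_STD`, and `denormalize` are hypothetical and only for illustration.

```python
import torch

# Per-channel RGB statistics used by OpenAI CLIP's image preprocessing.
# These are the same constants as in the normalize() function from the question.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)


def _stats(images: torch.Tensor):
    """Build broadcastable (1, 3, 1, 1) mean/std tensors on the images' device."""
    mean = torch.tensor(CLIP_MEAN, device=images.device, dtype=images.dtype)
    std = torch.tensor(CLIP_STD, device=images.device, dtype=images.dtype)
    return mean[None, :, None, None], std[None, :, None, None]


def normalize(images: torch.Tensor) -> torch.Tensor:
    """Map [0, 1] pixel images of shape (N, 3, H, W) to the encoder's input space."""
    mean, std = _stats(images)
    return (images - mean) / std


def denormalize(images: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: inverse of normalize(), back to [0, 1] pixel space."""
    mean, std = _stats(images)
    return images * std + mean


if __name__ == "__main__":
    x = torch.rand(2, 3, 4, 4)   # random batch of images in [0, 1]
    z = normalize(x)             # what the visual encoder expects as input
    # Round trip should recover the original pixels up to float error.
    print(torch.allclose(denormalize(z), x, atol=1e-5))
```

One plausible reason to keep normalization as a separate, differentiable step (an assumption on my part, not something stated in this thread) is that an adversarial image can then be optimized and clipped in plain [0, 1] pixel space, with normalization applied only right before the encoder.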