ghLcd9dG closed this issue 6 months ago.
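```python
import torch


def normalize(images):
    # Per-channel (R, G, B) mean and std from OpenAI CLIP's image preprocessing.
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).cuda()
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).cuda()
    # images is a (B, 3, H, W) tensor; index the stats as (1, 3, 1, 1)
    # so they broadcast over batch, height and width.
    images = images - mean[None, :, None, None]
    images = images / std[None, :, None, None]
    return images
```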
May I ask why the mean and std are set to these particular numbers? Is there a specific reason?
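Hi, VLMs are often built on top of off-the-shelf visual encoders (e.g., CLIP). Such encoders expect their inputs to be normalized exactly as they were during pretraining, and the values in `normalize()` are the per-channel RGB mean and std used in CLIP's image preprocessing. Feeding images normalized with different constants would shift them away from the input distribution the encoder was trained on.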
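To make the answer concrete: `normalize()` standardizes each RGB channel with CLIP's constants, so a pixel whose color equals the channel means maps to zero, and the operation can be undone to recover the original image. The sketch below re-expresses the same arithmetic in NumPy (an assumption, chosen to keep it framework-free); `clip_normalize` and `clip_denormalize` are hypothetical helper names, and inputs are assumed to be NCHW float arrays already scaled to [0, 1].

```python
import numpy as np

# CLIP's per-channel (R, G, B) constants, the same values used in normalize().
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def clip_normalize(images: np.ndarray) -> np.ndarray:
    """Hypothetical helper: standardize NCHW images assumed to lie in [0, 1]."""
    # Index the stats as (1, 3, 1, 1) so they broadcast over batch, height, width.
    return (images - CLIP_MEAN[None, :, None, None]) / CLIP_STD[None, :, None, None]


def clip_denormalize(images: np.ndarray) -> np.ndarray:
    """Hypothetical helper: invert clip_normalize, e.g. to visualize encoder inputs."""
    return images * CLIP_STD[None, :, None, None] + CLIP_MEAN[None, :, None, None]


# An image filled with the channel means is mapped to (approximately) zero.
mean_image = np.broadcast_to(CLIP_MEAN[None, :, None, None], (1, 3, 2, 2))
print(np.abs(clip_normalize(mean_image)).max())
```

Because the transform is a per-channel affine map, it preserves the image content; it only rescales values into the range the pretrained encoder saw during training.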