google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Clarification: SigLIP Image Transform #91

Open siddk opened 5 months ago

siddk commented 5 months ago

Thanks for open-sourcing the SigLIP models!

Clarification question: in the demo IPython notebook, the image transform function has the form pp_img = pp_builder.get_preprocess_fn(f'resize({RES})|value_range(-1, 1)').

Looking at the code here, this seems to resize the image directly to RES x RES, which warps the aspect ratio of any non-square input instead of preserving it.

Is this the expected behavior? Were the SigLIP models trained with this transform (aspect ratio warping)?
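To make the question concrete, here is a minimal sketch of the two behaviors being contrasted: a direct resize to RES x RES (warping) versus an aspect-preserving resize followed by a center crop. This is not big_vision code; the helper names (warp_scale_factors, aspect_preserving_shape, center_crop_box) are hypothetical and only compute the geometry, assuming integer pixel sizes and a shorter-side resize for the aspect-preserving variant.

```python
def warp_scale_factors(height, width, res):
    """Per-axis scale factors when resizing directly to res x res.

    If the two factors differ, the aspect ratio is warped.
    """
    return res / height, res / width


def aspect_preserving_shape(height, width, res):
    """Shape after scaling the shorter side to `res` with a single scale factor."""
    scale = res / min(height, width)
    return round(height * scale), round(width * scale)


def center_crop_box(height, width, res):
    """(top, left, bottom, right) of a centered res x res crop."""
    top = (height - res) // 2
    left = (width - res) // 2
    return top, left, top + res, left + res


if __name__ == "__main__":
    RES = 224          # assumed resolution, for illustration only
    h, w = 480, 640    # example landscape image

    # Option A: direct resize to RES x RES (what the question describes).
    sy, sx = warp_scale_factors(h, w, RES)
    print("warp scale factors (y, x):", sy, sx)

    # Option B: keep the aspect ratio, then center crop to RES x RES.
    new_h, new_w = aspect_preserving_shape(h, w, RES)
    print("aspect-preserving shape:", (new_h, new_w))
    print("center crop box:", center_crop_box(new_h, new_w, RES))
```

For a 480 x 640 input, option A scales the two axes by different factors, so content is squashed horizontally, while option B keeps proportions but discards the left and right margins. Which of the two matches the training-time preprocessing is exactly what this issue is asking.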