Closed: fcakyon closed this issue 2 years ago.
@fcakyon thanks for your feedback!
So, we don't use transformers.CLIPFeatureExtractor because it isn't a differentiable operation (it uses PIL.Image transforms for resizing, etc.). As you can see from the VQGAN-CLIP sample, we backpropagate all the way to the input image. To do that, we use our own differentiable implementation of the mapping from the input image to a valid input tensor for the visual feature extractor.
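Roughly, the idea is to keep every preprocessing step as a tensor operation so the computation graph is never broken. Here is a minimal sketch of such a differentiable mapping, assuming an input image tensor in [-1, 1] (the function name and the exact rescaling are illustrative, not the repo's code; the mean/std are the standard CLIP normalization constants):

```python
import torch
import torch.nn.functional as F

# Standard normalization constants used by OpenAI CLIP's preprocessing.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def differentiable_clip_preprocess(img: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Map a (B, 3, H, W) image tensor in [-1, 1] to a valid CLIP input without leaving the graph."""
    img = (img + 1.0) / 2.0                                  # rescale to [0, 1] with tensor ops only
    img = F.interpolate(img, size=(size, size),              # differentiable resize instead of PIL
                        mode="bicubic", align_corners=False)
    return (img - CLIP_MEAN.to(img)) / CLIP_STD.to(img)      # CLIP-style normalization

# Usage: gradients flow from the CLIP input back to the image tensor,
# which is exactly what PIL-based preprocessing cannot provide.
image = torch.zeros(1, 3, 256, 256, requires_grad=True)
differentiable_clip_preprocess(image).sum().backward()
print(image.grad is not None)  # True
```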
@bes-dev makes perfect sense! Then this implementation is necessary for the vqgan example and not essential for the cliprcnn example, right?
@fcakyon yes, in ClipRCNN we use the CLIP guided loss only at inference time for ranking, without a backward pass.
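Since that path never needs gradients, the standard Hugging Face preprocessing is fine there. A small illustrative sketch of ranking candidate crops with CLIP under `torch.no_grad()` (the checkpoint name, file names, and prompt are placeholders, not ClipRCNN's actual code):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

crops = [Image.open("crop_0.png"), Image.open("crop_1.png")]  # hypothetical region crops
with torch.no_grad():  # inference only, so PIL-based preprocessing is acceptable here
    inputs = processor(text=["a photo of a cat"], images=crops,
                       return_tensors="pt", padding=True)
    outputs = model(**inputs)
    scores = outputs.logits_per_text[0]        # similarity of the query to each crop
    ranking = scores.argsort(descending=True)  # best-matching crop first
```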
@bes-dev thanks for the awesome work! I have one question: why do you manually map the image pixels to the [-1, 1] range instead of directly using transformers.CLIPFeatureExtractor (https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPFeatureExtractor)?