dmarx opened this issue 2 years ago
Let's not boil the ocean. Goals for the MVP:
The MVP is basically just a git clone --recurse-submodules with maybe a few bells and whistles.
Imagining usage...
import perceptors as pct
pct.available_models() # list all models
pct.available_models('clip') # pattern match
clip_rn50 = pct.Perceptor('clip_rn50') # load a model
clip_vit16 = pct.Perceptor('clip_vit16') # load another
# combine models for multi-clip
multi_clip = clip_rn50 + clip_vit16
# adjust model-specific weight
multi_clip.set_weight('clip_vit16', .1) # set weight by name
multi_clip.set_weight(0, .5) # set weight by index
# manage models
multi_clip += pct.Perceptor('clip_rn101') # add another model algebraically
multi_clip.bind('clip_vit32') # add another clip model by name
multi_clip.unbind('clip_vit16') # dissociate a bound model by name
text = clip_rn50.tokenize_text('foo bar')
text_emb = clip_rn50.embed_text('foo bar')
img_emb = clip_rn50.embed_image('path/to/image')
img_emb = clip_rn50.embed_image(img)  # img: torch.Tensor
img_emb = clip_rn50.embed_image(img)  # img: PIL.Image
multi_clip.embed_text('foo bar')
multi_clip.embed_image(img)  # any of the input types above
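Under the hood I'm picturing the algebra as a thin container that keeps a weighted list of perceptors. Rough sketch only; all class and method names here are placeholders, nothing is implemented yet:

# rough sketch -- names are placeholders
class Perceptor:
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight
    def embed_text(self, text):
        ...  # delegate to the wrapped model's text encoder
    def embed_image(self, img):
        ...  # delegate to the wrapped model's image encoder
    def __add__(self, other):
        return MultiPerceptor([self, other])

class MultiPerceptor:
    def __init__(self, perceptors):
        self.perceptors = list(perceptors)
    def bind(self, name):
        self.perceptors.append(Perceptor(name))
    def unbind(self, name):
        self.perceptors = [p for p in self.perceptors if p.name != name]
    def __iadd__(self, other):
        self.perceptors.append(other)
        return self
    def set_weight(self, key, weight):
        # accept a model name or a positional index
        if isinstance(key, int):
            self.perceptors[key].weight = weight
        else:
            next(p for p in self.perceptors if p.name == key).weight = weight
    def embed_text(self, text):
        # one (weight, embedding) pair per bound model
        return [(p.weight, p.embed_text(text)) for p in self.perceptors]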
One small issue people hit when adding SLIP to many different text-to-image notebooks and codebases was that the input resolution isn't exposed as part of the model.
So you see things like this in Disco Diffusion, for example:
# when using the SLIP Base model, the dimensions need to be hard-coded to avoid
# AttributeError: 'VisionTransformer' object has no attribute 'input_resolution'
try:
    input_resolution = model_stat["clip_model"].visual.input_resolution
except:
    input_resolution = 224
I feel that a default but user-changeable input resolution per model, used when the model itself doesn't expose one, could be part of the feature list.
100%, I've already encountered this issue with other CLIP providers too. I tracked down the code snippet in the original openai release that calculates this, but I like the idea of a default attribute too.
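For reference, that fallback could live inside the loader so downstream code never needs the try/except above. Something like this; the 224 default and the attribute lookup just mirror what the openai-style models expose, and none of this is implemented yet:

# sketch: resolve a model's input resolution with a user-overridable default
DEFAULT_INPUT_RESOLUTION = 224  # assumed fallback; overridable per model

def get_input_resolution(model, override=None):
    if override is not None:
        return override
    # openai/CLIP models expose visual.input_resolution; SLIP's
    # VisionTransformer doesn't, hence the default
    visual = getattr(model, 'visual', model)
    return getattr(visual, 'input_resolution', DEFAULT_INPUT_RESOLUTION)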
Another point in reference to usage:
I feel there could be two ways of using it. One would be very similar to what you sketched under "imagining usage", but the other could be identical to OpenAI's CLIP. That mode might not allow some of the fancy combinations of perceptors (although I feel this could be bridged), but on the other hand it would allow for snappy adoption.
Someone could just replace from CLIP import clip with from mmc import clip and everything would work automatically, with a bunch more perceptors out of the box. That could be the entry point to then say "hey, now that you are using this library, why not replace your custom multi-perceptor code with this one?"
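In practice that shim could just be a small module that re-exports the handful of functions openai/CLIP's package exposes, with the same signatures. The mmc side here is purely hypothetical:

# hypothetical mmc/clip.py shim mirroring openai/CLIP's public surface
def available_models():
    ...  # every registered perceptor, not just the original OpenAI checkpoints

def load(name, device='cpu', jit=False):
    # return (model, preprocess) just like clip.load(), but resolve `name`
    # against the mmc registry instead of OpenAI's download table
    ...

def tokenize(texts, context_length=77):
    ...  # delegate to whichever tokenizer the named perceptor expects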
This is a great idea. I've noticed that there seem to be two "families" of CLIP implementations: codebases based on openai/CLIP, and codebases based on huggingface's CLIP.
Rather than changing the classes we have now, maybe we could add a wrapper class or decorator for specifying if a user wants an interface that resembles a common model family. This way, we could keep using the modality-agnostic system and leverage similar wrappers for making drop-in-able tools for tasks beyond TTI.
Is that contrived? Here's how it might look:
my_mmc = ...  # loading code
my_mmc = mmc.api_wrappers.openai_clip(my_mmc)
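Concretely, the wrapper could just expose the few attributes and methods the openai-style codebases poke at and forward them to the underlying mmc object. Sketch only; none of these names exist yet:

from types import SimpleNamespace

# hypothetical api_wrappers.openai_clip -- emulates the openai/CLIP model surface
class OpenAIClipFacade:
    def __init__(self, mmc_model, input_resolution=224):
        self._mmc = mmc_model
        # the attribute downstream notebooks expect (cf. the SLIP issue above)
        self.visual = SimpleNamespace(input_resolution=input_resolution)
    def encode_image(self, image):
        return self._mmc.embed_image(image)
    def encode_text(self, tokens):
        return self._mmc.embed_text(tokens)

def openai_clip(mmc_model, **kwargs):
    return OpenAIClipFacade(mmc_model, **kwargs)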
Or actually... I guess there's no reason we couldn't go a step further and wrap the multi-mmc to make convenience classes that are pinned to specific modalities and emulate the desired APIs. I think this is closer to what you originally had in mind.
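e.g., since the facade only cares about embed_image/embed_text, the same wrapper could sit in front of a combined perceptor. Usage sketch, reusing the placeholder names from the sketches above:

# openai-clip-style code transparently driving a whole ensemble
multi = pct.Perceptor('clip_rn50') + pct.Perceptor('clip_vit16')
clip_like = mmc.api_wrappers.openai_clip(multi)
image_features = clip_like.encode_image(img)   # fans out to every bound model
text_features = clip_like.encode_text(tokens)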
The more I think about this, the more I like it.