tazlin opened 6 months ago
A useful feature might be to opt into including the resulting image embeddings with an image generation request.
I.e., in the `/generate/status/` endpoint, each generation result would include an R2 URL pointing to that image's calculated embedding `.safetensors` file.
That being said, it’s easily avoidable by just doing the alchemy request separately, and I imagine this request would be more difficult to set up.
I think we might avoid using R2 here and just b64 the safetensors in the DB. A couple of KB of data per file shouldn't be a terrible amount, and if bandwidth starts being choked by these I can always switch to R2 later.
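A quick back-of-envelope check on the "couple of KB" claim, assuming a 768-dimensional float16 embedding (the width of ViT-L/14, used here purely as an illustration): base64 inflates the raw bytes by about 4/3, which keeps the per-file DB payload comfortably in the single-digit-KB range.

```python
# Rough size estimate for base64-encoding one CLIP embedding.
# The 768-dim float16 shape is illustrative, not a fixed API contract.
import base64
import struct

dims = 768                                        # ViT-L/14 embedding width
raw = struct.pack(f"<{dims}e", *([0.0] * dims))   # "e" = little-endian float16
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))  # 1536 2048
```

The real safetensors file adds a small JSON header on top of this, but the total still stays well under the "terrible amount" threshold.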
There are use cases for being able to do client-side manipulation of the various intermediate results of the CLIP interrogation process.
To compare an image to text via CLIP, the following happens:

1. `open_clip` uses `clip_model.encode_text(text_tokens)`. This returns a `tensor`.
2. `open_clip` uses `clip_model.encode_image(...)`. This returns a `tensor`.

This feature request would allow the results of steps 1 and 2 to be returned independently, optionally as part of a regular interrogate request, or separately on their own. Clients would not need to load a CLIP model locally; they could perform the math pertinent to their use case in slow/limited-RAM environments. Certain types of image-searching/database schemes could benefit from this.
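The client-side "math" in question is typically just a cosine similarity between the two returned tensors, which needs no CLIP model, GPU, or ML framework locally. A minimal sketch with NumPy, using random vectors as stand-ins for the real embeddings the API would return:

```python
# Cosine similarity between two embeddings, as a client might compute it
# after fetching the encode_text / encode_image results. The 768-dim
# random vectors below are placeholders for real CLIP features.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

rng = np.random.default_rng(0)
text_features = rng.standard_normal(768)   # stand-in for encode_text output
image_features = rng.standard_normal(768)  # stand-in for encode_image output

score = cosine_similarity(text_features, image_features)
assert -1.0 <= score <= 1.0
```

For an image-search scheme, a client would precompute and store image embeddings once, then compare each new text embedding against the whole set with a single matrix multiply.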
I propose the following forms be added:

- `encode_text`: returns a `.safetensors` file containing the encoded text tensor and which model was used to encode it.
- `encode_image`: takes a `source_image` and the value of a supported CLIP model; returns a `.safetensors` file containing the encoded image features and which model was used to encode it.

This proposal has the obvious wrinkle of needing to support the upload of `.safetensors` files. The size of these files is on the order of single-digit kilobytes.

Related to https://github.com/Haidra-Org/horde-worker-reGen/issues/9.