A somewhat creative idea: it could be an interesting business concept, and at the very least a unique selling point. No other embedding provider offers this, apart from a hacky do-it-yourself setup on top of Hugging Face.
For batch-size one:
- query mode
- debugging/testing
- unusual deployments to environments with segregated networks, or places where you cannot provide your ACCESS_TOKEN; there it would be interesting to e.g. run a BERT model locally
I would suggest adding a base / not fine-tuned encoder model (bge-large) with a SentenceTransformers-like setup (ONNX-CPU or CTranslate2-CPU, which do not require torch). Users could then switch between local mode and API mode.
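A minimal sketch of what the local/API switch could look like. All names here (`DualModeEmbedder`, `mode=`, the stub backends) are hypothetical illustrations, not the actual gradientai API; the stubs stand in for an ONNX-CPU bge-large runtime and the hosted endpoint.

```python
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]

class DualModeEmbedder:
    """Hypothetical wrapper that dispatches embed() calls either to a
    local CPU backend (e.g. bge-large via ONNX/CTranslate2) or to the
    remote API, selected by `mode`."""

    def __init__(self, local_fn: EmbedFn, api_fn: EmbedFn, mode: str = "api"):
        self._backends = {"local": local_fn, "api": api_fn}
        if mode not in self._backends:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Route the batch to whichever backend is currently selected.
        return self._backends[self.mode](texts)

# Stub backends: a real implementation would run an ONNX-CPU encoder
# locally and call the hosted embeddings endpoint, respectively.
local_stub: EmbedFn = lambda texts: [[0.0] * 4 for _ in texts]
api_stub: EmbedFn = lambda texts: [[1.0] * 4 for _ in texts]

emb = DualModeEmbedder(local_stub, api_stub, mode="local")
vecs = emb.embed(["hello world"])
```

The point of the sketch is that the caller's code is identical in both modes; only the constructor argument (or a config flag) changes, so tests and segregated-network deployments can run fully offline.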
`pip install gradientai[local-embedder]`