A somewhat creative idea: it could be an interesting business concept, and at the very least a unique selling point. No other embedding provider offers this, apart from a hacky do-it-yourself setup on top of Hugging Face.
For batch-size one:
- query mode
- debugging/testing
- unusual deployments to environments with segregated networks, or places where you cannot provide your ACCESS_TOKEN; there it would be interesting to e.g. run a BERT model locally
I would suggest adding a base / not fine-tuned encoder model (bge-large) with a SentenceTransformers-like setup (ONNX-CPU or CTranslate2-CPU, which do not require torch). Users could then switch between local mode and API mode.
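A minimal sketch of what the local/API switch could look like. All names here (`DualModeEmbedder`, `mode=`, the stub backends) are hypothetical illustrations, not the actual gradientai API; the stubs stand in for an ONNX-CPU bge-large runtime and the hosted endpoint.

```python
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]

class DualModeEmbedder:
    """Hypothetical wrapper that dispatches embed() calls either to a
    local CPU backend (e.g. bge-large via ONNX/CTranslate2) or to the
    remote API, selected by `mode`."""

    def __init__(self, local_fn: EmbedFn, api_fn: EmbedFn, mode: str = "api"):
        self._backends = {"local": local_fn, "api": api_fn}
        if mode not in self._backends:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def embed(self, texts: List[str]) -> List[List[float]]:
        # Route the batch to whichever backend is currently selected.
        return self._backends[self.mode](texts)

# Stub backends: a real implementation would run an ONNX-CPU encoder
# locally and call the hosted embeddings endpoint, respectively.
local_stub: EmbedFn = lambda texts: [[0.0] * 4 for _ in texts]
api_stub: EmbedFn = lambda texts: [[1.0] * 4 for _ in texts]

emb = DualModeEmbedder(local_stub, api_stub, mode="local")
vecs = emb.embed(["hello world"])
```

The point of the sketch is that the caller's code is identical in both modes; only the constructor argument (or a config flag) changes, so tests and segregated-network deployments can run fully offline.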
`pip install gradientai[local-embedder]`