We support embeddings in Hugging Face's text-embeddings-inference format.
Won't you be hosting your embedding model?
I wouldn't be hosting it as a service like HF-TEI. I just want the library to wrap SentenceTransformers.encode and the transformers feature-extraction pipelines locally, e.g. litellm.embedding('hf/miniLM-L12....', input), where the model runs on the same machine the library is being called from.
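Roughly what I have in mind, as a sketch (the helper name and the response shape are just illustrative here, not an existing litellm API):

from sentence_transformers import SentenceTransformer

_models = {}

def local_embedding(model_name, input):
    # illustrative only: load (and reuse) a local SentenceTransformers model,
    # then return the vectors in a rough OpenAI-style embedding response shape
    if model_name not in _models:
        _models[model_name] = SentenceTransformer(model_name)
    vectors = _models[model_name].encode(input)
    return {
        "object": "list",
        "model": model_name,
        "data": [
            {"object": "embedding", "index": i, "embedding": v.tolist()}
            for i, v in enumerate(vectors)
        ],
    }

local_embedding('sentence-transformers/all-MiniLM-L12-v2', ['good morning from litellm'])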
Would you be fine with me sending a PR for this?
sure @dhruv-anand-aintech
Hey @dhruv-anand-aintech thinking aloud - would it be better if we just exposed a custom llm class interface, for easily adding custom providers?
@krrishdholakia I'm wondering if there can be an interface to add a custom caching function so the user can implement whatever caching logic they want.
I'd like to implement a custom caching service with custom models. The interface to litellm could be a simple Python function like the one below:
from typing import Union

def custom_cache(input_messages) -> Union[None, str]:
    # custom logic here
    # e.g. return requests.post(custom_caching_service, input_messages)
    # returns None if no cache hit, a string on a successful cache hit
    ...

litellm.cache = Cache(type="custom", custom_cache_fn=custom_cache)
Happy to work together to submit a PR on this
isn't that this - https://docs.litellm.ai/docs/caching/redis_cache#custom-cache-keys? @stephenleo
Maybe I misunderstand... I'd like to implement my own semantic cache function, something like below
Given the messages arg from litellm.completion, I'd like to run my own matching steps; these should make the semantic cache much more accurate than pure dense-embedding-based semantic similarity matching.
I think others might have their own ideas too, so having a way to override the semantic similarity search logic in the semantic cache with custom logic would free litellm from having to implement lots of different methods (cross-encoders, ColBERT, etc.).
Let me know if that makes sense?
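For illustration, roughly the kind of matching I have in mind (the model name and threshold below are just placeholders to tune):

from sentence_transformers import CrossEncoder

# re-score cached prompts against the incoming prompt with a cross-encoder,
# instead of relying only on dense-embedding cosine similarity
reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def best_cached_match(new_prompt, cached_prompts, threshold=5.0):
    # threshold is model-specific and needs tuning; return None on a cache miss
    if not cached_prompts:
        return None
    scores = reranker.predict([(new_prompt, p) for p in cached_prompts])
    best_idx = max(range(len(cached_prompts)), key=lambda i: scores[i])
    return cached_prompts[best_idx] if scores[best_idx] >= threshold else None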
@stephenleo this should be possible today. Is this what you need? If yes, I'll add it to the docs.
from litellm.caching import Cache

cache = Cache()

def add_cache(self, result, *args, **kwargs):
    # your logic
    ...

def get_cache(self, *args, **kwargs):
    # your logic
    ...

cache.add_cache = add_cache
cache.get_cache = get_cache
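e.g. a rough sketch of wiring it in (the exact args/kwargs litellm passes through may vary by version, and the dict store here is just a stand-in for your own semantic lookup):

import types
import litellm
from litellm.caching import Cache

cache = Cache()
_store = {}

def get_cache(self, *args, **kwargs):
    # replace this exact-match lookup with your own semantic matching;
    # return the cached result on a hit, or None on a miss
    return _store.get(str(kwargs.get("messages")))

def add_cache(self, result, *args, **kwargs):
    # store the new (messages, result) pair however you like
    _store[str(kwargs.get("messages"))] = result

# bind as methods so `self` is passed correctly when assigned on the instance
cache.get_cache = types.MethodType(get_cache, cache)
cache.add_cache = types.MethodType(add_cache, cache)
litellm.cache = cache

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "good morning"}],
    caching=True,
)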
Perfect! Thanks. I'll test it out
For those who have the same requirement, I have figured out a quick solution.
First of all, implement a HuggingFace-compatible API:
# api.py
from flask import Flask, abort, jsonify, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
model = SentenceTransformer('<YOUR-EMBEDDING-MODEL>')

def auth(token):
    # demo only: accepts any token; add real API-key validation here
    api_key = token.removeprefix('Bearer ')
    # print(api_key)
    return True

@app.route('/embeddings', methods=['POST'])
def embed():
    if not auth(request.headers.get('Authorization', '')):
        abort(401)
    data = request.get_json()
    texts = data['inputs']
    embeddings = model.encode(texts)
    return jsonify(embeddings.tolist())

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000)
Then run the API:
python api.py
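You can sanity-check the endpoint directly before pointing LiteLLM at it (any token passes, since the auth stub above accepts everything):

import requests

resp = requests.post(
    'http://127.0.0.1:5000/embeddings',
    headers={'Authorization': 'Bearer test-key'},
    json={'inputs': ['good morning from litellm']},
)
print(len(resp.json()[0]))  # dimension of the first embedding vector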
Finally, just follow the guide from LiteLLM:
from litellm import embedding
import os

os.environ['HUGGINGFACE_API_KEY'] = '<YOUR-API-KEY>'
os.environ['HUGGINGFACE_API_BASE'] = 'http://127.0.0.1:5000/embeddings'

response = embedding(
    model='huggingface/<YOUR-EMBEDDING-MODEL>',
    input=['good morning from litellm']
)
Quick update: you can now call custom APIs within litellm (no need to spin up an OpenAI-compatible server) - https://docs.litellm.ai/docs/providers/custom_llm_server
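For anyone finding this later, the registration pattern on that page looks roughly like this (sketched from the linked docs, so treat the details as assumptions; whether the same hook also covers embedding() depends on your litellm version):

import litellm
from litellm import CustomLLM, completion

class MyCustomLLM(CustomLLM):
    # minimal sketch: a real handler would call your own backend
    # (a local model, an internal API, etc.) instead of returning a mock response
    def completion(self, *args, **kwargs) -> litellm.ModelResponse:
        return completion(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Hello world"}],
            mock_response="Hi!",
        )

# register the handler under a custom provider name
litellm.custom_provider_map = [
    {"provider": "my-custom-llm", "custom_handler": MyCustomLLM()}
]

resp = completion(
    model="my-custom-llm/my-model",
    messages=[{"role": "user", "content": "Hello world!"}],
)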
@RussellLuo @dhruv-anand-aintech feel free to make a PR if you have an implementation that works well for you!
The Feature
Currently, I think the HuggingFace integration for embeddings relies on their free Inference API.
It'd be great to integrate with the SentenceTransformers library and the 'feature-extraction' pipeline in the Transformers library to allow users to compute embeddings locally using the litellm.embedding() function.
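For reference, a minimal sketch of the kind of local computation this would wrap (mean pooling is just one common choice, and the model name is only an example):

from transformers import pipeline

# 'feature-extraction' returns per-token hidden states; mean-pool them to get one vector per text
extractor = pipeline('feature-extraction', model='sentence-transformers/all-MiniLM-L12-v2')
token_vectors = extractor('good morning from litellm')[0]
embedding = [sum(dim) / len(token_vectors) for dim in zip(*token_vectors)]
print(len(embedding))  # hidden size of the model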
Motivation, pitch
Same as above: I want to compute embeddings locally using the same interface as the hosted API providers.
Twitter / LinkedIn details
https://twitter.com/dhruv___anand