The datasette-embeddings extension currently requires the use of hosted OpenAI models and the availability of an OpenAI API key to generate embeddings:
    async def calculate_embedding(cls, api_key, text, model):
        # Add dimensions for models called things that end in -xxx digits
        body = {
            "input": text,
            "model": model,
        }
        last_bit = model.split("-")[-1]
        if last_bit.isdigit():
            body["model"] = "-".join(model.split("-")[:-1])
            body["dimensions"] = int(last_bit)
        async with httpx.AsyncClient() as client:
            response = await client.post(
                "https://api.openai.com/v1/embeddings",
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Bearer {api_key}",
                },
                json=body,
            )
            response.raise_for_status()
            embedding = response.json()["data"][0]["embedding"]
            return embedding
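To make the model-name convention in that function easier to see in isolation, here is a minimal sketch of the suffix handling on its own. The helper names `split_model_dimensions` and `build_request_body` are hypothetical and not part of datasette-embeddings; the model names in the usage note are only illustrative inputs.

```python
from typing import Optional, Tuple


def split_model_dimensions(model: str) -> Tuple[str, Optional[int]]:
    """Split a trailing "-<digits>" suffix off a model name.

    Hypothetical helper mirroring the logic in calculate_embedding.
    Unlike the original, a name with no "-" at all is returned unchanged.
    """
    base, sep, last_bit = model.rpartition("-")
    if sep and last_bit.isdigit():
        return base, int(last_bit)
    return model, None


def build_request_body(text: str, model: str) -> dict:
    """Build the JSON body, adding "dimensions" only when a suffix is present."""
    base, dimensions = split_model_dimensions(model)
    body = {"input": text, "model": base}
    if dimensions is not None:
        body["dimensions"] = dimensions
    return body
```

With this sketch, `build_request_body("hi", "text-embedding-3-small-512")` sends the model as `text-embedding-3-small` with `dimensions` set to `512`, while `text-embedding-3-small` on its own passes through unchanged.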
It would be useful to allow the user to specify a local model without the need for an API key.
This could be done minimally, or by building on the llm package, which supports generating embeddings (docs) from local models via the llm-sentence-transformers extension.
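As a rough sketch of the llm-based route, the following shows one way a local code path might sit alongside the hosted one. Everything here is an assumption for illustration: the `local:` prefix, the function names, and the injectable `embedder` argument are hypothetical and not part of datasette-embeddings. The only llm calls used are `llm.get_embedding_model()` and the model's `embed()` method, and the import is lazy so that hosted-only installs would not need the package.

```python
import asyncio
from typing import Callable, List

# Hypothetical convention: model names starting with this prefix are local.
LOCAL_PREFIX = "local:"


def is_local_model(model: str) -> bool:
    """Return True when no API key should be required for this model name."""
    return model.startswith(LOCAL_PREFIX)


def embed_with_llm(text: str, model_id: str) -> List[float]:
    """Embed text using an embedding model registered with the llm package."""
    import llm  # lazy import: only needed when a local model is requested

    model = llm.get_embedding_model(model_id)
    return list(model.embed(text))


async def calculate_embedding_local(
    text: str,
    model: str,
    embedder: Callable[[str, str], List[float]] = embed_with_llm,
) -> List[float]:
    """Compute an embedding locally, without any API key."""
    if not is_local_model(model):
        raise ValueError(f"Not a local model name: {model!r}")
    model_id = model[len(LOCAL_PREFIX):]
    # Local inference is blocking, so run it off the event loop thread.
    return await asyncio.to_thread(embedder, text, model_id)
```

Assuming llm-sentence-transformers is installed and a model has been registered, a call might look like `await calculate_embedding_local("some text", "local:sentence-transformers/all-MiniLM-L6-v2")`. Passing a different `embedder` keeps the door open for the "minimal" option of calling a local library directly.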
For datasette-lite, it would also be useful to generate the embeddings in the browser using a wasm-packaged model. The anywidget framework provides a way of wrapping JS/wasm packages so that they can be called from Python code running in VS Code and in browser-based environments (Jupyter, marimo). That might be a sensible way of integrating wasm-powered function calls into the datasette-lite environment.