Closed: ccomkhj closed this issue 1 year ago
I'm trying to reproduce this. Here are the unit tests:
import asyncio
import os

import dotenv
import pytest
from embedbase_client import EmbedbaseAsyncClient, EmbedbaseClient

dotenv.load_dotenv("../../.env")

base_url = "https://api.embedbase.xyz"
api_key = os.environ.get("EMBEDBASE_API_KEY")
test_dataset = os.environ.get("EMBEDBASE_DATASET", "unit_test")

async_client = EmbedbaseAsyncClient(
    embedbase_url=base_url, embedbase_key=api_key, timeout=120
)


@pytest.mark.asyncio
async def test_merge_datasets():
    ds_one = f"{test_dataset}_organic_ingredients"
    ds_two = f"{test_dataset}_cake_recipes"
    await async_client.dataset(ds_one).clear()
    await async_client.dataset(ds_two).clear()
    await async_client.dataset(ds_one).batch_add(
        [
            {
                "data": "flour",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
            {
                "data": "eggs",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
            {
                "data": "milk",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
        ]
    )
    await async_client.dataset(ds_two).batch_add(
        [
            {
                "data": "Cake recipe: 1. Mix flour, eggs and milk. 2. Bake for 30 minutes.",
                "metadata": {
                    "source": "recipe.com",
                    "path": "https://recipe.com",
                    "ingredients": ["flour", "eggs", "milk"],
                },
            }
        ]
    )
    question = "How to make a cake ?"
    [results_one, results_two] = await asyncio.gather(
        *[
            async_client.dataset(ds_one).search(question, limit=6).get(),
            async_client.dataset(ds_two).search(question, limit=1).get(),
        ]
    )
    assert len(results_one) == 3
    assert len(results_two) == 1
Is there any error?
In colab: https://colab.research.google.com/drive/1HTrFuCEDUBc8W1GBp320kJmnHRI3ri6J?usp=sharing
I think I know the issue. I added aiohttp for the async stream but didn't add it to the main dependencies, so I guess your code crashes on import. (The unit tests didn't crash because aiohttp is a dev dependency :()
I just added it to 0.1.4, which is being released. Update: https://github.com/different-ai/embedbase/releases/tag/sdk-py-0.1.4
Really sorry about that (please confirm that's the error).
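If it helps to confirm, here is a minimal sketch for checking whether the import is what fails in your environment. It only uses the standard library; `missing_modules` is a hypothetical helper name, not part of embedbase-client, and the module names passed to it are just the ones mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper for illustration; not part of embedbase-client.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from this thread; adjust as needed.
    missing = missing_modules(["aiohttp", "embedbase_client"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```

If `aiohttp` shows up as missing, upgrading to 0.1.4 (or installing aiohttp manually) should presumably get past the import error.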
Closing this issue as it seems to be solved.
@ccomkhj reach out to us if that's not the case :)
System Info
embedbase-client: 0.1.2, Ubuntu 22.04, Python 3.8.16
Reproduction
Expected behavior
The results from the two datasets are combined and returned.
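To make the expected behavior concrete, here is a rough sketch of combining results from two searches on the client side. It assumes each result is a plain dict with a `similarity` field, which may not match the objects the SDK actually returns; `merge_results` is a hypothetical helper, not an embedbase-client function.

```python
from typing import Any, Dict, List, Optional


def merge_results(
    *result_lists: List[Dict[str, Any]], limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Flatten several result lists and sort them by similarity, highest first.

    Assumes each result is a dict with a "similarity" key (an assumption,
    not confirmed by the SDK); results without it are treated as 0.0.
    """
    merged = [result for results in result_lists for result in results]
    merged.sort(key=lambda result: result.get("similarity", 0.0), reverse=True)
    return merged if limit is None else merged[:limit]


if __name__ == "__main__":
    # Made-up similarity values, for illustration only.
    ingredients = [
        {"data": "flour", "similarity": 0.4},
        {"data": "eggs", "similarity": 0.3},
    ]
    recipes = [{"data": "Cake recipe", "similarity": 0.9}]
    for result in merge_results(ingredients, recipes):
        print(result["data"], result["similarity"])
```

Sorting by similarity is just one way to combine the two lists; concatenating them in dataset order would also fit the description above.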