different-ai / embedbase

A dead-simple API to build LLM-powered apps
https://docs.embedbase.xyz
MIT License
498 stars · 53 forks

[Python SDK]: async combining of two datasets doesn't work after Python SDK update #106

Closed ccomkhj closed 1 year ago

ccomkhj commented 1 year ago

System Info

embedbase-client: 0.1.2 ubuntu 22.04 Python 3.8.16

Reproduction

client = EmbedbaseAsyncClient(embedbase_url, embedbase_key)
if farm_id is None:
    # this works!
    [results] = await asyncio.gather(
        *[client.dataset(recipe_id).search(question, limit=7).get()]
    )
elif recipe_id is None:
    # this does not work!
    recipe_id = getRecipeId(farm_id)
    [results_recipe, results_farm] = await asyncio.gather(
        *[
            client.dataset(recipe_id).search(question, limit=6).get(),
            client.dataset(farm_id).search(question, limit=1).get(),
        ]
    )

Expected behavior

Results from the two datasets should be combined and returned.
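For combining the two result lists once both searches return, a hedged sketch (this assumes each search hit exposes a similarity `score`; `merge_hits` is a hypothetical helper, not part of the embedbase SDK):

```python
def merge_hits(*result_lists, key=lambda hit: hit["score"]):
    """Flatten hits from several dataset searches and rank them best-first."""
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=key, reverse=True)

# Illustrative data only; real hits come from client.dataset(...).search(...).get()
results_recipe = [{"data": "cake recipe", "score": 0.91}]
results_farm = [{"data": "farm sensor log", "score": 0.42}]
combined = merge_hits(results_recipe, results_farm)
# combined holds all hits from both datasets, highest score first
```

Note that `asyncio.gather` preserves the order of its arguments, so `results_recipe` and `results_farm` always map back to the dataset that produced them.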

louis030195 commented 1 year ago

I'm trying to reproduce, unit tests:

import asyncio
import os

import dotenv
import pytest
from embedbase_client import EmbedbaseAsyncClient, EmbedbaseClient

dotenv.load_dotenv("../../.env")
base_url = "https://api.embedbase.xyz"
api_key = os.environ.get("EMBEDBASE_API_KEY")
test_dataset = os.environ.get("EMBEDBASE_DATASET", "unit_test")
async_client = EmbedbaseAsyncClient(
    embedbase_url=base_url, embedbase_key=api_key, timeout=120
)

@pytest.mark.asyncio
async def test_merge_datasets():
    ds_one = f"{test_dataset}_organic_ingredients"
    ds_two = f"{test_dataset}_cake_recipes"
    await async_client.dataset(ds_one).clear()
    await async_client.dataset(ds_two).clear()
    await async_client.dataset(ds_one).batch_add(
        [
            {
                "data": "flour",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
            {
                "data": "eggs",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
            {
                "data": "milk",
                "metadata": {"source": "organic.com", "path": "https://organic.com"},
            },
        ]
    )
    await async_client.dataset(ds_two).batch_add(
        [
            {
                "data": "Cake recipe: 1. Mix flour, eggs and milk. 2. Bake for 30 minutes.",
                "metadata": {
                    "source": "recipe.com",
                    "path": "https://recipe.com",
                    "ingredients": ["flour", "eggs", "milk"],
                },
            }
        ]
    )

    question = "How to make a cake ?"
    [results_one, results_two] = await asyncio.gather(
        *[
            async_client.dataset(ds_one).search(question, limit=6).get(),
            async_client.dataset(ds_two).search(question, limit=1).get(),
        ]
    )
    assert len(results_one) == 3
    assert len(results_two) == 1

Do you get any error message?

In colab: https://colab.research.google.com/drive/1HTrFuCEDUBc8W1GBp320kJmnHRI3ri6J?usp=sharing

louis030195 commented 1 year ago

I think I know the issue. I added aiohttp for the async stream but didn't add it to the main dependencies, so I guess your code crashes on import. (The unit tests didn't catch it because aiohttp is a dev dependency :()
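A quick way to confirm this failure mode on the affected machine, as a sketch (`missing_modules` is a hypothetical helper, not part of the SDK):

```python
import importlib.util

def missing_modules(*names):
    """Return the subset of top-level module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# aiohttp is the runtime dependency the async client started importing
missing = missing_modules("aiohttp")
if missing:
    print(f"embedbase-client will fail on import; please install: {missing}")
```

If `aiohttp` shows up as missing, the `ImportError` on `EmbedbaseAsyncClient` is expected.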

I just added it to 0.1.4, which is being released. Update: https://github.com/different-ai/embedbase/releases/tag/sdk-py-0.1.4

Really sorry about that (please confirm that this was the error).

benjaminshafii commented 1 year ago

closing this issue as it seems to be solved

@ccomkhj reach out to us if that's not the case :)