astronomer / ask-astro

An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
https://ask.astronomer.io/
Apache License 2.0
192 stars, 47 forks

Ask Astro || Issues after hybrid search || Getting different responses and sources for same question when asked multiple times #168

Closed: vatsrahul1001 closed this issue 8 months ago

vatsrahul1001 commented 10 months ago

While testing, we noticed that asking the same question multiple times returned conflicting references and responses.

Incorrect Slack thread

Correct Slack thread

I noticed this multiple times while testing today.

mpgreg commented 10 months ago

I checked the backend: a direct hybrid query to Weaviate consistently retrieves the same results.

from airflow.providers.weaviate.hooks.weaviate import WeaviateHook

_WEAVIATE_CONN_ID = "weaviate_prod"
WEAVIATE_CLASS = "DocsDev"

weaviate_client = WeaviateHook(_WEAVIATE_CONN_ID).get_client()

question = "Can I use Astro CLI to download DAGS from astronomer registry?"


def get_hybrid(question: str) -> set:
    """Run a hybrid (keyword + vector) query and return the set of source doc links."""
    response = (
        weaviate_client.query.get(WEAVIATE_CLASS, ["docLink"])
        .with_limit(5)
        .with_additional(["certainty", "id"])
        .with_hybrid(
            query=question,
            # alpha=0.4,
        )
        .do()
    )
    chunks = response["data"]["Get"][WEAVIATE_CLASS]
    return {chunk["docLink"] for chunk in chunks}


# Baseline result, then re-run the identical query and check the sources never change.
# Comparing sets checks which documents come back, not their order.
links = get_hybrid(question)

for _ in range(10):
    new_links = get_hybrid(question)
    assert links == new_links
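The commented-out `alpha=0.4` sets how the hybrid query weights its two rankings: in Weaviate, `alpha=1` is a pure vector search and `alpha=0` a pure keyword (BM25) search. The exact fusion algorithm depends on the Weaviate version and its `fusionType` setting, so the following is only a sketch of a weighted reciprocal-rank fusion; `fuse` and the document ids are hypothetical. It illustrates why fusing two deterministic rankings yields a deterministic result, consistent with the check above.

```python
def fuse(bm25_ids, vector_ids, alpha=0.5, k=60):
    """Weighted reciprocal-rank fusion of two ranked lists of document ids.

    alpha weights the vector ranking and (1 - alpha) the keyword ranking.
    k dampens the advantage of top ranks (60 is a common RRF default).
    """
    scores = {}
    for weight, ranking in ((1 - alpha, bm25_ids), (alpha, vector_ids)):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Sort by descending score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25 = ["cli-docs", "registry", "deploy"]
vector = ["registry", "dag-sync", "cli-docs"]
print(fuse(bm25, vector, alpha=0.4))  # → ['registry', 'cli-docs', 'deploy', 'dag-sync']
```

Shifting `alpha` toward 0 promotes the keyword winner (`cli-docs`), while shifting it toward 1 promotes the vector winners, so changing `alpha` changes the sources, but repeating a query with a fixed `alpha` does not.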

mpgreg commented 10 months ago

I wonder whether MultiQueryRetriever is generating different questions each time, which would cause different documents to be retrieved.
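LangChain's `MultiQueryRetriever` asks an LLM to rephrase the question several ways, retrieves documents for each rephrasing, and returns the unique union. A minimal, self-contained sketch of that hypothesis (no LangChain, no LLM) follows, assuming the retriever itself is deterministic and only the rephrasings vary. `CORPUS`, `REPHRASINGS`, `retrieve`, and `multi_query` are hypothetical stand-ins, random sampling from a fixed pool imitates an LLM sampling at nonzero temperature, and this sketch also retrieves for the original question.

```python
import random

# Toy corpus: doc id -> text. Purely illustrative.
CORPUS = {
    "cli-deploy": "deploy dags with the astro cli",
    "registry": "download dags and providers from the astronomer registry",
    "cli-install": "install the astro cli",
    "dag-authoring": "write dags in python",
}

# Stand-in for LLM-generated rephrasings of the user's question.
REPHRASINGS = [
    "astro cli download dags",
    "astronomer registry dags",
    "install astro cli",
    "write dags",
]


def retrieve(query, k=2):
    """Deterministic retriever: rank docs by word overlap, ties broken by id."""
    words = set(query.split())
    ranked = sorted(CORPUS, key=lambda d: (-len(words & set(CORPUS[d].split())), d))
    return ranked[:k]


def multi_query(question, rng, n_variants=2):
    """Retrieve for the question plus sampled rephrasings; return the unique union."""
    variants = [question] + rng.sample(REPHRASINGS, n_variants)
    docs = set()
    for q in variants:
        docs.update(retrieve(q))
    return docs


question = "use astro cli to download dags from astronomer registry"
for seed in (1, 2, 3):
    print(sorted(multi_query(question, random.Random(seed))))
```

Every call to `retrieve` is deterministic, yet the unioned sources can differ between runs because the sampled rephrasings differ. Pinning the query generation (for example, temperature 0, assuming the LLM honors it) or caching the generated queries would make the sources repeatable.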

davidgxue commented 8 months ago

This issue stemmed from the first attempt at hybrid search, which wasn't working correctly. It has been resolved and is no longer relevant now that hybrid search and the reranker are implemented correctly.
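The thread does not say which reranker the fix uses. As a rough sketch of what a reranking stage does, assuming a pluggable scoring function (here a toy word-overlap scorer, `overlap_score`, standing in for something like a cross-encoder), the retrieved candidates are re-scored against the question and re-sorted. Breaking ties by document id makes the final order independent of the order in which candidates arrive. `rerank`, `overlap_score`, and the sample documents are hypothetical.

```python
from typing import Callable


def rerank(query: str, candidates: dict[str, str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-order candidate docs by relevance score; ties broken by doc id."""
    return sorted(candidates, key=lambda d: (-score(query, candidates[d]), d))[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


candidates = {
    "install": "install the astro cli",
    "cli": "deploy dags with the astro cli",
    "registry": "download dags from the astronomer registry",
}
query = "download dags from registry"
shuffled = dict(reversed(list(candidates.items())))
print(rerank(query, candidates, overlap_score, top_n=2))  # → ['registry', 'cli']
print(rerank(query, shuffled, overlap_score, top_n=2))    # → ['registry', 'cli']
```

Because the sort key is a total order over unique ids, the same question and candidate set always produce the same ranked sources.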