astronomer / ask-astro

An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
https://ask.astronomer.io/
Apache License 2.0

Create test plan for Ask-Astro phase 2 #142

Closed sunank200 closed 10 months ago

sunank200 commented 10 months ago
  • Is there good data in the database? → Check with the Weaviate client directly or with the Streamlit app
  • Is the frontend retrieving it properly? → The current approach is fine, where we ask questions in Slack directly

Scenarios

  • both are positive
  • both negative → problem with the documents we are embedding or with ingestion → log an issue for this
  • DB is good but frontend is not → possibly a problem with conversation retrieval (?) → log an issue for this
  • Frontend is good but DB is bad → log a bug
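
The four scenarios above can be written as a small decision table. This is only an illustrative sketch: the function name `triage` and the returned strings are hypothetical and are not taken from the Ask-Astro test code.

```python
def triage(db_ok: bool, frontend_ok: bool) -> str:
    """Map the two check results to the follow-up action from the scenario list.

    db_ok:       the expected document is present in the database
    frontend_ok: the frontend returned a good answer
    (Both names are assumptions made for this sketch.)
    """
    if db_ok and frontend_ok:
        return "pass"
    if not db_ok and not frontend_ok:
        return "log issue: embedded documents or ingestion"
    if db_ok and not frontend_ok:
        return "log issue: conversation retrieval (suspected)"
    # Remaining case: frontend is good but DB is bad
    return "log bug"


for db_ok in (True, False):
    for frontend_ok in (True, False):
        print(db_ok, frontend_ok, "->", triage(db_ok, frontend_ok))
```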

A 70% positive rate is good enough for us to go ahead.

  • % of positive responses
  • % with a positive result in the top 3 document sources
  • Average of both
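
A minimal sketch of how these three numbers could be computed from per-question results. The `QuestionResult` type, the `summarize` function, and the choice to apply the 70% threshold to the average are assumptions made for illustration; the thread does not say which of the three figures the threshold applies to.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    # Hypothetical record for one test question
    answer_positive: bool  # the response was judged positive
    source_in_top3: bool   # a relevant document appeared in the top 3 sources


def summarize(results: list[QuestionResult], threshold: float = 70.0) -> dict:
    """Return both percentages, their average, and a go/no-go flag."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    pct_answers = 100.0 * sum(r.answer_positive for r in results) / n
    pct_sources = 100.0 * sum(r.source_in_top3 for r in results) / n
    average = (pct_answers + pct_sources) / 2
    return {
        "pct_positive_answers": pct_answers,
        "pct_positive_top3_sources": pct_sources,
        "average": average,
        # Assumption: the 70% bar is checked against the average
        "go_ahead": average >= threshold,
    }


sample = [
    QuestionResult(True, True),
    QuestionResult(True, True),
    QuestionResult(True, False),
    QuestionResult(False, False),
]
print(summarize(sample))
```

With the four sample records, 75% of answers and 50% of top-3 sources are positive, so the average is 62.5% and the run would fall short of the 70% bar under this reading.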

Add more relevant questions from LangSmith, with a correctness check (bad and good answers)

sunank200 commented 10 months ago

@vatsrahul1001 has created this Notion doc with the test plan. @vatsrahul1001, please add more details as required.

mpgreg commented 10 months ago

Test pipeline for this is at https://github.com/mpgreg/ask-astro/blob/add_baseline_test/airflow/dags/monitor/test_baseline.py with test logic at https://github.com/mpgreg/ask-astro/blob/add_baseline_test/airflow/include/tasks/utils/retrieval_tests.py

vatsrahul1001 commented 10 months ago

Latest test results: https://docs.google.com/spreadsheets/d/13cVqNikix82YjCPA4t0XaULg3XccBnvrQUmQa9VwgC0/edit#gid=1200545478

sunank200 commented 10 months ago

This can be closed, right @vatsrahul1001?

vatsrahul1001 commented 10 months ago

Summary

  • Total test cases: 52
  • Passed test cases: 45
  • Failed test cases: 7
