astronomer / ask-astro

An end-to-end LLM reference implementation providing a Q&A interface for Airflow and Astronomer
https://ask.astronomer.io/
Apache License 2.0
196 stars, 47 forks

Start more formal tracking of metrics (counts by success, tests, in Snowflake) #196

Closed. sunank200 closed this issue 10 months ago

sunank200 commented 11 months ago

Feature Description: We currently track various metrics through LangSmith, but we plan to transition away from it in the future. To facilitate this shift, we need to start ingesting all metrics currently tracked by LangSmith into Snowflake, so that we can maintain and analyze our data more effectively. The goal is to establish a more formal, centralized system for tracking metrics, including counts of successes and tests.

Proposed Solution:

  1. Identify Metrics: Catalog all metrics currently being tracked in LangSmith.
  2. Data Migration Strategy: Develop a strategy for migrating these metrics from LangSmith to Snowflake, ensuring data integrity and continuity.
  3. Snowflake Integration: Set up the necessary infrastructure in Snowflake to receive, store, and process these metrics.
  4. Testing and Validation: Ensure that the data ingestion process is accurate and reliable through thorough testing.
  5. Documentation: Provide comprehensive documentation and training for team members on how to access and utilize these metrics in Snowflake.
  6. Timeline and Rollout Plan: Establish a clear timeline for the transition, including milestones and a final date for discontinuing the use of LangSmith for metric tracking.

sunank200 commented 11 months ago

@Lee-W you can start the discussion on this in the ask-astro-dev channel. Input from Michael, Steven, and Julian would be useful. The goal is to scope this task and then implement it.

shillion commented 11 months ago

Let's just start with the number of traces (i.e. root runs = user requests) broken down by success/failure, and the average correctness score, by day. We don't need historical data; it's totally fine to just start collecting this now. I'd like to keep LangSmith around for troubleshooting.
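
For illustration only, the daily rollup described above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used in this repository: the `RootRun` record, its field names, and the `daily_metrics` helper are hypothetical, and the code deliberately avoids any LangSmith or Snowflake client calls. It assumes each root run carries a timezone-aware start time, a success flag, and an optional correctness score.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RootRun:
    """Hypothetical record for one root run (one user request)."""
    started_at: datetime                 # assumed to be timezone-aware
    succeeded: bool                      # True if the request completed without error
    correctness: Optional[float] = None  # None when no score was recorded


def daily_metrics(runs):
    """Group root runs by UTC calendar day and summarize each day."""
    buckets = defaultdict(list)
    for run in runs:
        day = run.started_at.astimezone(timezone.utc).date()
        buckets[day].append(run)

    rows = []
    for day in sorted(buckets):
        day_runs = buckets[day]
        # Only runs that actually have a score contribute to the average.
        scores = [r.correctness for r in day_runs if r.correctness is not None]
        rows.append({
            "day": day.isoformat(),
            "success_count": sum(r.succeeded for r in day_runs),
            "failure_count": sum(not r.succeeded for r in day_runs),
            "avg_correctness": sum(scores) / len(scores) if scores else None,
        })
    return rows
```

Each returned dictionary corresponds to one row per day, which is the shape a daily metrics table would plausibly need. Where those rows land in Snowflake is left open here.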

sunank200 commented 11 months ago

@Lee-W please check with Steven about where those metrics should land in Snowflake.

phanikumv commented 10 months ago

Discussed with the data team and concluded that we will create a separate DB. We are waiting on IT to create it.

Lee-W commented 10 months ago

https://astronomer.slack.com/archives/C05QJA9LTR9/p1703127879156309

phanikumv commented 10 months ago

@Lee-W to follow up with Josh Fell and make progress on this

Lee-W commented 10 months ago

Thanks to Josh. We already have the Snowflake DB created, but we will still need IT's help creating a Snowflake account.

Lee-W commented 10 months ago

As mentioned earlier today, we'll need to rewrite it as an Airflow DAG.