Ask Astro is an open-source reference implementation of Andreessen Horowitz's LLM Application Architecture built by Astronomer. It provides an end-to-end example of a Q&A LLM application used to answer questions about Apache Airflow® and Astronomer, including:
These are generally divided into three categories: data retrieval & embedding, prompt orchestration, and feedback loops. The rest of this README covers each category in more depth, ahead of a series of blog posts on these topics.
If you have any questions, feedback, or want to share similar use cases, reach out to ai@astronomer.io.
To make responses as factual and accurate as possible, it's generally best practice to use Retrieval Augmented Generation (RAG). For RAG to be effective, however, the vector database needs to be populated with the most up-to-date and relevant information.
Ask Astro uses a set of Airflow DAGs that ingest data from a source via an API or Python library, preprocess and split the data into smaller chunks, embed those chunks, and write the embeddings to Weaviate. As of today, Ask Astro retrieves data from the following sources:
Generally, each of these sources has a DAG that handles the ingestion flow. We use LangChain's built-in text splitters for processing Markdown, RST, and Python code into smaller chunks to ensure each document is small enough to give accurate results when doing embeddings. We then use a Weaviate provider that we've built (and plan to publish) to both embed and store each document as a vector in Weaviate using OpenAI's embedding model.
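The chunking step can be illustrated with a minimal, dependency-free sketch. This is not LangChain's splitter, which is aware of Markdown, RST, and Python structure; `split_text`, `chunk_size`, and `overlap` are hypothetical names used only for illustration. The sketch shows the general idea of cutting a document into bounded, overlapping pieces before embedding.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk. This is an
    illustrative stand-in, not the splitter Ask Astro actually uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks: list[str] = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks keep each embedded document focused on one topic, while the overlap reduces the chance that a relevant sentence is split across two chunks.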
In addition to the individual DAGs per source, we have one DAG to do full-database refreshes based on a baseline of all data. The first time the ask-astro-load-bulk DAG runs, it saves extracted documents in parquet files for a point-in-time baseline. This baseline allows us to experiment with new vector databases, embedding models, chunking strategies, etc. much more quickly.
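The point-in-time baseline follows an "extract once, reuse afterwards" pattern, sketched below under stated assumptions. To stay dependency-free the sketch stores documents as JSON rather than parquet, and `load_or_create_baseline` and its `extract` callable are hypothetical names, not functions from the Ask Astro codebase.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_create_baseline(
    path: Path, extract: Callable[[], list[dict]]
) -> list[dict]:
    """Return the saved baseline if it exists; otherwise extract and save it.

    JSON is used here only to keep the example self-contained. The README
    describes the real baseline as parquet files.
    """
    if path.exists():
        # Later runs reuse the snapshot instead of hitting the sources again.
        return json.loads(path.read_text(encoding="utf-8"))

    documents = extract()  # first run: pull documents from the sources
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(documents), encoding="utf-8")
    return documents
```

Because later runs read the same snapshot, a new embedding model or chunking strategy can be compared against the previous one on identical input.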
See the Ingest README for details on configuring ingest with sources and connection details.
Ask Astro uses LangChain's ConversationalRetrievalChain to generate a response. This chain does the following:
gpt-3.5-turbo) to check relevancy of each of the 8 documents.
This generally works well. For prompt rewording, we use gpt-3.5-turbo, which runs very quickly and inexpensively. For the actual user-facing answer generation, we use gpt-4o to ensure high-quality answers.
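The split between a fast, inexpensive model for intermediate steps and a stronger model for the final answer can be sketched as a small task-to-model mapping. The task names, `MODEL_FOR_TASK`, `filter_relevant`, and the `is_relevant` callable are all hypothetical; the real application wires these steps through LangChain, and the callable stands in for an LLM call.

```python
from typing import Callable

# Model names are taken from the text above; the task keys are illustrative.
MODEL_FOR_TASK = {
    "reword_prompt": "gpt-3.5-turbo",
    "check_relevance": "gpt-3.5-turbo",
    "generate_answer": "gpt-4o",
}


def filter_relevant(
    question: str,
    documents: list[str],
    is_relevant: Callable[[str, str, str], bool],
) -> list[str]:
    """Keep only the documents that the relevance check accepts.

    is_relevant(model, question, document) is a placeholder for a call to the
    cheap model; it is injected so the sketch runs without network access.
    """
    model = MODEL_FOR_TASK["check_relevance"]
    return [doc for doc in documents if is_relevant(model, question, doc)]
```

Injecting the relevance check as a callable keeps the filtering logic testable with a stub and makes it easy to swap the model used for any single step.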
Airflow is critical in improving model performance over time. Feedback on answers comes from two places:
If a user provides feedback that the answer is correct, and the LLM rates the answer as helpful, public, and on-topic, Ask Astro (1) marks the answer as a good example to be displayed on the Ask Astro homepage for users to derive inspiration from, and (2) writes the question and answer back to Weaviate as a potential source to be used in future prompts. This way, there's a continuous feedback loop to constantly improve the model over time.
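The promotion rule described above amounts to a conjunction of four signals, as in the following minimal sketch. `AnswerFeedback`, its field names, and `should_promote` are hypothetical and only illustrate the decision; they are not taken from the Ask Astro codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerFeedback:
    """Signals collected for one answer (field names are illustrative)."""

    user_marked_correct: bool
    llm_helpful: bool
    llm_public: bool
    llm_on_topic: bool


def should_promote(feedback: AnswerFeedback) -> bool:
    """Return True only when the user and the LLM rating all agree.

    A promoted answer would then be shown as a good example and written back
    to the vector database as a potential source for future prompts.
    """
    return all(
        (
            feedback.user_marked_correct,
            feedback.llm_helpful,
            feedback.llm_public,
            feedback.llm_on_topic,
        )
    )
```

Requiring every signal guards against writing back answers that a user liked but that are off-topic or not suitable to share publicly.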
A project like Ask Astro is never "complete", and there are always more methods and use cases to explore. Further exploration in Ask Astro (and more generally, in using Airflow for LLM operations) may come from the following areas:
You can use the local dev script to start the UI, the API server, and Airflow:
python3 scripts/local_dev.py run-api-server # To run backend
python3 scripts/local_dev.py run-ui # To run UI
python3 scripts/local_dev.py run-airflow # To run Airflow
Backend API server
User Interface
Apache Airflow®
The following sections describe how to deploy the various components of Ask Astro.