File "/usr/local/airflow/include/tasks/extract/blogs.py", line 56, in <lambda>
lambda x: BeautifulSoup(x, "lxml").find(class_="post-card__meta").find(class_="title").get_text()
AttributeError: 'NoneType' object has no attribute 'find'
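The Astro blog page formatting has changed: the element with class post-card__meta no longer exists, so find() returns None and the chained .find(class_="title") raises the AttributeError above.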
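One way to keep the blogs DAG from crashing the next time the markup changes is to stop chaining `find()` calls blindly. The sketch below is a hypothetical helper (`find_text_or_none` is not part of the repo). It walks a chain of CSS class names and returns `None` as soon as one element is missing. It works on BeautifulSoup objects because `Tag.find(class_=...)` and `Tag.get_text()` exist there. The class names in the usage comment are the old selectors from the traceback, not the new blog markup.

```python
from typing import Any, Optional


def find_text_or_none(node: Any, *class_names: str) -> Optional[str]:
    """Follow nested find(class_=...) lookups and return the last element's text.

    Returns None instead of raising AttributeError when any element in the
    chain is missing, e.g. after the page markup changes.
    """
    for class_name in class_names:
        node = node.find(class_=class_name)
        if node is None:
            return None
    return node.get_text()


# Hypothetical usage in place of the failing lambda (old selectors from the traceback):
# from bs4 import BeautifulSoup
# title = find_text_or_none(BeautifulSoup(html, "lxml"), "post-card__meta", "title")
```

A missing element then yields `None`, which the task can log or drop, instead of failing the whole DAG run. The selectors themselves still have to be updated to match the new blog markup.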
Astro Docs ingest DAG
The DAG has been using the outdated URL doc.astronomer.io, but Astronomer has moved its docs to www.astronomer.io/docs.
Minor Improvements
Remove GitHub issues from the ingest sources
They have added nothing but noise. Most closed issues are bug reports that have already been fixed, and retrieving them leads the LLM to assume the bug still exists.
GitHub Registry Docs Reformat
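The registry content Ask Astro previously ingested gave the LLM no real insight: each entry listed only the provider, version, module, and a short description.
Without usage or parameter details, how would the LLM know how to use an operator?
Add operator usage and parameter type details.
Example of what we had before: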
# Registry
## Provider: astro-sdk-python
Version: 1.8.0
Module: dataframe
Module Description: This decorator will allow users to write python functions while treating SQL tables as dataframes.
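A sketch of what an entry with operator usage and parameter type details could look like, assuming entries are generated by introspecting the operator (or decorator) with Python's standard inspect module. The function names `render_registry_doc` and `_type_name` and the exact field layout are hypothetical; the header fields mirror the old example above.

```python
import inspect
from typing import Any, Callable


def _type_name(annotation: Any) -> str:
    """Readable name for a parameter annotation."""
    if annotation is inspect.Parameter.empty:
        return "unspecified"
    if isinstance(annotation, type):
        return annotation.__name__
    return str(annotation)  # e.g. "typing.Optional[int]" or a string annotation


def render_registry_doc(provider: str, version: str, obj: Callable[..., Any]) -> str:
    """Render a registry entry with a usage signature and per-parameter types."""
    signature = inspect.signature(obj)
    lines = [
        "# Registry",
        f"## Provider: {provider}",
        f"Version: {version}",
        f"Name: {obj.__qualname__}",
        f"Module: {obj.__module__}",
        f"Description: {inspect.getdoc(obj) or 'n/a'}",
        "### Usage",
        f"{obj.__qualname__}{signature}",
        "### Parameters",
    ]
    for name, param in signature.parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            default = "variadic"
        elif param.default is inspect.Parameter.empty:
            default = "required"
        else:
            default = f"default={param.default!r}"
        lines.append(f"- {name}: {_type_name(param.annotation)} ({default})")
    return "\n".join(lines)
```

For a class, inspect.signature reports the constructor parameters without self, so the same function can be pointed at an operator class directly. Putting the call signature and each parameter's type and default in the document gives the LLM something concrete to base usage examples on.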
Upgrade from Cohere Rerank 2 to Rerank 3
Cohere emailed us asking whether we could move to Rerank 3, which is cheaper, better, and faster.
Upgrade from GPT-4 Turbo to GPT-4o
System Prompt Changes
Improve the LLM filter that runs as the last step to remove unhelpful documents
Instruct the LLM not to include URLs that do not explicitly appear in the context
Ask the LLM to explicitly cite sources whenever possible. To support this, we override LangChain's stuff-documents template and formatting function so that each document's DocLink and document number are passed to the LLM.
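The source-related prompt changes can be sketched in plain Python. This is a minimal illustration under assumptions, not the Ask Astro implementation. Doc stands in for a LangChain Document. The docLink metadata key and the "Document #N / DocLink:" layout are hypothetical. The URL check is a post-hoc validation, a different technique from the prompt instruction itself, that could back it up. In LangChain, the per-document formatting would normally go through the stuff chain's document prompt (for example the document_prompt argument of create_stuff_documents_chain).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Rough URL matcher: stops at whitespace, quotes, angle brackets, and parentheses.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")

# Hypothetical per-document template; the "docLink" metadata key is an assumption.
DOCUMENT_TEMPLATE = "Document #{number}\nDocLink: {link}\n{content}"


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_docs_for_stuffing(docs: List[Doc], link_key: str = "docLink") -> str:
    """Join documents into one context string, numbering each and attaching its link."""
    blocks = [
        DOCUMENT_TEMPLATE.format(
            number=number,
            link=doc.metadata.get(link_key, "unknown"),
            content=doc.page_content,
        )
        for number, doc in enumerate(docs, start=1)
    ]
    return "\n\n".join(blocks)


def extract_urls(text: str) -> List[str]:
    # Drop trailing punctuation that usually ends the sentence rather than the URL.
    return [url.rstrip(".,;:!?") for url in URL_PATTERN.findall(text)]


def urls_not_in_context(answer: str, context: str) -> List[str]:
    """Return URLs in the answer that never appear in the stuffed context."""
    context_urls = set(extract_urls(context))
    return [url for url in extract_urls(answer) if url not in context_urls]
```

Numbering the documents gives the model a short handle to cite ("Document #2"). The link lets it cite a URL that actually appears in the context. Any URL returned by urls_not_in_context is a candidate for removal before the answer is shown.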
Bug Fixes
DAG: ask_astro_load_astro_cli_docs failure
DAG: ask_astro_load_stackoverflow failure
DAG: ask_astro_load_blogs failure