Closed mpgreg closed 9 months ago
We also need code to recursively walk the docs pages and extract sub-pages, plus an HTML splitter that splits each page on h2 headings.
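For the h2 splitting part, a minimal sketch using only the standard library might look like the following. The names `H2Splitter` and `split_on_h2` are hypothetical and not part of the ask-astro code base. It assumes that one chunk per h2 section, with visible text only, is what the ingest step wants.

```python
from html.parser import HTMLParser


class H2Splitter(HTMLParser):
    """Collect visible text into chunks, starting a new chunk at every <h2>."""

    def __init__(self):
        super().__init__()
        # The first chunk holds any text that appears before the first <h2>.
        self.chunks = [{"heading": None, "text": []}]
        self._in_h2 = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "h2":
            self.chunks.append({"heading": "", "text": []})
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_h2:
            self.chunks[-1]["heading"] += data
        elif data.strip():
            self.chunks[-1]["text"].append(data.strip())


def split_on_h2(html):
    """Return a list of (heading, text) pairs; heading is None for the preamble."""
    parser = H2Splitter()
    parser.feed(html)
    parser.close()
    return [
        (c["heading"].strip() if c["heading"] is not None else None,
         " ".join(c["text"]))
        for c in parser.chunks
        if c["heading"] is not None or c["text"]
    ]
```

Each returned pair could then become one document chunk, with the heading kept as metadata. Sphinx-generated headings usually carry a permalink anchor inside the h2, so the heading text may need extra cleanup.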
Note:
@sunank200 @mpgreg AFAIK we generate the HTML from the rst docs. Since we are ingesting the HTML docs, why do we need rst too? Or am I missing something here?
Yes, this issue was meant to be closed if/when we change to html ingest.
cc: @sunank200 @phanikumv
Closing as discussed with Pankaj and Ankit in the sprint planning call.
extract_github_rst() does not follow includes or references to other rst docs. This means that much of the Airflow docs content is either not ingested or cannot reference the correct page.
https://github.com/astronomer/ask-astro/blob/c45487c7f12a9424dbe885580c687e35e30b7de4/airflow/dags/ingestion/ask-astro-load-github.py#L46C10-L46C10
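To illustrate the include gap described above, here is a rough sketch of what following `.. include::` directives could look like. `resolve_includes` is a hypothetical helper, not something that exists in the repo. It assumes plain relative include paths, ignores directive options such as `:start-after:`, and leaves Sphinx-style paths starting with `/` and cross-references like `:doc:` or `:ref:` untouched.

```python
import re
from pathlib import Path

# Matches a whole line of the form ".. include:: some/file.rst"
INCLUDE_RE = re.compile(r"^[ \t]*\.\. include::[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def resolve_includes(path, _stack=()):
    """Return the text of an rst file with relative includes inlined."""
    path = Path(path).resolve()
    if path in _stack:
        # Circular include: stop instead of recursing forever.
        return ""
    text = path.read_text(encoding="utf-8")

    def inline(match):
        target = path.parent / match.group(1)
        if not target.is_file():
            # Leave the directive as-is when the target cannot be found.
            return match.group(0)
        return resolve_includes(target, _stack + (path,))

    return INCLUDE_RE.sub(inline, text)
```

Even with something like this, references between pages would still not resolve, which is part of why scraping the rendered HTML looks simpler.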
Need to ingest from a scrape of the Airflow docs HTML pages instead.
https://airflow.apache.org/docs/
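A possible shape for the scrape, as a standard-library sketch: start at the docs root, collect links, and keep following any that stay under the same base URL. `LinkCollector`, `fetch_html`, and `crawl_docs` are hypothetical names. The walk is done with a queue rather than literal recursion to avoid deep call stacks. The `max_pages` cap is an arbitrary assumption, and non-HTML targets (PDFs, images) are not filtered out here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch_html(url):
    """Download a page and decode it as UTF-8 (an assumption about the site)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_docs(base_url, fetch=fetch_html, max_pages=1000):
    """Breadth-first walk of all pages under base_url; returns {url: html}."""
    seen = {base_url}
    queue = deque([base_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            # Network and HTTP errors from urllib derive from OSError; skip the page.
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" so each page is visited once.
            target, _fragment = urldefrag(urljoin(url, href))
            if target.startswith(base_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages
```

Calling `crawl_docs("https://airflow.apache.org/docs/")` would then return the raw HTML per page, ready to be split into sections. The `fetch` parameter is injectable so the walk can be tested without network access.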