Open rafiqhasan opened 2 months ago
@grivescorbett is the creator of this notebook.
Possible improvement to be made to this notebook:
The Document AI Layout Parser can handle HTML pages. This could be a way to extract paragraph, title, and similar structural information without manual HTML parsing.
File Name
/search/retrieval-augmented-generation/examples/rag_google_documentation.ipynb
What happened?
The notebook needs to handle pages that have no H2 tag or no element with the devsite-article-body class. Currently the line
for child in body_div.findChildren():
raises an error when no such element is found in the page source at the URL.
Relevant log output
CC: @holtskinner
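One possible shape for the fix described under "What happened?", as a minimal sketch rather than the notebook's actual code. It assumes `body_div` comes from a BeautifulSoup lookup, which returns None when nothing matches. The function name `extract_sections` and the (heading, text) return shape are hypothetical, and `find_all` is used in place of `findChildren`, which is an older alias for the same method in bs4.

```python
from bs4 import BeautifulSoup


def extract_sections(html: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split a page into (heading, text) pairs.

    Tolerates pages with no devsite-article-body element and no <h2>.
    """
    soup = BeautifulSoup(html, "html.parser")

    # A failed lookup returns None, so guard before iterating.
    body_div = soup.find(class_="devsite-article-body")
    if body_div is None:
        # Assumed fallback order: <body> if present, else the whole document.
        body_div = soup.body if soup.body is not None else soup

    sections: list[tuple[str, str]] = []
    heading = ""  # stays empty when the page has no <h2>
    chunks: list[str] = []

    # Walk headings and paragraphs in document order.
    for child in body_div.find_all(["h2", "p"]):
        if child.name == "h2":
            # Close the previous section; headings without text are dropped.
            if chunks:
                sections.append((heading, " ".join(chunks)))
            heading = child.get_text(strip=True)
            chunks = []
        else:
            text = child.get_text(strip=True)
            if text:
                chunks.append(text)

    if chunks:
        sections.append((heading, " ".join(chunks)))
    return sections
```

With this guard, a page without the expected class or without any H2 yields a single section with an empty heading instead of an exception, and an empty page yields an empty list.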