dodeeric / langchain-ai-assistant-with-hybrid-rag

See https://github.com/dodeeric/ragai-agent for the agentic (agent) version of this assistant.
GNU General Public License v3.0

Check filter/class for Commons #109

Closed. dodeeric closed this issue 3 weeks ago

dodeeric commented 3 weeks ago

Present: mw-content-ltr mw-parser-output ==> Summary + Licensing ==> Always?
Old: hproduct commons-file-information-table

table fileinfotpl-type-information vevent mw-content-ltr ==> Summary ==> Only for information template!!!
div notheme hproduct commons-file-information-table ==> Summary
div mw-content-ltr mw-parser-output ==> Summary + Licensing

table fileinfotpl-type-artwork vevent mw-content-ltr ==> Summary ==> Only for artwork template!!!
div hproduct commons-file-information-table ==> Summary
div mw-content-ltr mw-parser-output ==> Summary + Licensing

dodeeric commented 3 weeks ago
FILTER1 = "fileinfotpl-type-information vevent mw-content-ltr"  # Summary: Information template (table class)
FILTER2 = "fileinfotpl-type-artwork vevent mw-content-ltr"      # Summary: Artwork template (table class)
#FILTER = "mw-content-ltr mw-parser-output"  # Old (Summary + Licensing)
#FILTER = "hproduct commons-file-information-table"  # Old (Summary only)

# Pass both filters as a tuple so either template's table is matched
item = scrape_web_page(url, (FILTER1, FILTER2))
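
To illustrate the tuple-of-filters idea, here is a minimal sketch using only the standard library. It is not the repository's `scrape_web_page`, whose implementation is not shown in this issue. `ClassTextExtractor` and `extract_summary` are hypothetical names, and the sketch assumes that a filter matches when an element's full `class` attribute equals the filter string, and that the HTML is well formed (every non-void tag is closed).

```python
from html.parser import HTMLParser

FILTER1 = "fileinfotpl-type-information vevent mw-content-ltr"  # Information template
FILTER2 = "fileinfotpl-type-artwork vevent mw-content-ltr"      # Artwork template

# Tags that never have a closing tag; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element whose class attribute
    is exactly one of the given filter strings (hypothetical helper)."""

    def __init__(self, filters):
        super().__init__()
        self.filters = set(filters)
        self.depth = 0       # nesting depth inside the matched element (0 = outside)
        self.done = False    # True once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("class") in self.filters:
            self.depth = 1   # entering the first matching element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # stop after the first match

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_summary(html, filters):
    """Return the text of the first element matching any filter, or ''."""
    parser = ClassTextExtractor(filters)
    parser.feed(html)
    return " ".join(parser.chunks)
```

Called as `extract_summary(html, (FILTER1, FILTER2))`, this keeps only the Summary table of whichever template the page uses and skips the Licensing section that the broader `mw-content-ltr mw-parser-output` class would include.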