danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Support for compressed sitemap, sitemap index, and robots.txt #1538

Closed zotya closed 2 months ago

zotya commented 4 months ago

Upgraded the sitemap variant of the web connector to use ultimate_sitemap_parser. Benefits: support for compressed sitemaps and sitemap indexes.

We also take robots.txt into account, using urllib.robotparser.
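For readers unfamiliar with urllib.robotparser, here is a minimal sketch of how candidate URLs could be filtered against robots.txt rules. It is not the code from this PR: the helper name `filter_allowed`, the sample rules, and the example.com URLs are all made up for illustration, and the rules are parsed from a string so the snippet runs without network access.

```python
from urllib.robotparser import RobotFileParser


def filter_allowed(urls, robots_lines, user_agent="*"):
    """Keep only the URLs that the given robots.txt rules allow.

    Hypothetical helper for illustration; not part of the connector.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of robots.txt lines, so no fetch is needed.
    parser.parse(robots_lines)
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Made-up robots.txt content and URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

candidate_urls = [
    "https://example.com/docs/intro",
    "https://example.com/private/admin",
]

allowed = filter_allowed(candidate_urls, ROBOTS_TXT.splitlines())
print(allowed)
```

Against a live site, one would typically call `set_url(...)` and `read()` on the parser instead of `parse(...)`, so that the rules are downloaded from the site's own robots.txt.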

If you prefer, you can still specify the exact location of the sitemap, and the connector will use that sitemap.
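To illustrate what "compressed sitemap" and "sitemap index" in the PR title refer to, here is a standard-library-only sketch. It is not how ultimate_sitemap_parser or this PR implements it: `parse_sitemap` and the sample documents are hypothetical, and a real connector would also need to fetch each child sitemap listed in an index.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(raw: bytes):
    """Return (kind, locations) for a sitemap document.

    kind is "index" for a <sitemapindex> (locations are child sitemaps)
    or "urlset" for a regular sitemap (locations are page URLs).
    Hypothetical helper for illustration only.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    # <loc> sits one level below <url> or <sitemap> entries.
    locs = [el.text.strip() for el in root.findall("*/sm:loc", NS) if el.text]
    kind = "index" if root.tag.endswith("}sitemapindex") else "urlset"
    return kind, locs


# Made-up sample documents.
INDEX_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>https://example.com/sitemap-docs.xml.gz</loc></sitemap>"
    b"</sitemapindex>"
)
URLSET_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/docs/intro</loc></url>"
    b"</urlset>"
)

index_result = parse_sitemap(INDEX_XML)
gz_result = parse_sitemap(gzip.compress(URLSET_XML))
print(index_result)
print(gz_result)
```

Checking the gzip magic bytes rather than the `.gz` file extension is a design choice in this sketch: it still works when a server serves a compressed sitemap under a plain `.xml` name.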

vercel[bot] commented 4 months ago

@zotya is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

zotya commented 2 months ago

> Ah, could you rebase the changes please, ty!

Sorry for the delay, I did the rebase now.

yuhongsun96 commented 2 months ago

Awesome thanks!