danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Web connector not waiting for the page to be fully loaded #1346

Open maxime1992 opened 5 months ago

maxime1992 commented 5 months ago

I was trying to add a new connector to test Danswer (which looks great!).

I tried using this URL with the recursive strategy, but unfortunately it found only 1 page.

When opening the URL I shared, you can see that the page takes a while to load all of its content. I'm not sure how you're doing the parsing, but if you could wait for the network to stabilise before starting the parser, it might work better?

cpwetteronline commented 3 months ago

As the web connector uses Playwright in the background, maybe an option would be to make the wait_until parameter of the goto method configurable? In cases like this one, wait_until='networkidle' could be a valid option.
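
To make the suggestion concrete, here is a minimal sketch of what a configurable wait_until could look like with Playwright's Python sync API. It is not Danswer's actual connector code: the names resolve_wait_until, fetch_rendered_html, and timeout_ms are hypothetical and exist only for illustration. The only Playwright behaviour relied on is that page.goto accepts wait_until values of "commit", "domcontentloaded", "load", or "networkidle".

```python
from typing import Optional

# Values that Playwright's page.goto() accepts for wait_until.
VALID_WAIT_UNTIL = ("commit", "domcontentloaded", "load", "networkidle")


def resolve_wait_until(value: Optional[str], default: str = "load") -> str:
    """Validate a user-supplied wait_until setting (hypothetical helper)."""
    if value is None:
        return default
    if value not in VALID_WAIT_UNTIL:
        raise ValueError(
            f"wait_until must be one of {VALID_WAIT_UNTIL}, got {value!r}"
        )
    return value


def fetch_rendered_html(
    url: str,
    wait_until: Optional[str] = None,
    timeout_ms: float = 30_000,
) -> str:
    """Load a page and return its HTML once the chosen load event has fired.

    Hypothetical illustration, not the connector's real implementation.
    """
    # Imported here so the validation helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    wait = resolve_wait_until(wait_until)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # With "networkidle", goto returns only after the network has
            # been quiet for a short period, so late-loading links are
            # more likely to be present in the returned HTML.
            page.goto(url, wait_until=wait, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Calling fetch_rendered_html(url, wait_until="networkidle") would then wait for the network to settle before the HTML is handed to the parser. Keeping "load" as the default and an explicit timeout seems prudent, since a page that polls in the background may never become idle.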