aws-samples / bedrock-claude-chat

AWS-native chatbot using Bedrock + Claude (+Mistral)
MIT No Attribution
693 stars 237 forks source link

[Feature Request] Extend Playwright Timeout to Prevent Synchronization Errors #306

Closed ksk-17090k1 closed 1 month ago

ksk-17090k1 commented 1 month ago

Describe the solution you'd like

I'd like to be able to register websites that take a long time to load as knowledge.

Why the solution needed

If you register a URL that takes a long time to load (for example, https://www.value-domain.com/media/sidejob-home/) on the bot settings screen, a synchronization error occurs due to a timeout in Playwright.

Implementation feasibility

The default timeout value for Playwright Page is 30000 milliseconds. To avoid timeout errors, you can consider extending this period.

[backend/embedding/loaders/playwright.py]

response = page.goto(url)
page.set_default_timeout(300000)

Are you willing to discuss the solution with us, decide on the approach, and assist with the implementation?

statefb commented 1 month ago

I believe that almost all of websites can be rendered within 30 sec (blog), and 5 min (300 sec) is too long for the default value. Do you have any statistics of the duration for the website you want to handle? It helps to determine the appropriate default time out value.

ksk-17090k1 commented 1 month ago

thank you for your feedback! As you mentioned, sites that take more than 30 seconds to load are likely having some problems. If you expect an error when setting such a site in the bot's configuration, it seems there is no need for further modifications.

statefb commented 1 month ago

If there are no downsides, it may also be worth considering extending the default timeout as well. Generally, increasing the timeout value can also raise concerns about performance, so it's best to avoid setting unnecessarily long values. If you have any statistical data on this, I would be interested in seeing it. Or if you already have your own forked project, maybe we can close this issue.

ksk-17090k1 commented 1 month ago

Unfortunately, I couldn't find any statistical data that would help determine how long to extend the timeout value. Your concerns about performance are reasonable, so I agree that we should handle the timeout value in a forked repository. I think it's fine to close this issue :)