JayZeeDesign / researcher-gpt

Websites aren't scraped #4

Open mitscheluk opened 1 year ago

mitscheluk commented 1 year ago

Everything seems to work (via Streamlit); however, browserless isn't called at all for me, meaning the scraper isn't scraping anything. I do get SERPs, and the outputs I get are based on those alone.

mathematicsofpaul commented 1 year ago

scrape_website("Why did Elon rebrand twitter?", "https://www.axios.com/2023/08/03/twitters-x-rebrand-explained")

Gives:

Scraping website... CONTENTTTTTT: Just a moment...www.axios.comChecking if the site connection is securewww.axios.com needs to review the security of your connection before proceeding.Connection is secureProceeding...Enable JavaScript and cookies to continueRay ID: 7f9b1f2eeb82cf6dPerformance & security by Cloudflare

'Just a moment...www.axios.comChecking if the site connection is securewww.axios.com needs to review the security of your connection before proceeding.Connection is secureProceeding...Enable JavaScript and cookies to continueRay ID: 7f9b1f2eeb82cf6dPerformance & security by Cloudflare'

This happens because browserless is not very good at bypassing Cloudflare's anti-bot protection, so the scraper gets Cloudflare's challenge page back instead of the article.
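One way to at least notice this case is to check the scraped text for the challenge-page phrases before passing it on. The following is a rough sketch, not part of the repo: `looks_like_cloudflare_challenge`, `usable_content`, and the `min_hits` threshold are hypothetical names and choices, and the marker phrases are simply copied from the output above, so other Cloudflare pages may be worded differently.

```python
from typing import Optional

# Marker phrases copied from the challenge-page text in the output above.
# Assumption: other Cloudflare interstitials may use different wording.
CLOUDFLARE_MARKERS = (
    "Just a moment...",
    "Checking if the site connection is secure",
    "Enable JavaScript and cookies to continue",
    "Performance & security by Cloudflare",
)


def looks_like_cloudflare_challenge(text: str, min_hits: int = 2) -> bool:
    """Return True if the scraped text contains at least `min_hits` marker phrases."""
    lowered = text.lower()
    hits = sum(marker.lower() in lowered for marker in CLOUDFLARE_MARKERS)
    return hits >= min_hits


def usable_content(text: str) -> Optional[str]:
    """Hypothetical helper: return the text, or None if it is only a challenge page."""
    return None if looks_like_cloudflare_challenge(text) else text
```

Requiring two or more phrases keeps a normal article that happens to contain "Just a moment..." from being flagged. A caller could skip any URL for which `usable_content` returns `None` and rely on the search results for that source. This only detects the block; it does not get past it.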

chrisb1005 commented 1 year ago

Any workaround here? I'm getting the same outcome.