COS301-SE-2024 / Web-Exploration-Engine

Tech Odyssey - The Web Exploration Engine (WEE) - WEE automates the extraction of critical website information. Employing cutting-edge scraping technologies and natural language processing, WEE delivers user-friendly insights and reports through an intuitive and responsive website
https://capstone-wee.dns.net.za/

Fix issue with robot error response #168

Closed: Johane-B closed this issue 3 months ago

Johane-B commented 3 months ago

(image)

If we can't scrape the root but there is a robots.txt, we don't send any error response. This makes it seem like we are allowed to scrape the site, even though we aren't.

(image) (image)

In the image below, note that the ChatGPT URL is included in the number of URLs we are allowed to crawl.

(image)
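The expected behaviour could look roughly like the sketch below: when robots.txt disallows the root path, return an explicit error object instead of a normal result. This is only an illustration. `RobotsError`, `RobotsResult`, `parseDisallowed`, and `checkRobots` are hypothetical names and shapes, not the actual WEE code, and the parser is deliberately simplified (it only reads the `User-agent: *` group and ignores `Allow` rules and wildcards).

```typescript
// Hypothetical response shapes; the real WEE types may differ.
interface RobotsError {
  errorStatus: number;
  errorCode: string;
  errorMessage: string;
}

interface RobotsResult {
  baseUrl: string;
  isBaseUrlAllowed: boolean;
  disallowedPaths: string[];
}

// Collect Disallow rules that follow a "User-agent: *" line.
// Simplified: no Allow rules, no wildcard matching, no grouped user agents.
function parseDisallowed(robotsTxt: string): string[] {
  const disallowed: string[] = [];
  let appliesToAll = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      appliesToAll = value === '*';
    } else if (field === 'disallow' && appliesToAll && value !== '') {
      disallowed.push(value);
    }
  }
  return disallowed;
}

// Return an explicit error when the root path is disallowed,
// so the caller cannot mistake the site for one we may scrape.
function checkRobots(baseUrl: string, robotsTxt: string): RobotsResult | RobotsError {
  const disallowedPaths = parseDisallowed(robotsTxt);
  if (disallowedPaths.includes('/')) {
    return {
      errorStatus: 403,
      errorCode: '403 Forbidden',
      errorMessage: `Not allowed to scrape root URL: ${baseUrl}`,
    };
  }
  return { baseUrl, isBaseUrlAllowed: true, disallowedPaths };
}
```

With this shape, a caller can check for `errorStatus` on the returned object and surface the error in the UI instead of counting the URL among those that may be crawled.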