cds-snc / scan-websites

On-demand scanning of websites for accessibility and security vulnerabilities/compliance
https://scan-websites.alpha.canada.ca
MIT License

Write crawler using scrapy and playwright #26

Closed: maxneuvians closed this issue 3 years ago

maxneuvians commented 3 years ago

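Ideally we would use Python's Scrapy library for URL crawling, as it is much more mature than most of the JS alternatives out there. Scrapy works great with static HTML sites, but it does not execute JavaScript on its own, so it has problems with single-page JavaScript apps whose links only appear after rendering. One solution is to use the scrapy-playwright plugin, which sits between Scrapy and Playwright and lets a real browser render pages before Scrapy parses them. Relevant links: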

https://github.com/scrapy/scrapy
https://github.com/scrapy-plugins/scrapy-playwright
https://github.com/microsoft/playwright-python
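For reference, a rough sketch of how the plugin is wired in, based on the scrapy-playwright README: the download handlers and Twisted reactor are set in the Scrapy settings, and individual requests opt in to browser rendering through their `meta`. This assumes a standard Scrapy project `settings.py`; `playwright_meta` is a hypothetical helper added for illustration and is not part of either library.

```python
# settings.py fragment (sketch): route HTTP(S) downloads through Playwright.
# The handler path and reactor value follow the scrapy-playwright README.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def playwright_meta(render: bool = True) -> dict:
    """Hypothetical helper (not a library API): build per-request meta.

    Only requests whose meta has "playwright" set to True are rendered
    in a browser; other requests are downloaded by Scrapy as usual.
    """
    return {"playwright": render}
```

In a spider, the crawl step would then look something like `yield response.follow(href, meta=playwright_meta())` for each discovered link, so that pages built by JavaScript are rendered before their links are extracted.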

Other approaches are welcome.

maxneuvians commented 3 years ago

Closed in #48