Lookyloo / PlaywrightCapture

Capture a URL with Playwright

Emulate depth feature in scrapy #5

Closed Rafiot closed 1 year ago

Rafiot commented 1 year ago

Just a note: to do something similar to what Scrapy allows when crawling a page at a specific depth, here is its default link parser: https://github.com/scrapy/scrapy/blob/master/scrapy/linkextractors/lxmlhtml.py

This is extremely similar to what we do in har2tree: https://github.com/Lookyloo/har2tree/blob/main/har2tree/nodes.py#L425
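For illustration, here is a minimal sketch of what depth-limited link following could look like, using only the Python standard library. It is not the code from Scrapy, har2tree, or PlaywrightCapture: `extract_links`, `crawl`, and the `fetch` callable are hypothetical names, and a real capture would render the page with Playwright where this sketch just calls `fetch(url)`. The only detail borrowed from Scrapy is that its default extractor looks at the `href` attribute of `a` and `area` tags.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect the raw href values of <a> and <area> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def extract_links(html: str, base_url: str) -> List[str]:
    """Return absolute, de-duplicated http(s) links found in html (hypothetical helper)."""
    collector = _AnchorCollector()
    collector.feed(html)
    links: List[str] = []
    for href in collector.hrefs:
        # Resolve relative URLs against the page URL and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int) -> Dict[str, int]:
    """Breadth-first walk from start_url; returns each URL with the depth it was found at.

    fetch is an assumed callable that returns the HTML of a URL.
    """
    seen: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth >= max_depth:
            continue  # pages at the maximum depth are recorded but not expanded
        for link in extract_links(fetch(url), url):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

With `max_depth=0` only the start URL is returned; each extra level follows the links found on the pages of the previous level, and every URL is visited at most once.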

Rafiot commented 1 year ago

Done: https://github.com/Lookyloo/PlaywrightCapture/commit/d1dcd16ce10a956e4cae6777ca5a564bec132868