Closed lorensr closed 3 years ago
Hi @lorensr,
You can choose to stop the scraper on either of the two patterns.
Using stop_urls, you can add any regex you'd like to use:
stop URLs with a trailing slash
"stop_urls": ["/$"]
or without
"stop_urls": [".*(?<!/)$"]
Hope this answers your question!
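To check which of the two duplicate URLs each pattern would exclude, here is a minimal sketch. It assumes the scraper tests each stop_urls entry against the full URL with Python's re.search; the helper name is_stopped is hypothetical and not part of the scraper.

```python
import re

# The two patterns suggested above.
TRAILING_SLASH = r"/$"            # matches URLs that end with a slash
NO_TRAILING_SLASH = r".*(?<!/)$"  # matches URLs that do not end with a slash


def is_stopped(url, stop_urls):
    """Return True if any stop pattern matches the URL.

    Hypothetical helper: assumes patterns are applied with re.search.
    """
    return any(re.search(pattern, url) for pattern in stop_urls)


urls = [
    "https://graphql.guide/preface",
    "https://graphql.guide/preface/",
]

for url in urls:
    print(
        url,
        "| stopped by trailing-slash rule:", is_stopped(url, [TRAILING_SLASH]),
        "| stopped by no-trailing-slash rule:", is_stopped(url, [NO_TRAILING_SLASH]),
    )
```

Each rule excludes exactly one of the two variants, so picking one leaves a single copy of the page in the index. Note that "/$" also matches a bare root URL such as https://graphql.guide/, so it may be worth checking how the site's home page is linked before choosing that rule.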
When I configure with
"start_urls": ["https://graphql.guide/preface"],
the scraper picks up /preface and /preface/ as separate pages and includes them separately in search results. However, they are the same page. How can I de-duplicate them?