Closed tilakpatel22 closed 2 weeks ago
Hi! Could you please provide a test input? As far as I'm aware, URL revisiting is already disabled, so each URL should only be crawled once.
urls.txt

cat urls.txt | cariddi -s

Check this file: it is crawling a single webpage multiple times.
Just tried, there aren't any duplicates.
echo "https://igod.gov.in/" | cariddi -ot test
cat output-cariddi/test.results.txt | wc -l
1269
cat output-cariddi/test.results.txt | sort -u | wc -l
1269
Currently, Cariddi is crawling a lot of links again and again. I think you should add a feature to crawl only new, unique webpages; that would really help users. Hakrawler, for example, has a unique-URL crawling feature.
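For what it's worth, unique-URL crawling is usually implemented with a thread-safe visited set keyed on a normalized URL, so trivially different forms of the same page (trailing slash, fragment, host casing) don't trigger re-crawls. Below is a minimal sketch of that idea in Go; all names here are hypothetical and this is not cariddi's actual internals:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"sync"
)

// visitedSet tracks normalized URLs so each page is crawled at most once.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newVisitedSet() *visitedSet {
	return &visitedSet{seen: make(map[string]bool)}
}

// normalize lowercases the host and strips the fragment and trailing slash,
// so trivially different forms of the same URL compare equal.
func normalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Fragment = ""
	u.Host = strings.ToLower(u.Host)
	u.Path = strings.TrimSuffix(u.Path, "/")
	return u.String(), nil
}

// firstVisit reports whether the URL is new, marking it as seen.
// A crawler would only enqueue URLs for which this returns true.
func (v *visitedSet) firstVisit(raw string) bool {
	key, err := normalize(raw)
	if err != nil {
		return false // unparseable URLs are skipped
	}
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[key] {
		return false
	}
	v.seen[key] = true
	return true
}

func main() {
	v := newVisitedSet()
	for _, u := range []string{
		"https://igod.gov.in/",
		"https://igod.gov.in",          // same page, no trailing slash
		"https://IGOD.gov.in/#section", // same page, fragment + host case
	} {
		fmt.Println(u, v.firstVisit(u))
	}
	// Only the first URL reports true; the variants are deduplicated.
}
```

If the duplicates reported above come from such near-identical variants rather than exact string repeats, `sort -u` on the raw output wouldn't catch them, which may explain the differing observations.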