edoardottt / cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
https://edoardoottavianelli.it
GNU General Public License v3.0

crawl only unique urls #162

Closed tilakpatel22 closed 2 weeks ago

tilakpatel22 commented 2 weeks ago

Currently, Cariddi crawls a lot of links again and again. I think you should add a feature to crawl only new, unique webpages; that would really help Cariddi users. Hakrawler, for example, has a unique URL crawling feature.
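For context, the deduplication requested here usually amounts to a thread-safe "seen" set consulted before each visit. A minimal Go sketch of that idea (all names are illustrative, not cariddi's actual internals):

```go
package main

import (
	"fmt"
	"sync"
)

// visitedSet is a hypothetical thread-safe set of already-crawled URLs.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newVisitedSet() *visitedSet {
	return &visitedSet{seen: make(map[string]struct{})}
}

// markNew records url and reports whether it was seen for the first time,
// so a crawler can skip anything it has already visited.
func (v *visitedSet) markNew(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.seen[url]; ok {
		return false
	}
	v.seen[url] = struct{}{}
	return true
}

func main() {
	v := newVisitedSet()
	for _, u := range []string{
		"https://example.com/a",
		"https://example.com/a", // duplicate: should be skipped
		"https://example.com/b",
	} {
		if v.markNew(u) {
			fmt.Println("crawl:", u)
		} else {
			fmt.Println("skip duplicate:", u)
		}
	}
}
```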

edoardottt commented 2 weeks ago

Hi! Can you please provide a test input? As far as I'm aware, URL revisiting is disabled, so the crawler should not visit the same URL twice.
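For reference, cariddi's crawler is based on colly, where revisiting is controlled by the collector's AllowURLRevisit field (false by default). A minimal standalone sketch of that default behavior, not cariddi's actual setup:

```go
package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// AllowURLRevisit defaults to false, so the collector refuses
	// to fetch the same URL twice.
	c := colly.NewCollector()
	fmt.Println("AllowURLRevisit:", c.AllowURLRevisit)

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("visiting:", r.URL)
	})

	_ = c.Visit("https://example.com/")

	// The second visit is rejected with colly's "URL already visited" error.
	if err := c.Visit("https://example.com/"); err != nil {
		fmt.Println("second visit:", err)
	}
}
```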

tilakpatel22 commented 2 weeks ago

urls.txt

cat urls.txt | cariddi -s

Check this file; it is crawling a single webpage multiple times.

edoardottt commented 2 weeks ago

Just tried; there aren't any duplicates.

echo "https://igod.gov.in/" | cariddi -ot test
cat output-cariddi/test.results.txt | wc -l
1269
cat output-cariddi/test.results.txt | sort -u | wc -l
1269
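Since the raw line count and the count after sort -u are both 1269, every URL in the output is unique.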