edoardottt / cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more
https://edoardoottavianelli.it
GNU General Public License v3.0
1.49k stars 152 forks

Depth Setting for More Comprehensive Crawling #139

aloysius-tim closed 5 months ago

aloysius-tim commented 6 months ago

Hello,

I've been using Cariddi to crawl various websites and noticed that, while it extracts many links, it does not seem to capture all of them. This leads me to believe that Cariddi either limits the crawl depth or offers no option to configure it.

Issue/Feature Request Description:

Expected Behavior:

Actual Behavior:

Steps to Reproduce:

  1. Run Cariddi with the default settings on a target website.
  2. Notice the output and compare it with a manual check of the website's link depth.
  3. Observe that links beyond a certain depth are not included in Cariddi's output.

Possible Solution:

Thank you for developing Cariddi, and I look forward to any guidance or updates regarding this request.

Best regards, Tim

edoardottt commented 6 months ago

Hi Tim, thanks for your contribution. Can you provide an input I can use to test this scenario? I'm not sure the problem is the depth level of the links.

thanks, edo

aloysius-tim commented 6 months ago

Hi Edo, one of the domains I've tried is https://kretzrealestate.com/fr and I'm getting only 58 links. Thanks, Tim

edoardottt commented 6 months ago

Hi Tim, sorry for the late reply.

I ran a test with the URL you supplied. I get 58 links too, but in my opinion there are no more links to crawl.

The last URL I get is https://kretzrealestate.com/fr/annonce/1460y/appartement/ (note that your last one could be different), and that page contains only links that were already crawled or that belong to other domains. Hence, crawling 58 links is the correct behavior here.

What were you expecting? Maybe you have something in mind I can't think of.

P.S.: I also tried other crawlers with an explicit depth level (something incredibly high, like 1000), and I get the same ~60 links.
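To illustrate why a higher depth limit does not help once every reachable same-domain link has been seen, here is a minimal sketch of a depth-limited breadth-first crawl. It is not Cariddi's implementation: the `SITE` link graph, the `example.com` URLs, and the `crawl` and `get_links` helpers are all hypothetical, and the "site" lives in memory so no network access is needed.

```python
from collections import deque
from urllib.parse import urlparse

START = "https://example.com/"

# Hypothetical in-memory site: page URL -> links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a"],
    "https://example.com/a": ["https://example.com/", "https://example.com/b"],
    "https://example.com/b": [
        "https://example.com/a",
        "https://example.com/c",
        "https://other.example.org/x",  # different domain, never followed
    ],
    "https://example.com/c": ["https://example.com/", "https://example.com/b"],
}


def get_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])


def crawl(start, get_links, max_depth):
    """Breadth-first crawl restricted to the start host and to max_depth hops."""
    host = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page
        for link in get_links(url):
            if urlparse(link).netloc != host:
                continue  # skip links belonging to other domains
            if link in seen:
                continue  # skip links that were already crawled
            seen.add(link)
            queue.append((link, depth + 1))
    return seen


for limit in (1, 3, 1000):
    print(limit, len(crawl(START, get_links, limit)))
```

In this toy graph the count grows with the depth limit only until every same-domain page has been reached; after that, a limit of 3 and a limit of 1000 return the same set, because the only remaining links are already-seen or off-domain ones.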

edoardottt commented 5 months ago

Hi @aloysius-tim, is everything working fine?

If so, I'd like to close this issue. Let me know; otherwise I'll close it as stale.

aloysius-tim commented 5 months ago

Hi @edoardottt, yes, all good. Thanks a lot for your help! Cheers!