Closed ghost closed 5 years ago
What I interpret this as (and @Sian1468, please correct me if wrong) is an option to only follow links if the links are in the same domain.
Normally I would think this is covered by domain restriction but I think what is being suggested is to ignore the "hops" restriction and keep following any links that are in the same domain. This is somewhat similar to "archive the whole site" provided the entire site is inter-linked from the starting seed.
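To make the interpretation above concrete, here is a minimal sketch of a crawl that ignores any hop limit and follows every link whose host matches the seed's host. It is not Squidwarc's code: `crawl_same_domain`, `same_domain`, and the `get_links` callback are hypothetical names used for illustration, and the `SITE` dictionary stands in for real pages.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def same_domain(url, seed):
    """Return True when url and seed share the same host."""
    return urlparse(url).hostname == urlparse(seed).hostname


def crawl_same_domain(seed, get_links):
    """Breadth-first crawl from seed with no hop limit.

    get_links is a hypothetical callable that maps a page URL to the
    hrefs found on that page. Off-domain links are never queued.
    """
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for href in get_links(page):
            # Resolve relative links and drop #fragments before comparing.
            url, _fragment = urldefrag(urljoin(page, href))
            if url not in seen and same_domain(url, seed):
                seen.add(url)
                queue.append(url)
    return order


# Toy link graph standing in for a real site (an assumption for the demo).
SITE = {
    "https://example.com/": ["/a", "https://other.org/x"],
    "https://example.com/a": ["/b", "/#top"],
    "https://example.com/b": ["/"],
}

visited = crawl_same_domain("https://example.com/", lambda u: SITE.get(u, []))
print(visited)
```

As noted above, this only reaches the whole site if every page is linked, directly or indirectly, from the starting seed; the `seen` set is what keeps the crawl from looping on pages that link back to each other.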
You are correct, @machawk1.
I think Squidwarc could do more than capture by a depth setting: it could capture the whole site, with a setting to also capture off-site links (from a single page or up to a given depth) or to skip off-site links entirely.
@Sian1468 thank you for suggesting this. I believe your suggestion would be an excellent feature for Squidwarc.
I will be putting some thought into how to accomplish this nicely alongside the existing crawl modes. Do you have any suggestions as to how you would like to be able to specify this crawl mode?
Recursive crawl mode
I got the inspiration from other archiving tools and software, e.g. wpull, grab-site, and crocoite.
Implemented and merged into master in PR #47.
Are you submitting a bug report or a feature request?
Feature request
What is the current behavior?
Squidwarc cannot be configured to follow links only when the URL is in the same domain as the seeds.
What is the expected behavior?
Squidwarc can be configured to follow links when the URL is in the same domain as the seeds.