Open ethus3h opened 8 years ago
One use case for this: if one knows a site is going to go down, it's most important to get the site's own links first, but the off-site links would also be valuable.
That would be useful but I have no idea how to do this. It may require a new wpull hook.
I really should learn how to program on grab-site and wpull so I can try to help make these things happen :3
I think this might be better addressed upstream, at least in the off-site-links case. From a cursory look at the engine code (at least where it gets the URLs), wpull seems to implement its modified BFS with a priority queue. If so, an additional if-then clause could be added where the priority is computed, adding some arbitrarily large number to the priority of off-site links.
Regardless, it may be useful to get some input from @chfoo on this.
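To make the priority idea above concrete, here is a minimal standalone sketch. It is not wpull's actual code or API: the `Frontier` class, the `OFFSITE_PENALTY` constant, and the convention that a lower number is fetched earlier are all assumptions for illustration only.

```python
import heapq
from itertools import count
from urllib.parse import urlsplit

# Hypothetical constant: "some arbitrarily large number" added to off-site URLs.
OFFSITE_PENALTY = 1_000_000


class Frontier:
    """Toy crawl queue: lower priority value is fetched first (assumed convention)."""

    def __init__(self, start_url):
        self._host = urlsplit(start_url).hostname
        self._heap = []
        self._tie = count()  # keeps insertion order among equal priorities
        self.push(start_url, depth=0)

    def priority(self, url, depth):
        # Plain BFS would use depth alone; this is the extra if-then clause.
        value = depth
        if urlsplit(url).hostname != self._host:
            value += OFFSITE_PENALTY
        return value

    def push(self, url, depth):
        entry = (self.priority(url, depth), next(self._tie), url)
        heapq.heappush(self._heap, entry)

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


frontier = Frontier("https://example.com/")
frontier.push("https://other.example.org/page", depth=1)
frontier.push("https://example.com/a/b", depth=2)
frontier.push("https://example.com/a", depth=1)

# On-site URLs come out in depth order; the off-site URL comes out last.
order = [frontier.pop() for _ in range(len(frontier))]
```

With a penalty larger than any realistic depth, off-site links are still crawled, just after every on-site link has been dequeued.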
I believe this is related to https://github.com/chfoo/wpull/issues/297.
I suggest having an option to mark some URLs as low priority by regex, in a manner similar to ignores; such URLs would be downloaded after everything else.
Also an option to automatically mark all off-site URLs as low-priority, and an option to automatically mark all off-domain page requisites as low-priority?
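A rough sketch of how the regex and off-site options suggested above might behave. This is only an illustration: `is_low_priority`, `order_urls`, and their parameters are hypothetical names, not existing grab-site or wpull options, and the page-requisite case is left out.

```python
import re
from urllib.parse import urlsplit


def is_low_priority(url, start_url, patterns=(), offsite_low=False):
    """Return True if the URL should be downloaded after everything else."""
    # Regex rules, matched the same way an ignore pattern would be (assumption).
    if any(re.search(pattern, url) for pattern in patterns):
        return True
    # Optional blanket rule: anything on another host is low priority.
    if offsite_low and urlsplit(url).hostname != urlsplit(start_url).hostname:
        return True
    return False


def order_urls(urls, start_url, patterns=(), offsite_low=False):
    """Stable sort: normal URLs keep their order, low-priority ones move to the end."""
    return sorted(
        urls,
        key=lambda u: is_low_priority(u, start_url, patterns, offsite_low),
    )


urls = [
    "https://example.com/forum/calendar?m=1",
    "https://cdn.example.net/lib.js",
    "https://example.com/about",
]
ordered = order_urls(
    urls,
    "https://example.com/",
    patterns=[r"/calendar\?"],
    offsite_low=True,
)
```

Unlike an ignore, a matching URL is never dropped here; it is only moved behind the URLs that match no low-priority rule.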