Closed by ghost 5 months ago
grab-site supports multiple start URLs. Does it work properly if you run grab-site --no-offsite-links https://forum.com/ https://i.forum.com/ ?
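A minimal sketch of the suggestion above, in case more subdomains need to be kept in scope later. The build_cmd helper is hypothetical and only prints the command line; it assumes nothing about grab-site beyond the invocation already shown in this thread (several start URLs plus --no-offsite-links).

```shell
#!/bin/sh
# Hypothetical helper: emit one start URL per host that should stay in scope.
# It only prints the command; nothing is crawled here.
build_cmd() {
    cmd="grab-site --no-offsite-links"
    for host in "$@"; do
        cmd="$cmd https://$host/"
    done
    printf '%s\n' "$cmd"
}

# Print the command for the forum and its attachment subdomain.
build_cmd forum.com i.forum.com
```

Once the printed command looks right, it can be copied into the terminal and run as-is.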
Thank you, it seems to work. I tested it with a single forum thread. It still grabs some offsite links, but far fewer than without --no-offsite-links, and it does include the attachments on the subdomain.
The remaining "offsite links" are probably just page requisites?
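One rough way to check this, assuming you can get a plain list of the crawled URLs, one per line (how you extract that list from the crawl output is left open here). The count_hosts function and the sample URLs below are made up for illustration.

```shell
#!/bin/sh
# Count URLs per host from a plain list of URLs on stdin (one per line).
# With "/" as the field separator, field 3 of "https://host/path" is the host.
count_hosts() {
    awk -F/ 'NF >= 3 { print $3 }' | sort | uniq -c | sort -rn
}

# Example with made-up URLs:
printf '%s\n' \
    'https://forum.com/threads/1' \
    'https://i.forum.com/a.png' \
    'https://forum.com/threads/2' \
    'https://cdn.example.net/style.css' | count_hosts
```

If the hosts other than forum.com and i.forum.com only show up for stylesheets, scripts, and images, that would be consistent with them being page requisites rather than followed links.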
Hello, I'm trying to archive a Xenforo forum.
Let's say the site is forum.com, and the attachments I want to download are on a subdomain, i.forum.com.
Is there any way to only crawl from the domains forum.com and i.forum.com?
I tried just: grab-site 'https://forum.com/' . This downloads attachments from i.forum.com, but it also crawls all kinds of links from unrelated domains, considerably bloating the archive.
Whereas if I do: grab-site 'https://forum.com/' --no-offsite-links , it only crawls forum.com, which excludes i.forum.com as well.
I tried adding wpull args: grab-site https://forum.com/ --wpull-args="--span-hosts --domains forum.com,i.forum.com"
This yields: grab-site: error: argument -H/--span-hosts: not allowed with argument --span-hosts-allow
grab-site seems to pass --span-hosts-allow by default, and I see no way to disable it. If it could be disabled, "--span-hosts --domains forum.com,i.forum.com" might work.
Does anyone have a solution?