Open nickrsan opened 7 years ago
"DNS address not found"
I updated his link with an HTTP link.
If someone is able to do this, here is a sample wget command for HTTP crawls:
wget --mirror --warc-file=usgodae.org/ftp/outgoing/.warc --warc-cdx \
  --page-requisites --html-extension --convert-links \
  --execute robots=off --directory-prefix=. --span-hosts \
  --domains=usgodae.org \
  --user-agent='Mozilla (mailto:you@example.com)' \
  --wait=10 --random-wait http://usgodae.org/ftp/outgoing/
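For anyone who wants to review the options before starting a long crawl, here is a rough sketch that wraps the same flags in a small POSIX shell function with a dry-run mode. The function name `crawl`, the `CRAWL_MODE` variable, and the WARC prefix `usgodae-outgoing` are my own placeholders, not from this thread; the prefix is a guess at a sensible name because wget adds the `.warc.gz` suffix itself. Swap in your own contact address for the user agent.

```shell
#!/bin/sh
# Sketch only: same wget flags as the command above, wrapped for review.
# TARGET_URL comes from this thread; WARC_PREFIX and CONTACT are
# hypothetical placeholders to be replaced before a real run.
TARGET_URL="http://usgodae.org/ftp/outgoing/"
WARC_PREFIX="usgodae-outgoing"
CONTACT="you@example.com"

crawl() {
  # $1 is "run" to start the crawl; anything else prints the command.
  mode=$1
  # Build the argument list as positional parameters so the quoting
  # of the user agent (which contains spaces) is preserved.
  set -- wget \
    --mirror \
    "--warc-file=${WARC_PREFIX}" \
    --warc-cdx \
    --page-requisites \
    --html-extension \
    --convert-links \
    --execute robots=off \
    --directory-prefix=. \
    --span-hosts \
    --domains=usgodae.org \
    "--user-agent=Mozilla (mailto:${CONTACT})" \
    --wait=10 \
    --random-wait \
    "${TARGET_URL}"
  if [ "$mode" = "run" ]; then
    "$@"
  else
    # Dry run: one argument per line, nothing is downloaded.
    printf '%s\n' "$@"
  fi
}

# Default is the dry run; use CRAWL_MODE=run to actually crawl.
crawl "${CRAWL_MODE:-print}"
```

Running the script as-is only prints the argument list. Running it with `CRAWL_MODE=run` set in the environment starts the mirror in the current directory.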
@gabefair Thanks. I didn't wanna get anyone's hopes up, but I've been working on this mirror since yesterday. With my wget syntax I missed a lot of data in certain folders, and it's been downloading all night.
Once my initial scan is done, I'll run your command and see what I can grab.
Status: 45.1 GB downloaded so far, total size unknown
CAREFUL about going after military sites!
They do not automatically carry the same rights of public access that civilian sites do.
-Jan
On Thu, Jan 26, 2017, at 08:38, Gabriel Fair wrote the sample wget command quoted above.
Links:
http://usgodae.org/ftp/outgoing/
Suggested in a large email containing many URLs