climate-mirror / datasets

For tracking data mirroring progress

U.S. Navy Naval Research Facility Archive Data #222

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

http://usgodae.org/ftp/outgoing/

Suggested in a large email containing many URLs.

as-com commented 7 years ago

"DNS address not found"

siennathesane commented 7 years ago

I updated his link with an HTTP link.

gabefair commented 7 years ago

If someone is able to do this, here is a sample wget command for HTTP crawls:

wget --mirror --warc-file=usgodae.org/ftp/outgoing/.warc --warc-cdx \
  --page-requisites --html-extension --convert-links \
  --execute robots=off --directory-prefix=. --span-hosts \
  --domains=usgodae.org \
  --user-agent='Mozilla (mailto:you@example.com)' \
  --wait=10 --random-wait http://usgodae.org/ftp/outgoing/
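The command above can be wrapped in a small helper so the same flags are reused for other URLs. This is only a sketch, not something posted in the thread: the function names `warc_prefix` and `mirror_cmd` are made up for illustration. It also assumes, as I understand wget's behavior, that the `--warc-file` value is a file name prefix to which wget appends its own extension, so the sketch flattens the URL into a name without slashes instead of using a path ending in `/.warc`. The helper only prints the command (a dry run); nothing is downloaded until the printed line is reviewed and run.

```shell
#!/bin/sh
# Hypothetical helpers; names and the flattened WARC prefix are assumptions.

# Turn a URL into a flat file name prefix: drop the scheme, drop trailing
# slashes, and replace the remaining slashes with dashes.
warc_prefix() {
    printf '%s\n' "$1" | sed \
        -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|/*$||' \
        -e 's|/|-|g'
}

# Print (do not run) a wget mirror command using the flags from the thread.
#   $1 = URL to mirror
#   $2 = contact e-mail placed in the user agent
mirror_cmd() {
    url=$1
    contact=$2
    # Host part of the URL, used to restrict the crawl with --domains.
    host=$(printf '%s\n' "$url" | sed \
        -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|/.*$||')
    printf '%s\n' "wget --mirror --warc-file=$(warc_prefix "$url") --warc-cdx --page-requisites --html-extension --convert-links --execute robots=off --directory-prefix=. --span-hosts --domains=$host --user-agent='Mozilla (mailto:$contact)' --wait=10 --random-wait $url"
}

# Dry run: show the command that would be used for this issue's URL.
mirror_cmd "http://usgodae.org/ftp/outgoing/" "you@example.com"
```

Printing instead of executing keeps the 10 second wait and the contact address visible before any requests are sent to the server.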

Colt8027 commented 7 years ago

@gabefair Thanks. I didn't want to get anyone's hopes up, but I've been working on this mirror since yesterday. With my wget syntax, I missed a lot of data in certain folders. It's been downloading all night.

Once my initial scan is done, I'll run your command and see what I can grab.

Status: 45.1 GB downloaded, total size unknown

ghost commented 7 years ago

CAREFUL on going after military sites!

They do not automatically have the same rights of public access that civilian sites do.

-Jan

