climate-mirror / datasets

For tracking data mirroring progress

https://www1.ncdc.noaa.gov/pub/data/ua/ #284

Open JeremiahCurtis opened 7 years ago

JeremiahCurtis commented 7 years ago

There is a large tar.gz file (about 90 GB) in this directory that I assume contains most of the NCDC upper-air data. I'm attempting to grab it, but I have a slow connection. If anyone with a faster connection could grab it, that would be great. This is a pretty important dataset.

JeremiahCurtis commented 7 years ago

Download failed. It looks like NCDC only allows 2 connections per host, and I'm already running several wgets from NCDC FTP directories.

jelimoore commented 7 years ago

Working on it. What wget were you using?

For reference I have 150+ down.

ivanstegic commented 7 years ago

I'm working on getting the file, and all the folders too.

wantonwonton commented 7 years ago

Hi, I'm new to this effort. I started downloading the 90GB file a while back as well. This is just to my home machine. I don't have a setup for an online mirror, but I can transfer the file elsewhere later. I've currently got 8GB of the file. The download speed started at 4MB/s, but sometimes slows down to 2MB/s for a while. The ETA is showing around 7 hours. Are three downloads slowing things down? Should I continue? Do you guys have faster connections?

jelimoore commented 7 years ago

150ish down here. Newb to this whole command-line wget thing, what should I use to download this dataset? wget -r https://www1.ncdc.noaa.gov/pub/data/ua/ just got index.html

I'd like to clone the whole folder. ETA is 7hr20m for me.
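
For reference, here is a minimal sketch of how a recursive grab of this directory is commonly spelled with wget. The thread never establishes why the plain -r run stopped at index.html, so this is not a diagnosis. The local-layout flags (--no-host-directories, --cut-dirs=2) are illustrative choices rather than anything the participants used, and the command is only printed, not run.

```shell
# Sketch only: print a recursive wget command for review before running it.
url="https://www1.ncdc.noaa.gov/pub/data/ua/"

# --recursive            follow the links in each directory index
# --no-parent            never climb above /pub/data/ua/
# --continue             resume files that are already partly on disk
# --no-host-directories  do not create a www1.ncdc.noaa.gov/ directory locally
# --cut-dirs=2           drop the leading pub/data/ from local paths (a layout choice)
mirror_cmd="wget --recursive --no-parent --continue --no-host-directories --cut-dirs=2 $url"

# Copy the printed line into a shell to actually start the download.
echo "$mirror_cmd"
```

Printing first makes it easy to check the flags before committing to a transfer that takes hours.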

wantonwonton commented 7 years ago

There's some (limited) info on wget commands here. My ETA's about 6hr20m now.

JeremiahCurtis commented 7 years ago

Thanks. I have much slower connections; I'm not sure exactly what is wrong with my broadband service, as I should be downloading about 20 times faster than I currently am. I would give anything to have 2-4 MB/s right now, although my ISP says I should have that speed. I closed my download of the zip file, so that might help speed up other grabs.

JeremiahCurtis commented 7 years ago

I would like to grab any smaller folders from ftp://ftp.ncdc.noaa.gov/pub/data/ instead. If anyone knows where the overall progress on this directory stands, that would be a huge help.

ivanstegic commented 7 years ago

@JeremiahCurtis I don't think it's you, I think the servers are throttling us and we're being capped somewhere. Urgh.

wantonwonton commented 7 years ago

I lost connectivity for a while and had to power-cycle my cable modem to recover. I resumed downloading with wget -c, which picked up where it left off (24GB), and I'm still getting up to 4MB/s at the moment.

@JeremiahCurtis You might consider power-cycling your router/modem (although you'd have to restart your downloads afterward, which would hopefully be resumable).
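
A rough sketch of the resume approach described above, wrapped in a retry loop so a dropped connection does not need manual attention. The function name retry_resume and the WGET and RETRY_DELAY variables are hypothetical helpers introduced for illustration; only wget's --continue flag (the long form of -c) comes from the thread.

```shell
# retry_resume URL [MAX_TRIES]
# Re-run "wget --continue" until it succeeds or MAX_TRIES is exhausted.
# WGET (default: wget) and RETRY_DELAY (default: 30 seconds) are
# hypothetical knobs for this sketch, not wget settings.
retry_resume() {
  url=$1
  max_tries=${2:-10}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # --continue resumes from the bytes already on disk instead of restarting
    if ${WGET:-wget} --continue "$url"; then
      return 0
    fi
    # Pause before the next attempt, but not after the final one
    if [ "$try" -lt "$max_tries" ]; then
      sleep "${RETRY_DELAY:-30}"
    fi
    try=$((try + 1))
  done
  return 1
}

# Example call (the URL is a placeholder for the file being fetched):
#   retry_resume "$file_url" 20
```

wget also has its own --tries option; the explicit loop is just one way to survive a modem power-cycle like the one mentioned here.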

jelimoore commented 7 years ago

Picked all 90gb of it up last night. ls -lackhs shows:

94119055 -rw-r--r-- 1 root wheel 90G Jan 28 05:34 rrs-data.tar.gz
10092 -rw-r--r-- 1 root wheel 67M Jan 28 05:34 wget-log

ghost commented 7 years ago

It's automatic for these servers to throttle at a limited number of connections per IP and a limited number of reconnects per second. You can manage this to some degree in wget using --wait= and --random-wait, and in httrack via the --max-rate= and %cN options. I don't know lftp well enough to say what to tell it. WinSCP has no explicit delay setting, as far as I know, but the effect can be achieved by using a synchronize and insisting that a checksum be used to tell whether a file is new: computing the checksum introduces a delay.

I can't speak for FileZilla at all.
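
As a hedged illustration of the wget side of this advice, the sketch below prints a recursive command that pauses between requests. The 5-second wait is an arbitrary illustrative value, not a limit anyone in the thread measured.

```shell
# Sketch only: a slower, more polite wget invocation, printed for review.
url="https://www1.ncdc.noaa.gov/pub/data/ua/"
wait_seconds=5   # illustrative value, not taken from the thread

# --wait=N       pause N seconds between retrievals
# --random-wait  vary each pause around N so requests are not evenly spaced
polite_cmd="wget --recursive --no-parent --wait=$wait_seconds --random-wait $url"

echo "$polite_cmd"
```

The pauses apply between files, so they matter for the directory crawl rather than for the single 90GB archive.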

On Sat, Jan 28, 2017, at 00:52, Ivan Stegic wrote:

@JeremiahCurtis I don't think it's you, I think the servers are throttling us and we're being capped somewhere. Urgh.

wantonwonton commented 7 years ago

I also got the full large file (which will just be offline for now):

-rw-rw-r-- 1 root root 96453271008 Mar 30 2016 rrs-data.tar.gz

Here's the md5sum output:

006e568c46b75f3dd53f57744046d423 rrs-data.tar.gz
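
The byte count here and the 90G shown in the earlier ls listing appear to describe the same file. A quick check, assuming ls -h reports sizes in powers of 1024 and rounds up:

```shell
# Convert the byte count from the listing above into GiB (1024^3 bytes).
bytes=96453271008
gib=$(awk -v b="$bytes" 'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }')
echo "$gib GiB"   # prints "89.8 GiB"
```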

ivanstegic commented 7 years ago

I was also able to get the file. Checking the MD5 now.

ivanstegic commented 7 years ago

@wantonwonton I got the same MD5 as you did: 006e568c46b75f3dd53f57744046d423

jelimoore commented 7 years ago

md5:

MD5 (rrs-data.tar.gz) = 006e568c46b75f3dd53f57744046d423
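
For anyone else who fetches the file, a small sketch for checking a local copy against the checksum the three downloads above agree on. The helper name verify_md5 is hypothetical; it uses md5sum where available and falls back to the BSD/macOS md5 tool, which produces the "MD5 (...) =" style output seen in the last comment.

```shell
# verify_md5 FILE EXPECTED_HEX
# Succeeds (exit status 0) only if FILE's MD5 digest equals EXPECTED_HEX.
verify_md5() {
  file=$1
  expected=$2
  if command -v md5sum >/dev/null 2>&1; then
    # GNU coreutils prints "<digest>  <filename>"; keep the first field
    actual=$(md5sum "$file" | awk '{print $1}')
  else
    # BSD/macOS: -q prints only the digest
    actual=$(md5 -q "$file")
  fi
  [ "$actual" = "$expected" ]
}

# Example, using the digest reported in this thread:
#   verify_md5 rrs-data.tar.gz 006e568c46b75f3dd53f57744046d423 && echo "checksum matches"
```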