Open JeremiahCurtis opened 7 years ago
Download failed. It looks like NCDC only allows 2 connections per host, and I'm already running several wgets from NCDC FTP directories.
Working on it. What wget were you using?
For reference I have 150+ down.
I'm working on getting the file, and all the folders too.
Hi, I'm new to this effort. I started downloading the 90GB file a while back as well. This is just to my home machine. I don't have a setup for an online mirror, but can transfer this elsewhere later. I've currently got 8GB of the file. The download speed started at 4MB/s, but sometimes slows down to 2MB/s for a while. The ETA is showing around 7 hours. Are three downloads slowing things down? Should I continue? Do you guys have faster connections?
150ish down here. Newb to this whole command-line wget thing: what should I use to download this dataset? Running wget -r https://www1.ncdc.noaa.gov/pub/data/ua/ just got me index.html.
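For the recursive grab asked about here, a typical wget invocation might look like the sketch below. The function name `mirror_dir` is made up for illustration, and the flags are an assumption about what is wanted (stay below the starting directory, resume partial files). Nothing in this thread confirms that they fix the index.html-only result on these servers.

```shell
# Hypothetical helper: recursively fetch everything below a directory URL.
mirror_dir() {
    # --recursive   follow the links found in the directory listing
    # --no-parent   never climb above the starting directory
    # --continue    resume files that were only partially downloaded
    wget --recursive --no-parent --continue "$1"
}
```

Usage would be something like `mirror_dir https://www1.ncdc.noaa.gov/pub/data/ua/`. By default wget stops at a recursion depth of 5, so a deeper tree may also need `--level=inf`.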
I'd like to clone the whole folder. ETA is 7hr20m for me.
There's some (limited) info on wget commands here. My ETA's about 6hr20m now.
Thanks. I have much slower connections; not sure what exactly is wrong with my broadband service, as I should be downloading about 20 times faster than is currently the case. I would give anything to have 2-4 MB/s right now, although my ISP says I should have that speed. I closed my download of the zip file, so that might help speed up other grabs.
From ftp://ftp.ncdc.noaa.gov/pub/data/ I would like to grab any of the smaller folders. If anyone knows where overall progress on this directory stands, that would be a huge help.
@JeremiahCurtis I don't think it's you, I think the servers are throttling us and we're being capped somewhere. Urgh.
I lost connectivity for a while and had to power-cycle my cable modem to recover. I resumed downloading with wget -c, which picked up where it left off (24GB), and I'm still getting up to 4MB/s at the moment.
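As a rough sketch of the resume approach described here, wget can also be told to keep retrying on its own after a dropped connection. The function name `resume_download` and the retry values are illustrative assumptions, not settings anyone in this thread reported using.

```shell
# Hypothetical helper: resume an interrupted download of one large file.
resume_download() {
    # --continue            pick up from the bytes already on disk
    # --tries=0             keep retrying indefinitely
    # --waitretry=30        back off between retries, up to 30 seconds
    # --retry-connrefused   treat "connection refused" as a transient error
    wget --continue --tries=0 --waitretry=30 --retry-connrefused "$1"
}
```

Call it with the full URL of the file, from the directory that already holds the partial download, so that `--continue` can find the existing bytes.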
@JeremiahCurtis You might consider power-cycling your router/modem (although you'd have to restart your downloads afterward, which would hopefully be resumable).
Picked all 90GB of it up last night. ls -lackhs shows:
94119055 -rw-r--r-- 1 root wheel 90G Jan 28 05:34 rrs-data.tar.gz
10092 -rw-r--r-- 1 root wheel 67M Jan 28 05:34 wget-log
These servers automatically throttle to a limited number of connections per IP and a limited number of reconnects per second. You can manage this to some degree in wget using --wait= and --random-wait, and in httrack via the --max-rate= and %cN options. I don't know lftp well enough to say what to tell it. WinSCP has no explicit delay setting, as far as I know, but the same effect can be achieved by using a synchronize and insisting that a checksum be used to tell whether a file is new: computing the checksum introduces a delay.
I can't speak for FileZilla at all.
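To make the wget side of this concrete, here is a minimal sketch of a throttled recursive fetch. The function name `polite_mirror`, the 2-second wait, and the `--limit-rate` cap are assumptions chosen for illustration. Only `--wait` and `--random-wait` come from the comment above, and the values the servers actually tolerate are not known.

```shell
# Hypothetical helper: recursive fetch that spaces out its requests.
polite_mirror() {
    # --wait=2          pause 2 seconds between retrievals
    # --random-wait     vary each pause between 0.5x and 1.5x of --wait
    # --limit-rate=2m   cap bandwidth at roughly 2 MB/s (an added assumption)
    wget --recursive --no-parent --wait=2 --random-wait --limit-rate=2m "$1"
}
```

Note that wget fetches files one at a time, so a single wget process should stay within a per-IP connection limit. Running several wgets in parallel is what multiplies the connection count.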
On Sat, Jan 28, 2017, at 00:52, Ivan Stegic wrote:
@JeremiahCurtis I don't think it's you, I think the servers are throttling us and we're being capped somewhere. Urgh.
I also got the full large file (which will just be offline for now):
-rw-rw-r-- 1 root root 96453271008 Mar 30 2016 rrs-data.tar.gz
Here's the md5sum output:
006e568c46b75f3dd53f57744046d423 rrs-data.tar.gz
I was also able to get the file. Checking the MD5 now.
@wantonwonton I got the same MD5 as you did: 006e568c46b75f3dd53f57744046d423
md5:
MD5 (rrs-data.tar.gz) = 006e568c46b75f3dd53f57744046d423
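For anyone else checking their copy, the comparison can be scripted. This is a minimal sketch that assumes GNU `md5sum` is available; the function name `verify_md5` is made up for illustration.

```shell
# Hypothetical helper: compare a file's MD5 against an expected hex digest.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: got $actual"
        return 1
    fi
}
```

Usage: `verify_md5 rrs-data.tar.gz 006e568c46b75f3dd53f57744046d423`. On macOS, where `md5sum` is usually missing, `md5 -q` should print just the digest and could be swapped in.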
There is a large tar.gz file (about 90 GB) in this directory that I assume contains most NCDC upper air data. I'm attempting to grab it, but I have a slow connection. If anyone with a faster connection could grab it, that would be great. This is a pretty important dataset.