Open nickrsan opened 7 years ago
nClimGrid subset is issue #116
ftp://ftp.ncdc.noaa.gov/pub/data/climgrid/
Local Climatological Data (LCD) subset is issue #117
ftp://ftp.ncdc.noaa.gov/pub/data/lcd/
Paleoclimatology subset is issue #17
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/
We received a report that https://www1.ncdc.noaa.gov/pub/ is 12 TB. We have grabbed 620 GB so far, but we don't know the source of the report, one Nick Gregory, or why he would know the size. Can anyone vouch for that number?
I just did a full directory walk of ncdc.noaa.gov/pub and got 29.620 TB, 1,325,435 files, and 11,686 folders. Thanks for any help people intended.
We cannot do all of this. Does it make sense to divide it up? Please send advice to climate -at- mm -dot- st. Thanks!
I posted some subsets above. I agree it's best to break up what's left. I can claim some of them. Do you have sizes for top-level directories?
Thanks!
I can get these tomorrow. Tracking as Azimuth Backup Kickstarter Project Issue #77.
I am awaiting the /pub/data total but here, in the interim, is what I have. It's been running since mid-afternoon.
Note that this is probably a lower bound. I received a number of 500 error codes while running du against these directories, so some file sizes were missed. I will update when I have the final figures. The number above for /pub/data was another 30 TB, but we'll see.
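For anyone repeating the size survey, here is a minimal Python sketch of how per-directory totals could be tallied while keeping track of the requests that failed. It is not the script used for the numbers in this thread. The function name `summarize_sizes` and the record format (a path plus a size in bytes, or `None` when the server returned an error) are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_sizes(records, depth=2):
    """Group (path, size) records by their first `depth` path components.

    `records` is an iterable of (path, size) pairs, where size is a byte
    count or None when the size could not be read (for example, the server
    answered with a 500 error). Returns {directory: (known_bytes, missed)}.
    A non-zero `missed` count means known_bytes is only a lower bound.
    """
    totals = defaultdict(int)   # bytes successfully counted per directory
    missed = defaultdict(int)   # entries whose size could not be read
    for path, size in records:
        parts = [p for p in path.split("/") if p]
        # Paths shallower than `depth` are grouped under their own name.
        key = "/" + "/".join(parts[:depth])
        if size is None:
            missed[key] += 1
        else:
            totals[key] += size
    keys = sorted(set(totals) | set(missed))
    return {k: (totals[k], missed[k]) for k in keys}


if __name__ == "__main__":
    # Hypothetical sample records, not real NCDC listings.
    sample = [
        ("/pub/data/a.txt", 100),
        ("/pub/data/b.txt", None),   # size missed because of a server error
        ("/pub/software/x", 50),
    ]
    for directory, (known, lost) in summarize_sizes(sample).items():
        note = " (lower bound)" if lost else ""
        print(f"{directory}: {known} bytes, {lost} missed{note}")
```

Reporting the missed count next to each total makes it clear which directories need a second pass before their sizes can be trusted.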
I can potentially grab some. What is left?
The FTP site remains in a "being copied" state. That said, it is not clear exactly where we are. We do have 3.9 TB of it.
Ok, let me know if you need me to grab anything specific.
@mejackreed I think someone should make a run at Climate Mirror issue #42. No one, as far as I know, has even started it. We made a start, but it's really incomplete, and the server does not always cooperate. I don't know if we are being throttled or what. I was/am trying:
wget -N -c --dns-timeout=10 --connect-timeout=300 --read-timeout=120 --wait=5 --mirror -e robots=off --random-wait --page-requisites --retry-connrefused --prefer-family=IPv4 --tries=40 --timestamping=on --recursive --level=8 --no-remove-listing --follow-ftp -nv --mirror --append-output=daac-ornl-gov-get-data.log --no-check-certificate https://daac.ornl.gov/
Sorry...I'm a newcomer here (been writing a book that is taking some time), but was just wondering if anyone is working on ftp://ftp.nodc.noaa.gov/pub/? thanks
I'm willing to grab whatever is needed to get a complete mirror if anyone has any idea where we stand on this....thanks
Name: NOAA Full NCDC Site
Organization: NOAA NCDC
Description URL:
Download URL: https://www1.ncdc.noaa.gov/pub/data/
File Types:
Size:
Status: In progress - mirroring by non-GitHub user
NCEI pub data mirrored by Azimuth Project: https://bitbucket.org/azimuth-backup/azimuth-inventory/issues/40/noaa-ncei-complete-pub-directory