climate-mirror / datasets

For tracking data mirroring progress

Dataset at ftp://eclipse.ncdc.noaa.gov/pub/cdr #159

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

ftp://eclipse.ncdc.noaa.gov/pub/cdr

Suggested in a large email containing many URLs.

gabefair commented 7 years ago

Downloading now

donbright commented 7 years ago

Rough size estimate: 6.7T, using lftp. This is a tough one to estimate because the directory is full of symlinks, and some of them point into subtrees that others are the root of. However, the big parts appear to be 'gridsat' (3.4T) and 'avhrr-land' (2.2T).


Update: size is 18T, per @empirical-bayesian's Azimuth issue, which quotes Benjamin Rose: https://bitbucket.org/azimuth-backup/azimuth-inventory/issues/10/
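
The double counting caused by overlapping symlinks can be avoided by counting each underlying file only once. Below is a minimal sketch, assuming the tree (or part of it) is already mirrored on a local POSIX filesystem; it is not how the lftp estimate above was produced. The function name `unique_tree_size` is hypothetical.

```python
import os


def unique_tree_size(root):
    """Sum file sizes under root, counting each underlying file once.

    Symlinks are followed, but every directory and file is identified by
    its (device, inode) pair, so a subtree reached through several
    symlinks contributes to the total only once.
    """
    seen_dirs = set()
    seen_files = set()
    total = 0
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            st = os.stat(directory)  # follows symlinks
        except OSError:
            continue  # broken link or permission problem: skip
        key = (st.st_dev, st.st_ino)
        if key in seen_dirs:
            continue  # already visited via another path
        seen_dirs.add(key)
        try:
            entries = list(os.scandir(directory))
        except OSError:
            continue
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    est = entry.stat(follow_symlinks=True)
                    fkey = (est.st_dev, est.st_ino)
                    if fkey not in seen_files:
                        seen_files.add(fkey)
                        total += est.st_size
            except OSError:
                continue
    return total
```

Calling `unique_tree_size("/path/to/mirror")` returns a byte count in which a directory reachable through two symlinks is still counted a single time.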

ghost commented 7 years ago

The Azimuth Backup Project has been working on getting ftp://eclipse.ncdc.noaa.gov since 20 December 2016. So far we have 5.6 TB. We have the space, but the data can only be transferred from the server so fast.

wantonwonton commented 7 years ago

@empirical-bayesian, do you know if the data transfers are being rate-limited on a per-client IP address basis? If so, then if the data sets can be split up and downloaded by more independent clients (if you don't have many already), it could speed things up. (If there are very large single files, I think one might be able to split those up too among multiple clients by using the resumable data transfer feature to initiate a transfer from the middle of a file.)
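
The idea of starting a transfer from the middle of a file can be sketched with Python's standard `ftplib`, whose `retrbinary` accepts a `rest` offset that is sent to the server as an FTP `REST` command. This is only an illustration under assumptions: the server must support `REST` for binary transfers and allow anonymous login, and the helper names `split_ranges` and `fetch_range` are hypothetical. Each client would fetch one range, and the pieces would be concatenated in offset order afterwards.

```python
from ftplib import FTP


def split_ranges(size, parts):
    """Return (offset, length) pairs covering [0, size) in contiguous chunks."""
    if size < 0 or parts < 1:
        raise ValueError("size must be >= 0 and parts >= 1")
    base, extra = divmod(size, parts)
    ranges = []
    offset = 0
    for i in range(parts):
        # The first `extra` chunks get one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


class _RangeDone(Exception):
    """Raised inside the callback once the requested bytes have arrived."""


def fetch_range(host, path, offset, length, out_path):
    """Download `length` bytes of `path` starting at `offset` (assumes REST support)."""
    remaining = length
    ftp = FTP(host)
    try:
        ftp.login()  # anonymous login; an assumption about the server
        with open(out_path, "wb") as out:

            def on_block(block):
                nonlocal remaining
                out.write(block[:remaining])
                remaining -= len(block)
                if remaining <= 0:
                    raise _RangeDone  # stop reading past our range

            try:
                ftp.retrbinary("RETR " + path, on_block, rest=offset)
            except _RangeDone:
                pass
    finally:
        # Close the socket directly: after an aborted transfer the server's
        # pending reply would otherwise be misread as the answer to QUIT.
        ftp.close()
```

For example, `split_ranges(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`, and each pair could be handed to a different client as the `offset` and `length` arguments of `fetch_range`. Whether this helps depends on how the server applies its limits, which the thread does not establish.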

ghost commented 7 years ago

I do not have any administrative knowledge of these sites to declare how widespread that practice is.

However, it is standard practice to do this. Moreover, my work with The Azimuth Backup Project has encountered several instances where, when I was careless, I found myself throttled and in one instance banned from a site, by IP. I had to move the gather to a different server to be able to continue.

A colleague had a similar experience.
