climate-mirror / datasets

For tracking data mirroring progress

Dataset at ftp://cddis.gsfc.nasa.gov/slr/cpf_predicts #149

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

ftp://cddis.gsfc.nasa.gov/slr/cpf_predicts.

Suggested in a large email containing many URLs.

donbright commented 7 years ago

size estimate 42G (using lftp du -h)

Ichimonji10 commented 7 years ago

Downloading with:

wget --mirror \
  --output-file=cpf_predicts.log \
  --no-verbose \
  --limit-rate=1m \
  ftp://cddis.gsfc.nasa.gov/slr/cpf_predicts/

Assuming all goes well, I'll make the data set available as a torrent this weekend.

donbright commented 7 years ago

attempting offline copy

sveeke commented 7 years ago

Also downloading. When it is ready, it will be available on http://climate-mirror-1.i11.nl.

@nickrsan your link needs another /.

Ichimonji10 commented 7 years ago

Downloaded 43 GB of data. Calculating hash with hashdeep -erl ./cddis.gsfc.nasa.gov/ > slr_cpf_predicts.csv.

bkirkbri commented 7 years ago

@sveeke Thank you. Please post hashdeep -erl ./cddis.gsfc.nasa.gov/ output.

sveeke commented 7 years ago

@bkirkbri Doing it right now, output will be available in http://climate-mirror-1.i11.nl/cddis.gsfc.nasa.gov.hash.

edit: it's done.

Are some people downloading on my mirror by the way? I've got 8 apache upload processes running that eat 1200 MB of memory each. Maybe not the most efficient way of downloading stuff off my server, since the upload connection isn't that fast ;).

bkirkbri commented 7 years ago

Are some people downloading on my mirror by the way?

Sorry about that! I was just doing HEAD requests but for that many URLs it's too much for Apache.

bkirkbri commented 7 years ago

@sveeke Looks like only 19GB of files? @Ichimonji10 Was yours 43GB?

Ichimonji10 commented 7 years ago

@bkirkbri @sveeke Yes, mine is 43 GB.

Ichimonji10 commented 7 years ago

Download log and hashdeep .csv file available here: http://www.ichimonji10.name/climatemirror/cddis.gsfc.nasa.gov.slr_cpf_predicts/

The actual data isn't available at this time. I'm syncing it between my local nodes for redundancy purposes, and hope to make torrents available this weekend.

sveeke commented 7 years ago

@bkirkbri Maybe it is still downloading? There were some lftp processes still running. I'll check when I'm back home whether this dataset is still downloading.

donbright commented 7 years ago

offline copy complete.

Total: 928 directories, 360663 files, 0 symlinks                     
New: 360663 files, 0 symlinks
44691410065 bytes transferred in 237215 seconds (184.0 KiB/s)
lftp cddis.gsfc.nasa.gov:/slr/cpf_predicts> exit
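As a rough cross-check of the lftp summary above, the arithmetic can be sketched in Python. The two input numbers are copied from that summary; the function name `transfer_stats` and the choice of units are only illustrative assumptions, not anything lftp itself provides.

```python
# Figures copied from the lftp summary in this comment.
bytes_transferred = 44_691_410_065
seconds = 237_215


def transfer_stats(n_bytes, secs):
    """Convert a byte count and duration into a few human-readable figures."""
    return {
        "kib_per_s": n_bytes / secs / 1024,  # binary kilobytes per second
        "gib": n_bytes / 1024**3,            # binary gigabytes
        "gb": n_bytes / 1000**3,             # decimal gigabytes
        "days": secs / 86_400,               # wall-clock duration in days
    }


stats = transfer_stats(bytes_transferred, seconds)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The same byte count comes out at roughly 41.6 GiB or 44.7 GB, so part of the spread between the size figures quoted in this thread may simply be binary versus decimal units. That is a guess, not something the logs confirm.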

hashdeep: offline, contact me if data is needed

my hashdeep output matches @Ichimonji10's, except that my copy is missing a bunch of files named ".listing" and a bunch of files under ftp://cddis.gsfc.nasa.gov/slr/cpf_predicts/current, which appears to be a directory that is updated constantly with new data.
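A comparison like the one described above could be scripted roughly as follows. This is a minimal sketch, assuming the default hashdeep output layout (header lines starting with `%%%%`, comment lines starting with `#`, then `size,md5,sha256,filename` rows) and assuming both manifests use paths relative to a comparable root. The function names and the `volatile_dir` parameter are hypothetical, introduced only for illustration.

```python
def load_manifest(lines):
    """Map filename -> (size, md5, sha256) from hashdeep output lines."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, the "%%%%" header, and "#" comment lines.
        if not line or line.startswith(("%%%%", "#")):
            continue
        # Split at most three times, since filenames may contain commas.
        size, md5, sha256, name = line.split(",", 3)
        entries[name] = (int(size), md5, sha256)
    return entries


def is_ignored(name, volatile_dir="slr/cpf_predicts/current"):
    """True for wget's .listing files and anything under the volatile dir."""
    if name.rsplit("/", 1)[-1] == ".listing":
        return True
    return f"/{volatile_dir}/" in f"/{name}"


def compare(a, b):
    """Report files present in only one manifest or differing in size/hash."""
    a = {k: v for k, v in a.items() if not is_ignored(k)}
    b = {k: v for k, v in b.items() if not is_ignored(k)}
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "differing": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

To use it, open each manifest and pass the file objects to `load_manifest`, then hand both results to `compare`. Three empty lists would suggest the copies agree apart from the `.listing` files and the `current` directory.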

gabefair commented 7 years ago

@donbright What command did you use to make sure you traversed symlinks?

donbright commented 7 years ago

dang i knew i forgot something. uhmmm i think my standard command was plain old fashioned "mirror" inside of lftp
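One way to at least inspect a local copy for symlinks is a short Python walk over the mirror directory. This is only a sketch: `find_symlinks` is a hypothetical helper, it assumes a POSIX filesystem, and it says nothing about whether the FTP server had symlinks that the mirror tool skipped or flattened.

```python
import os


def find_symlinks(root):
    """Return sorted paths under root that are symbolic links."""
    found = []
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append(path)
    return sorted(found)
```

An empty result on the local tree is consistent with the "0 symlinks" line in the lftp summary, but comparing file counts against another mirror is probably the more reliable check.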