climate-mirror / datasets

For tracking data mirroring progress

Dataset at ftp://ghrc.nsstc.nasa.gov/pub/doc #169

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

ftp://ghrc.nsstc.nasa.gov/pub/doc

Suggested in a large email containing many URLs

clickbg commented 7 years ago

Downloading

clickbg commented 7 years ago

Downloaded and available at: http://mirrors.sgate.org/climate-mirror/ghrc.nsstc.nasa.gov/pub/doc/

donbright commented 7 years ago

Mirror: offline, contact me if data is needed
Size: 247M
Hashdeep: offline, contact me if data is needed

redownloaded 2017-2-19 due to hard drive failure

bkirkbri commented 7 years ago

@clickbg I get a 404 for that URL. @donbright Thanks!

wantonwonton commented 7 years ago

I'm downloading this to be an offline copy.

wantonwonton commented 7 years ago

My download completed and my hashes match those from @donbright.

I also ran this command:

find ghrc.nsstc.nasa.gov/pub/doc/ -type f ! -name .listing | xargs md5sum | sort -k 2 | md5sum

which computed this overall hash: bd8e71c99052e38982b13133c80e2dde.

donbright commented 7 years ago

this is weird but i cannot get my md5 hash to match @wantonwonton

don@miriam:/var/www/html/169$ find ghrc.nsstc.nasa.gov/pub/doc/ -type f ! -name .listing | xargs md5sum | sort -k 2 | md5sum
1c9411f4567c72a86ef1b49e9bc8a612  -

i wonder if i am doing something wrong here.

wantonwonton commented 7 years ago

@donbright I notice that your earlier hashdeep output was generated from /var/www/html/disk2/169/ghrc.nsstc.nasa.gov (note the "disk2"). Was that a different copy that would account for the different hash?

You might try running hashdeep again and see if you get the same results that you had posted previously.

donbright commented 7 years ago

@wantonwonton thanks... but i still can't get it, i tried re-downloading the entire thing again

don@miriam:~$ lftp -c mirror ftp://ghrc.nsstc.nasa.gov/pub/doc
don@miriam:~$ mkdir -p ghrc.nsstc.nasa.gov/pub
don@miriam:~$ mv doc ghrc.nsstc.nasa.gov/pub/
don@miriam:~$ find ghrc.nsstc.nasa.gov/pub/doc/ -type f ! -name .listing | xargs md5sum | sort -k 2 | md5sum
1c9411f4567c72a86ef1b49e9bc8a612  -
don@miriam:~$ 

very puzzling

wantonwonton commented 7 years ago

When I compared our hashdeep files (after sorting them), there were no differences in the sets of hashes:

diff ghrc-doc-hashdeep-donbright-sorted ghrc-doc-hashdeep-no-listings-sorted
1c1
< ##
---
> ##
654,655c654,655
< ## $ hashdeep -erl .
< ## Invoked from: /var/www/html/disk2/169/ghrc.nsstc.nasa.gov
---
> ## $ hashdeep -rl ./pub/doc/
> ## Invoked from: /mnt/data1/climate-data/ghrc.nsstc.nasa.gov
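
The comparison above differs only in the header lines, which record where and how each run was invoked. A minimal sketch of the same check that strips those headers first is shown below. It assumes the default hashdeep output layout (header lines starting with `%%%%` or `##`, then `size,md5,sha256,filename` rows); the function names are hypothetical and not part of hashdeep.

```shell
# hashdeep_hashes FILE: print the sorted "size,md5,sha256" triples from a
# hashdeep output file, dropping header lines and file names.
hashdeep_hashes() {
    # Header lines start with "%%%%" or "##"; data rows are CSV with the
    # file name in the last field (assumed default hashdeep layout).
    grep -v -e '^%' -e '^#' "$1" | cut -d, -f1-3 | LC_ALL=C sort
}

# same_hash_set A B: succeed when both files list the same triples.
same_hash_set() {
    a=$(hashdeep_hashes "$1")
    b=$(hashdeep_hashes "$2")
    [ "$a" = "$b" ]
}

# Example, with the file names used in this thread:
# same_hash_set ghrc-doc-hashdeep-donbright-sorted ghrc-doc-hashdeep-no-listings-sorted && echo same
```

Because file names are dropped, this only shows that the two copies contain the same file contents; it would not notice a file that was renamed or moved.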

Did you try running hashdeep again and compare with your previous results? Maybe some additional file got created in some directory?

I re-ran hashdeep and the md5sum command again, and got the same results as before. That md5sum command is a rewrite I did of the original, which would look like this:

find ghrc.nsstc.nasa.gov/pub/doc/ -type f -exec md5sum {} \; | grep -v '.listing' | sort -k 2 | md5sum

You could give that a try and see if the result is any different, though I wouldn't expect it to be unless there's some behavioral difference between our environments. (The two commands produced the same hash in my environment.)
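
One environment difference that can affect a pipeline like this is the locale, because `sort` orders lines according to the current collation rules. The sketch below is a variant that pins the sort order with `LC_ALL=C` and passes file names null-delimited. It is an illustration rather than the command used in this thread, so its result may differ from the hashes posted here; `tree_md5` is a hypothetical name, and `-print0` / `xargs -0` are GNU and BSD extensions.

```shell
# tree_md5 DIR: print one md5 summarizing every regular file under DIR,
# skipping any file named .listing.
tree_md5() {
    # -print0 / -0 keep file names containing spaces intact.
    # LC_ALL=C forces byte-order sorting, so the order of the per-file
    # lines does not depend on the machine's locale settings.
    find "$1" -type f ! -name .listing -print0 |
        xargs -0 md5sum |
        LC_ALL=C sort -k 2 |
        md5sum
}

# Example: tree_md5 ghrc.nsstc.nasa.gov/pub/doc/
```

The per-file lines include each path as printed by `find`, so two people only get comparable results when they pass the same relative path from the same parent directory.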

donbright commented 7 years ago

@wantonwonton thanks, this might seem tedious but i really want to make sure i understand what i'm doing here, i appreciate it.

i ran another hashdeep and there are two symlinks that might explain the difference? my lftp mirror apparently did not grab their target files, it just leaves them dangling as broken symlinks

don@miriam:/var/www/html/169$ find ghrc.nsstc.nasa.gov/pub/doc/ -type f -exec md5sum {} \; | grep -v '.listing' | sort -k 2 | md5sum
1c9411f4567c72a86ef1b49e9bc8a612  -
don@miriam:/var/www/html/169$ cd ghrc.nsstc.nasa.gov/
don@miriam:/var/www/html/169/ghrc.nsstc.nasa.gov$ hashdeep -r -l -a -k ../hashdeep.audit.txt .
./pub/doc/gpmgv/lpvex/gpm2dlpvex/Dataset_summary_forDAAC2.pdf: No such file or directory
./pub/doc/gpmgv/mc3e/gpmpawneemc3e/gpm_chill_pawnee_mc3e.html: No such file or directory
hashdeep: Audit passed
don@miriam:/var/www/html/169/ghrc.nsstc.nasa.gov$ ls -l pub/doc/gpmgv/lpvex/gpm2dlpvex/
lrwxrwxrwx 1 don don    31 Jan 30 02:45 Dataset_summary_forDAAC2.pdf -> ../Dataset_summary_forDAAC2.pdf
don@miriam:/var/www/html/169/ghrc.nsstc.nasa.gov$ ls -l pub/doc/gpmgv/mc3e/gpmpawneemc3e/
lrwxrwxrwx 1 don don 70 Jan 30 02:46 gpm_chill_pawnee_mc3e.html -> /ftp/public/pub/doc/gpmgv/mc3e/gpmchillmc3e/gpm_chill_pawnee_mc3e.html
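
Without `-L`, `find -type f` does not match symlinks, so a mirror that stores these two entries as links and one that stores them as regular copies could feed different file lists into the md5 pipeline. That is only a possible explanation. A minimal sketch for listing the symlinks in a mirror and flagging the dangling ones follows; `list_symlinks` is a hypothetical helper name, and `readlink` is assumed to be available (it is on GNU and BSD systems).

```shell
# list_symlinks DIR: print every symlink under DIR with its target,
# marking the ones whose target does not exist locally.
# File names containing newlines are not handled.
list_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        # [ -e ] follows the link, so it fails for a dangling symlink.
        if [ -e "$link" ]; then state=ok; else state=dangling; fi
        printf '%s\t%s -> %s\n' "$state" "$link" "$(readlink "$link")"
    done
}

# Example: list_symlinks ghrc.nsstc.nasa.gov/pub/doc/
```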

x775 commented 7 years ago

I have a complete copy as of this posting.

md5: 31993d65952c39da2c8123b4304b3e06
sha256: 4966295cdc43b457a1e90679be26e3c9baed99b3d1c9be80bffa4e728b96c2ba

Individual checksums: https://gist.github.com/x775/814acb382221152b621b75634aed070b

Size: 253.72266MB

Compressed name: ghrc_nsstc_nasa_gov.7z
Compressed md5: 14b45501c8459e7335fdf01d8cd1ef00
Compressed sha256: 078d62b34250f7b80d6226628f82dbbbf8dce1682496b1fa60db59ac81e231f6
Compressed size: 202.80044MB
Compressed download link: https://drive.google.com/open?id=0B6PlQrUTwL1PdDJyR25tWDhocFk
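
Anyone who downloads the archive can check it against the posted sha256 before unpacking. A minimal sketch, assuming GNU coreutils `sha256sum` is installed; `verify_sha256` is a hypothetical helper name.

```shell
# verify_sha256 FILE EXPECTED: succeed when FILE's sha256 equals EXPECTED.
verify_sha256() {
    # sha256sum -c reads "HASH  NAME" lines; two spaces separate the fields.
    printf '%s  %s\n' "$2" "$1" | sha256sum -c -
}

# Example, using the values posted above:
# verify_sha256 ghrc_nsstc_nasa_gov.7z \
#     078d62b34250f7b80d6226628f82dbbbf8dce1682496b1fa60db59ac81e231f6
```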

entr0p1 commented 7 years ago

Grabbing

entr0p1 commented 7 years ago

Done

Checksums: https://gateway.ipfs.io/ipfs/QmPnWJwZmdRAUeWpZQyaZmTZkYd32YE62ZpFZfxFQSQy3a
Root Directory: https://gateway.ipfs.io/ipfs/QmPMyPnpmNBYBFHStpPXjjgouEntWm6PecRjbUkwUCZefT
Size: 250MB