climate-mirror / datasets

For tracking data mirroring progress

NASA NASCOM Archive Data #178

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

ftp://ladsftp.nascom.nasa.gov/allData/

Suggested in a large email containing many URLs.

Colt8027 commented 7 years ago

Download Status of ftp://ladsftp.nascom.nasa.gov/allData/ - In Progress

Size - 16 GB of Total Unknown
Update #1 - Time 1848 EST - 24 GB
Update #2 - Time 1948 EST - 49.4 GB
Update #3 - Time 2042 EST - 72.8 GB
Update #4 - Time 2155 EST - 104 GB
Update #5 - Time 2303 EST - 127 GB
Update #6 - Time 0054 EST - 162 GB

My last update. I'm hoping I'll be done soon and can mirror this somewhere. I believe this will be bigger than 250 GB.
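The progress updates above are enough to estimate an average download rate. The following is a minimal sketch in Python, assuming the times are 24-hour clock readings from a single run that crosses midnight once and that GB means 10^9 bytes; the function names are illustrative and not part of anyone's actual tooling.

```python
from datetime import datetime, timedelta

# Progress updates copied from the comment above: (24-hour EST time, GB downloaded).
UPDATES = [
    ("1848", 24.0),
    ("1948", 49.4),
    ("2042", 72.8),
    ("2155", 104.0),
    ("2303", 127.0),
    ("0054", 162.0),
]


def elapsed_hours(updates):
    """Return the hours elapsed since the first update, one value per update.

    Assumption: a clock time earlier than the previous one means the log
    rolled over midnight.
    """
    hours = []
    day_offset = timedelta(0)
    previous = None
    start = None
    for clock, _gb in updates:
        t = datetime.strptime(clock, "%H%M")
        if previous is not None and t < previous:
            day_offset += timedelta(days=1)
        previous = t
        stamp = t + day_offset
        if start is None:
            start = stamp
        hours.append((stamp - start).total_seconds() / 3600)
    return hours


def average_rate_gb_per_hour(updates):
    """Average GB per hour between the first and last update."""
    hours = elapsed_hours(updates)
    gained = updates[-1][1] - updates[0][1]
    return gained / hours[-1]


rate = average_rate_gb_per_hour(UPDATES)
print(f"{rate:.1f} GB/hour")
```

With these six updates the run covers 6.1 hours and 138 GB, so the average comes out at roughly 22.6 GB per hour.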

bofh453 commented 7 years ago

Had this one earlier, but apparently my copy was incomplete (60GB). Restarted, though it's low priority.

Colt8027 commented 7 years ago

@bofh453 The low priority label was removed before I started on it. I'm over 104 GB right now and, judging by these folders, not even close to being finished. I'm also on a 1 GbE box.

bofh453 commented 7 years ago

Noted, stopped at 67GB but not deleting what has already been grabbed.

Colt8027 commented 7 years ago

@bofh453 I'm at 142 GB now. If you look at the folder names, I'm only in

/1003/MOB09A1GS_EVI/2006

This is gonna take a long time. It takes me roughly 0.3 s per file.

Colt8027 commented 7 years ago

So this is interesting: I've scanned each folder in /1003, and they come to roughly 348-350 GB per folder. I'm gonna need to start splitting/compressing folders and uploading very soon, because I don't have enough space on my dedicated box.

What do you think the best bet is right now? I'm gonna have to start off-loading to the cloud or something.

Since a majority of these are pictures, they won't compress well, and I really only have access to my trial ACD for the time being. So I can:

  1. Put them into a .7z protected by a randomly generated password that is tied to the file name.
  2. Find another way to upload all the individual files (26,698 images) in the first folder I'm working on.

UPDATE: Since these are all images, it's very UNLIKELY they will get flagged, so I'm throwing them up on ACD. Anyone who wants the link and wants to start downloading, I can PM. I simply do not have enough space to do it all at once, so I gotta get the data, offload it, and get more data. Luckily all of these are .jpg. These directories are insane.

Stats: 1618 EST: 581GB: 103,042 Images
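These stats, combined with the roughly 0.3 s per file reported earlier in the thread, give a feel for average file size and effective throughput. The sketch below is back-of-the-envelope arithmetic only; it assumes GB means 10^9 bytes and that the 0.3 s figure still holds for this batch, neither of which the thread confirms.

```python
def per_file_stats(total_gb, file_count, seconds_per_file):
    """Return (average MB per file, implied MB/s, total hours).

    Assumes decimal units (1 GB = 1000 MB) and one file fetched at a time.
    """
    avg_mb = total_gb * 1000 / file_count
    mb_per_s = avg_mb / seconds_per_file
    hours = file_count * seconds_per_file / 3600
    return avg_mb, mb_per_s, hours


# Figures from the thread: 581 GB, 103,042 images, about 0.3 s per file.
avg_mb, mb_per_s, hours = per_file_stats(581, 103_042, 0.3)
print(f"{avg_mb:.2f} MB/file, {mb_per_s:.1f} MB/s, {hours:.1f} h")
```

Under those assumptions the images average about 5.6 MB each, which works out to roughly 19 MB/s, well below what a 1 GbE link can carry. That suggests per-file overhead, not bandwidth, is the limiting factor.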

Colt8027 commented 7 years ago

@nickrsan @mxplusb Alright, so I thought I'd give you an update. There is so much data on here that I only have one solution. Currently my 1 GbE dedi is RAID1, and I'm only now finishing up ONE subdirectory, 1003/.

Beyond that, it's gonna be a long time. The URL is no longer available and the server is direct IP access only. The fastest way I can get this data off is direct grab and direct upload, not putting it in .7z etc.

Here is my status 2336 EST: http://i.imgur.com/DTLJP3m.png.

If I keep at it, it will probably be a few more days, as long as the server stays online.

wantonwonton commented 7 years ago

@Wolf-Rayet What's the server's IP address? Maybe I can download some files starting from the other end of the list of directories. Have you needed to insert delays between files?
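Working a directory from the end of its listing with a pause between files could look something like the sketch below, using Python's standard `ftplib`. It assumes the directory contains only plain files, that each file fits in memory, and that `save` is a hypothetical callback supplied by the caller; none of this reflects the tooling actually used in this thread.

```python
import time
from ftplib import FTP  # only needed to open a real connection


def fetch_directory(ftp, remote_dir, save, delay_s=0.5, reverse=True):
    """Download every entry in remote_dir, optionally from the end of the list.

    ftp     -- a connected, logged-in ftplib.FTP (or an object with the same
               cwd / nlst / retrbinary methods)
    save    -- hypothetical callback save(name, data) that stores one file
    delay_s -- pause between files, to go easy on the server
    """
    ftp.cwd(remote_dir)
    names = sorted(ftp.nlst(), reverse=reverse)
    for name in names:
        chunks = []
        ftp.retrbinary("RETR " + name, chunks.append)
        save(name, b"".join(chunks))
        time.sleep(delay_s)
    return names


# Example wiring (not executed here; HOST is a placeholder):
# ftp = FTP(HOST)
# ftp.login()  # anonymous login
# fetch_directory(ftp, "/allData/1003", lambda n, d: open(n, "wb").write(d))
```

Two people running this with opposite `reverse` settings would meet somewhere in the middle of the listing, although they would still need to agree on which directories each one covers.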

donbright commented 7 years ago

the name ladsftp.nascom.nasa.gov no longer resolves in dns

don@miriam:/var/www/html$ host ladsftp.nascom.nasa.gov
Host ladsftp.nascom.nasa.gov not found: 3(NXDOMAIN)

However ftp://ladsweb.nascom.nasa.gov seems to be OK

i noticed the README says this:

The LAADS system takes advantage of low cost of disk storage to retain several
petabytes of data on-line.

Colt8027 commented 7 years ago

@wantonwonton The host now is ftp://198.118.194.40/allData

If that's correct for that much data... there is no fucken way we'll be able to mirror this. I thought this was gonna be a simple mirror... I'm already getting tired of moving data around. My estimate of what I currently have that isn't on the cloud yet is 6 TB.

http://i.imgur.com/QZKuV6r.png

donbright commented 7 years ago

Very Rough Size Estimate: 163 TB using lftp du -h
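To put that figure in perspective, here is a rough transfer-time calculation. It is a sketch only: it treats 1 TB as 10^12 bytes (lftp's `-h` output may well use binary units) and assumes a perfectly saturated link unless `efficiency` is lowered.

```python
def transfer_days(size_tb, link_gbit_per_s, efficiency=1.0):
    """Days needed to move size_tb terabytes over a link of the given speed.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means a fully saturated link, which is optimistic).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 86400


print(f"{transfer_days(163, 1):.1f} days at a full 1 Gbit/s")
print(f"{transfer_days(163, 1, efficiency=0.15):.0f} days at 15% of that")
```

Even at the full line rate of a 1 Gbit/s connection, 163 TB takes about 15 days of uninterrupted transfer, and the 15% figure in the second line is only an illustrative guess at real-world efficiency.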

I also got an enormous number of 550 errors like this:

du: Access failed: 550 Failed to change directory. (/allData/3110/NPS_VSTIP_L2)
du: Access failed: 550 Failed to change directory. (/allData/3110/NPS_VSUM_L2)
du: Access failed: 550 Failed to change directory. (/allData/3110/NPS_VSUT_L2)
du: Access failed: 550 Failed to change directory. (/allData/3110/NPS_WCTTIP_L2)
160T    ./3110
du: Access failed: 550 Failed to change directory. (/allData/3144)
du: Access failed: 550 Failed to change directory. (/allData/4)
du: Access failed: 550 Failed to change directory. (/allData/404)
du: Access failed: 550 Failed to change directory. (/allData/41)
du: Access failed: 550 Failed to change directory. (/allData/5)
du: Access failed: 550 Failed to change directory. (/allData/5000)
du: Access failed: 550 Failed to change directory. (/allData/51)
du: Access failed: 550 Failed to change directory. (/allData/55)
du: Access failed: 550 Failed to change directory. (/allData/6)
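Output like this can be split into the directories that returned a size and the ones that failed, so the failed ones can be retried or reported. The sketch below parses only the two line shapes shown above; it assumes that format is stable, which is not guaranteed by any lftp documentation cited in this thread.

```python
import re

# Matches lines such as:
#   du: Access failed: 550 Failed to change directory. (/allData/3144)
ERROR_RE = re.compile(r"^du: Access failed: (?P<code>\d{3}) .*\((?P<path>[^)]*)\)$")
# Matches lines such as:
#   160T    ./3110
SIZE_RE = re.compile(r"^(?P<size>\S+)\s+(?P<path>\S+)$")


def split_du_output(lines):
    """Return (sizes, failures) parsed from lftp `du -h` style output.

    sizes    -- dict mapping path to the human-readable size string
    failures -- list of paths that could not be entered
    """
    sizes, failures = {}, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        error = ERROR_RE.match(line)
        if error:
            failures.append(error.group("path"))
            continue
        size = SIZE_RE.match(line)
        if size:
            sizes[size.group("path")] = size.group("size")
    return sizes, failures


sample = [
    "du: Access failed: 550 Failed to change directory. (/allData/3110/NPS_VSUT_L2)",
    "160T    ./3110",
    "du: Access failed: 550 Failed to change directory. (/allData/3144)",
]
sizes, failures = split_du_output(sample)
print(sizes, failures)
```

Any directory in `failures` is missing from the total, so the 163 TB estimate should be read as a lower bound.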

JeremiahCurtis commented 7 years ago

Could a FOIA request work for something like this? As far as I know, size is generally not a barrier to FOIA requests. Any other ideas?