climate-mirror / datasets

For tracking data mirroring progress

Dataset at ftp://ftp.ncdc.noaa.gov/pub/data #162

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

ftp://ftp.ncdc.noaa.gov/pub/data

Suggested in a large email containing many URLs.

JeremiahCurtis commented 7 years ago

I have the oisst subfolder and am trying wget on some other subfolders.

k80w commented 7 years ago

Downloading

claughinghouse commented 7 years ago

Downloading.

NickIAm commented 7 years ago

Here are some torrent magnet links of data I already have downloaded.

/pub/data/images (2.93G) - magnet:?xt=urn:btih:6d3cd4da56c507723b8ff453a32443017ac286a4&dn=images&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969

Working on downloading /pub/data/satellite. It's going to take a while; it has well over 3 million files in it.

JeremiahCurtis commented 7 years ago

This is probably one of the most important FTP directories in the entire effort. Any idea where we stand on this? I have been grabbing some smaller folders via https://www1.ncdc.noaa.gov/pub/data/, especially the folders that are not mentioned in any issues here on GitHub. (I assume the HTTPS directory is identical to the FTP directory, except for some additional subdirectories in the HTTPS one; if that's incorrect, perhaps someone can say.) I hesitate to start any large downloads because I have limited storage and download speed.

StephWo commented 7 years ago

Trying to get a list of this one with du -ch --max-depth=1. Looks like a big one.


154M  ./109020
261M  ./15min_precip-3260
1.2G  ./aewc-v1
2.1M  ./airsea
76M   ./annualreports
533M  ./anomalie
529K  ./anomalies
153G  ./asos-fivemin
1.3T  ./asos-onemin
213M  ./ASOS_Station_Photos
4.6M  ./blizzard
9.9M  ./ccd-data
6.2G  ./cdmp
6.4M  ./cdo
1.4G  ./cirs
15G   ./climgrid
du: Access failed: 550 /pub/data/cmb/ersst/v4/tmp: No such file or directory
18G   ./cmb
6.3G  ./coastal
175G  ./cpo
21M   ./crdr
305M  ./documentlibrary
253M  ./ecosystems
97M   ./EngineeringWeatherData_CDROM
590M  ./extremeevents
752M  ./gcos

So far so good.

I've now been waiting hours for the size of ./ghcn/, which is either massive or full of millions of files.

I would propose splitting this one.


4.8T  ./ghcn
232G  ./globaldatabank
2.2G  ./gpcp
1.8G  ./gridded-nw-pac
28G   ./gruan
86M   ./gsn
7.2G  ./gsod
64M   hazards/
4.6M  hidden/
74K   homr/
269M  hourly_precip-3240/
98G   igra/
2.8G  images/
202M  inventories/
3.7M  ish/
37G   ispd/
358K  john/
397M  jrennie/
1016M lcd/
0     madis/
621M  mcdw/
677M  metadata/
108M  mlost/
3.2G  ncep_gts/
87M   news media/
5.4G  nidis/
124G  noaa/
56M   noaaglobaltemp/
16G   normals/
60G   nsrdb-solar/
985M  nwshly/
209G  paleo/
824M  papers/
1.9M  pmorpts_py/
4.2G  radar/
25M   ratpac/
33M   req201509/
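Since the listing above is meant to help split /pub/data among volunteers, here is a minimal Python sketch of one way to do that. It assumes the du -h output is saved as plain text (size, whitespace, path) and that each volunteer states a free-space budget; the function names and the first-fit-decreasing strategy are illustrative choices, not something agreed in this thread. Because du -h rounds sizes, the totals are only approximate.

```python
# Hypothetical helpers for dividing a du -h listing among volunteers.
# Assumes each line looks like "154M  ./109020" (size, whitespace, path).

_UNITS = {"K": 1, "M": 2, "G": 3, "T": 4, "P": 5}


def parse_du_size(text):
    """Convert a du -h size such as '1.2G' or '529K' to bytes (approximate)."""
    text = text.strip()
    if text[-1] in _UNITS:
        return round(float(text[:-1]) * 1024 ** _UNITS[text[-1]])
    return int(text)  # a bare number such as '0' is already in bytes


def parse_du_lines(lines):
    """Return (size_in_bytes, path) pairs; lines that do not start with a
    size (du error messages, comments) are skipped."""
    entries = []
    for line in lines:
        parts = line.strip().split(None, 1)  # path may contain spaces
        if len(parts) != 2:
            continue
        try:
            entries.append((parse_du_size(parts[0]), parts[1]))
        except ValueError:
            continue
    return entries


def split_into_chunks(entries, capacity):
    """First-fit decreasing: place the largest folders first, each into the
    first chunk with room left. A folder bigger than `capacity` ends up alone
    in its own (oversized) chunk and would need splitting further."""
    chunks = []  # each chunk is [total_bytes, [paths]]
    for size, path in sorted(entries, reverse=True):
        for chunk in chunks:
            if chunk[0] + size <= capacity:
                chunk[0] += size
                chunk[1].append(path)
                break
        else:
            chunks.append([size, [path]])
    return chunks


# Small demo with three entries from the listing and a 500 GiB budget.
listing = ["4.8T  ./ghcn", "232G  ./globaldatabank", "28G   ./gruan"]
for total, paths in split_into_chunks(parse_du_lines(listing), 500 * 1024**3):
    print(round(total / 1024**3), "GiB:", paths)
```

First-fit decreasing is a simple heuristic that usually packs well; it does not guarantee the fewest chunks, and a folder like ./ghcn would still need its own issue, as was later done.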

StephWo commented 7 years ago

While letting du continue, I will start pulling

./asos-fivemin ./asos-onemin ./ASOS_Station_Photos

to a public mirror.

JeremiahCurtis commented 7 years ago

I have been downloading the https://www1.ncdc.noaa.gov/pub/data/ folders and have completed the following. They are available locally at the moment; I will make them public as soon as I can grab all the folders that no one has claimed yet here on GitHub. I figure it's more important to grab the data now and figure the rest out later:

109020, AEWC1, Airsea, anomalie, anomalies, Blizzard, coastal, crdr, ecosystems, Extreme Events/special reports, gcos, gsn, Hazards, hpd, ish, mcdw, mlost, ncep gts, nidis, noaa global temp, papers, pmorpts py, ratpac, req201509, sds, snow monitoring, wct, w pacific typhoon

Currently running /ispd, /radar, and /gruan in Internet Download Manager. These could take a few days at my connection speed.

As has been stated elsewhere, it's always a good idea to have at least two mirrors (for several reasons), so I hope someone else gets these folders too.

JeremiahCurtis commented 7 years ago

Also, after searching through the issues, I believe that /paleo, /normals, and /igra are complete, and that /ghcnd contains a large .tar.gz file that has already been downloaded and holds most, if not all, of the data from the ghcnd folder.

If I'm wrong, please let me know.

JeremiahCurtis commented 7 years ago

My wget on /noaa stopped at 1961 because I ran out of disk space. If someone wants to pick up from 1962 onward, that would help immensely. I have no idea how to get wget to resume the queue onto a new drive without re-downloading files that are already saved on the old drive.

StephWo commented 7 years ago

./asos-fivemin/ and ./ASOS_Station_Photos/

mirrored here: http://176.9.83.61/162

./asos-onemin/ in the making

gabefair commented 7 years ago

@JeremiahCurtis, have you tried using the -N and -m options with wget? For example: wget -N -m ftp://[...]/*
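One caveat on the drive-change question: wget -N (and -m, which already turns on timestamping as part of mirroring) only compares remote files against local copies in the directory tree wget is writing to, so pointing it at a new drive would most likely fetch the earlier years again. Below is a minimal Python sketch of an alternative that checks both drives before downloading. It assumes /pub/data/noaa holds one flat directory of files per year, that a local file with the same size as the remote one counts as already downloaded, and that the mount points in the usage comment are hypothetical placeholders.

```python
import os
import posixpath
from ftplib import FTP, error_perm


def already_have(rel_path, remote_size, roots):
    """True if rel_path exists under any local root with the expected size.

    remote_size may be None (server did not report a size); then existence
    alone counts as "already downloaded".
    """
    for root in roots:
        local = os.path.join(root, rel_path)
        if os.path.isfile(local) and (
            remote_size is None or os.path.getsize(local) == remote_size
        ):
            return True
    return False


def fetch_dir(ftp, remote_dir, rel_dir, roots, dest_root):
    """Download the files of one flat remote directory into dest_root/rel_dir,
    skipping any file already present under one of `roots` (old or new drive).
    Returns the number of files fetched."""
    os.makedirs(os.path.join(dest_root, rel_dir), exist_ok=True)
    ftp.voidcmd("TYPE I")  # many servers only answer SIZE in binary mode
    fetched = 0
    for name in ftp.nlst(remote_dir):
        base = posixpath.basename(name)  # NLST may return bare names or full paths
        remote_path = posixpath.join(remote_dir, base)
        rel_path = os.path.join(rel_dir, base)
        try:
            size = ftp.size(remote_path)
        except error_perm:
            continue  # typically a subdirectory or unreadable entry; skipped here
        if already_have(rel_path, size, roots):
            continue
        with open(os.path.join(dest_root, rel_path), "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
        fetched += 1
    return fetched


# Hypothetical usage (mount points are placeholders; adjust the end year):
# ftp = FTP("ftp.ncdc.noaa.gov")
# ftp.login()  # anonymous login
# for year in range(1962, 2017):
#     fetch_dir(ftp, f"/pub/data/noaa/{year}", str(year),
#               ["/mnt/old_drive/noaa", "/mnt/new_drive/noaa"],
#               "/mnt/new_drive/noaa")
```

Comparing sizes rather than mere existence also means a file that was cut off mid-transfer on the old drive gets fetched again.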

emf commented 7 years ago

I just grabbed /data/extremeevents. 617MB.

magnet:?xt=urn:btih:d65111efa8a9869d7f6b6e33d869e3ef73e27f03&dn=extremeevents&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969

blitzdesigner commented 7 years ago

Some of those folders are in #16, where I mirrored the following folders:

All of them are mirrored here: ftp://ftp.blitzdesigner.info/pub/climatemirror/NCEI_Land-Based-Station-Datasets/

bkirkbri commented 7 years ago

Thank you to everyone who is splitting this up. Can someone volunteer to:

The above would be incredibly helpful to avoid wasted effort and make sure the work that has been done doesn't fall through the cracks.

StephWo commented 7 years ago

./asos-onemin/ is finished as well:

http://176.9.83.61/162

The hashdeep file is 42 MB.
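For anyone pulling a copy from this mirror, a hashdeep manifest makes it possible to check the download. hashdeep's own audit mode (for example hashdeep -r -a -k manifest.txt DIR) is the usual way to do that; the Python sketch below only illustrates what such a check involves. It assumes the common manifest layout of %%%% header lines, ## comment lines, and rows of size, hashes, and filename with the filename as the last column. The function names are hypothetical.

```python
import hashlib
import os


def read_hashdeep(path):
    """Parse a hashdeep manifest into (columns, rows).

    Assumes a "%%%% size,md5,sha256,filename" style header names the
    columns and that the filename is always the last column.
    """
    columns, rows = None, []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("%%%% size"):
                columns = line[len("%%%% "):].split(",")
            elif not line or line.startswith("%%%%") or line.startswith("##"):
                continue  # other header lines and "##" comments
            elif columns is None:
                raise ValueError("data line before column header")
            else:
                # maxsplit keeps any commas inside the filename intact
                rows.append(dict(zip(columns, line.split(",", len(columns) - 1))))
    return columns, rows


def verify(rows, base_dir=""):
    """Return filenames that are missing or whose size or hashes differ.

    Filenames are joined to base_dir; absolute paths recorded on another
    machine would have to be rewritten first.
    """
    bad = []
    for row in rows:
        path = os.path.join(base_dir, row["filename"])
        if not os.path.isfile(path) or os.path.getsize(path) != int(row["size"]):
            bad.append(row["filename"])
            continue
        hashers = {n: hashlib.new(n) for n in ("md5", "sha1", "sha256") if n in row}
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                for h in hashers.values():
                    h.update(chunk)
        if any(h.hexdigest() != row[n] for n, h in hashers.items()):
            bad.append(row["filename"])
    return bad
```

Checking the size first is cheap and catches truncated files before any hashing of multi-gigabyte data.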

Folder sizes:

asos-fivemin: 153 GB (160150920 KiB)
asos-onemin: 1.3 TB (1290417332 KiB)
ASOS_Station_Photos: 213 MB (217784 KiB)
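The two figures for each folder agree if the long number is in KiB (1024-byte units, as du -k reports) and the short one is rounded up. GNU du -h appears to round this way, which would explain why 1290417332 KiB (about 1.20 TiB) shows as 1.3T. A small Python sketch of that conversion, meant for checking these figures rather than as a copy of du's exact algorithm:

```python
import math

_SUFFIXES = ["K", "M", "G", "T", "P", "E"]


def kib_to_du_h(kib):
    """Format a size given in KiB in du -h style, rounding *up*.

    Values below 10 keep one decimal place; larger values are whole numbers.
    Edge cases (e.g. a rounded value reaching 10 or 1024) are not handled.
    """
    if kib == 0:
        return "0"
    size, i = float(kib), 0
    while size >= 1024 and i < len(_SUFFIXES) - 1:
        size /= 1024
        i += 1
    if size < 10:
        return f"{math.ceil(size * 10) / 10:.1f}{_SUFFIXES[i]}"
    return f"{math.ceil(size)}{_SUFFIXES[i]}"


for name, kib in [("asos-fivemin", 160150920),
                  ("asos-onemin", 1290417332),
                  ("ASOS_Station_Photos", 217784)]:
    print(name, kib_to_du_h(kib))
```

Rounding to the nearest value instead would give 1.2T for asos-onemin, which does not match the listing, so the rounding direction matters when comparing sizes between mirrors.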

gabefair commented 7 years ago

I have created a new ticket for the /ghcn data #331

StephWo commented 6 years ago

Be advised: because of changes in my hardware needs, I won't be able to host this or the other datasets after April 2018. Please make a copy, if necessary, before the end of April. The full list of dataset issue numbers that are mirrored on my server and will not be hosted after April:

#162, #175, #176, #184, #185, #279, #291, #362

Find all these datasets at http://176.9.83.62 or http://climatemirror1.space