nickrsan opened 7 years ago
I have the oisst subfolder and am trying wget on some other subfolders.
Downloading
Here are some torrent magnet links for data I have already downloaded.
/pub/data/images (2.93G) - magnet:?xt=urn:btih:6d3cd4da56c507723b8ff453a32443017ac286a4&dn=images&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969
Working on downloading /pub/data/satellite. It's going to take a while; it has well over 3 million files in it.
This is probably one of the most important ftp directories in the entire effort... Any idea where we stand on this? I have been grabbing some smaller folders via https://www1.ncdc.noaa.gov/pub/data/ (which I assume is identical to the ftp directory, except for some additional subdirectories on the https side; if that's incorrect, perhaps someone would know), especially the folders which are not mentioned in any issues here on GitHub. I hesitate to start any large downloads because I have limited storage and download speed.
Trying to get a size listing of this one with `du -ch --max-depth=1`. Looks like a big one.
Edit:
```
154M   ./109020
261M   ./15min_precip-3260
1.2G   ./aewc-v1
2.1M   ./airsea
76M    ./annualreports
533M   ./anomalie
529K   ./anomalies
153G   ./asos-fivemin
1.3T   ./asos-onemin
213M   ./ASOS_Station_Photos
4.6M   ./blizzard
9.9M   ./ccd-data
6.2G   ./cdmp
6.4M   ./cdo
1.4G   ./cirs
15G    ./climgrid
du: Access failed: 550 /pub/data/cmb/ersst/v4/tmp: No such file or directory
18G    ./cmb
6.3G   ./coastal
175G   ./cpo
21M    ./crdr
305M   ./documentlibrary
253M   ./ecosystems
97M    ./EngineeringWeatherData_CDROM
590M   ./extremeevents
752M   ./gcos
```
So far, so good.
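For reference, the same kind of per-subfolder summary can be reproduced on a local copy with a one-line helper (a sketch assuming GNU `du`/`sort`; the listing above was produced against the FTP server itself):

```shell
# Summarize the sizes of a directory's immediate subdirectories,
# human-readable, smallest first, with a grand total at the end.
# Assumes GNU coreutils (sort -h is a GNU extension).
dir_sizes() {
    du -ch --max-depth=1 "$1" | sort -h
}
```

Running `dir_sizes /mnt/mirror/pub/data` (path hypothetical) prints one line per subfolder plus a `total` line, which is handy for deciding which folders to claim.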
Now I'm waiting (it's been hours) for the size of ./ghcn/, which is either massive or full of millions of files. I would propose splitting this one.
Edit:
```
4.8T   ./ghcn
232G   ./globaldatabank
2.2G   ./gpcp
1.8G   ./gridded-nw-pac
28G    ./gruan
86M    ./gsn
7.2G   ./gsod
64M    ./hazards
4.6M   ./hidden
74K    ./homr
269M   ./hourly_precip-3240
98G    ./igra
2.8G   ./images
202M   ./inventories
3.7M   ./ish
37G    ./ispd
358K   ./john
397M   ./jrennie
1016M  ./lcd
0      ./madis
621M   ./mcdw
677M   ./metadata
108M   ./mlost
3.2G   ./ncep_gts
87M    ./news media
5.4G   ./nidis
124G   ./noaa
56M    ./noaaglobaltemp
16G    ./normals
60G    ./nsrdb-solar
985M   ./nwshly
209G   ./paleo
824M   ./papers
1.9M   ./pmorpts_py
4.2G   ./radar
25M    ./ratpac
33M    ./req201509
```
While letting du continue, I will start pulling ./asos-fivemin, ./asos-onemin, and ./ASOS_Station_Photos to a public mirror.
I have been downloading the https://www1.ncdc.noaa.gov/pub/data/ folders, and I have completed the following (available locally for the moment; I will make them public as soon as I can grab all the folders that no one has yet claimed here on GitHub; I figure it's more important to grab the data now and figure the rest out later):
109020, AEWC1, Airsea, anomalie, anomalies, Blizzard, coastal, crdr, ecosystems, Extreme Events/special reports, gcos, gsn, Hazards, hpd, ish, mcdw, mlost, ncep gts, nidis, noaa global temp, papers, pmorpts py, ratpac, req201509, sds, snow monitoring, wct, w pacific typhoon
Currently running /ispd, /radar, and /gruan through Internet Download Manager; these could take a few days with my connection speeds.
As has been stated elsewhere, it's always a good idea to have at least two mirrors (for several reasons), so I hope someone else grabs these folders as well.
Also, after searching through the issues, I believe that /paleo, /normals, and /igra are complete, and that /ghcnd features a large .tar.gz file that has already been downloaded and contains most if not all of the data from the ghcnd folder. If I'm wrong, please let me know.
My wget on /noaa ended at 1961 because I ran out of disk space. If someone wants to pick up from 1962 onward, that would help immensely. I have no idea how to get wget to resume a download queue onto a new drive without re-downloading the files already saved on the old drive.
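One way to handle the new-drive problem (a sketch, not tested against the NOAA server; the paths and list files are hypothetical) is to keep a plain-text queue of URLs, filter out everything the old drive already holds, and feed only the remainder to `wget -i` on the new drive:

```shell
# filter_queue OLD_MIRROR QUEUE REMAINING
# Writes to REMAINING only the URLs whose files are absent under OLD_MIRROR.
# Assumes the old mirror was built with `wget -m`, which stores files under
# a hostname-rooted path like ftp.ncdc.noaa.gov/pub/data/...
filter_queue() {
    old_mirror=$1; queue=$2; remaining=$3
    : > "$remaining"
    while IFS= read -r url; do
        # Map the URL to the relative path wget -m would have created.
        rel=${url#ftp://}
        if [ ! -f "$old_mirror/$rel" ]; then
            printf '%s\n' "$url" >> "$remaining"
        fi
    done < "$queue"
}

# Usage (hypothetical paths):
#   filter_queue /mnt/old-drive noaa-urls.txt noaa-remaining.txt
#   wget -c -x -P /mnt/new-drive -i noaa-remaining.txt
```

The `-x` flag recreates the directory structure and `-c` resumes any partially downloaded file, so the new drive ends up as a clean continuation of the old mirror.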
./asos-fivemin/ and ./ASOS_station_photos/
mirrored here: http://176.9.83.61/162
./asos-onemin/ in the making
@JeremiahCurtis have you tried using `-N -m` with wget? So: `wget -N -m ftp://[...]/*`
I just grabbed /data/extremeevents. 617 MB.
magnet:?xt=urn:btih:d65111efa8a9869d7f6b6e33d869e3ef73e27f03&dn=extremeevents&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Fopen.demonii.com%3A1337&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969
Some of those folders are in #16, where I mirrored the following folders:
gsod
ncei
noaa
normals
nsrdb-solar
qclcd
uscrn (this one is currently being uploaded from my offline mirror)
ww-ii-data
all of them are mirrored here: ftp://ftp.blitzdesigner.info/pub/climatemirror/NCEI_Land-Based-Station-Datasets/
Thank you to everyone who is splitting this up. Can someone volunteer to:
The above would be incredibly helpful to avoid wasted effort and make sure the work that has been done doesn't fall through the cracks.
./asos-onemin/ is finished as well:
The hashdeep file is 42 MB.
Folder sizes:
```
asos-fivemin         153 GB (160150920 KB)
asos-onemin          1.3 TB (1290417332 KB)
ASOS_Station_Photos  213 MB (217784 KB)
```
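For anyone copying these mirrors who doesn't have hashdeep installed, an equivalent manifest can be built and audited with plain coreutils (a portable sketch; hashdeep additionally records file sizes and supports multiple hash algorithms):

```shell
# Build a SHA-256 manifest of every file under a mirror directory.
# Verify any copy later with `sha256sum -c` from inside the copy.
make_manifest() {
    dir=$1; manifest=$2
    ( cd "$dir" && find . -type f -exec sha256sum {} + ) > "$manifest"
}

# Usage (paths hypothetical):
#   make_manifest ./asos-onemin asos-onemin.sha256
#   ( cd ./asos-onemin-copy && sha256sum -c ../asos-onemin.sha256 )
```

Verifying from inside the copied directory works because the manifest stores paths relative to the mirror root.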
I have created a new ticket for the /ghcn data #331
Be advised: because of changes in my hardware demands, I won't be able to host this or the other datasets any longer after April 2018. Please create a copy, if necessary, before the end of April. The full list of dataset issue numbers that are mirrored on my server and will not be hosted after April:
162 175 176 184 185 279 291 362
Find all these datasets at http://176.9.83.62 or http://climatemirror1.space
ftp://ftp.ncdc.noaa.gov/pub/data
Suggested in a large email containing many URLs.