climate-mirror / datasets

For tracking data mirroring progress

NOAA CMDL Ozone Data #275

Open emf opened 7 years ago

emf commented 7 years ago

I'm currently mirroring the climate data from

emf commented 7 years ago

I'll launch a torrent for it once it's finished; not sure what else to do with it. There's more stuff under as well that this particular FTP site does not cover.

siennathesane commented 7 years ago

Which agency is this? Also, who wants to own this dataset/github issue?

rustyguts commented 7 years ago

@emf The server is going up and down for me: timeouts and very slow transfers (50kb/s) while sitting on a gigabit connection. Is it going well for you?

emf commented 7 years ago

@mxplusb this is the anon ftp server for the CMDL. "The Earth System Research Laboratory - Global Monitoring Division (ESRL), formerly the Climate Monitoring and Diagnostics Laboratory (CMDL), is part of NOAA's Office of Oceanic and Atmospheric Research (OAR)." - see

@RustyGuts - it's going just fine for me. You could probably pick a different site to mirror. I've already got 14GB of data out of this thing since last night (mostly aerosols data so far).

rustyguts commented 7 years ago

@emf Thanks. Saw your tweet. Will check back on progress later and will still mirror if connection is good.

emf commented 7 years ago

@mxplusb this site also contains CO2 data, which is what's linked from

emf commented 7 years ago

Made it to /data/ccgg (carbon cycle greenhouse gases): time-series data. This portion duplicates the CarbonTracker data from issue #10.

emf commented 7 years ago

Damn, it crapped out. It got as far down the tree as /data/ccgg/CT-CH4/molefractions and died. I think they shut it (or me) down; it's refusing anonymous FTP. I got 19GB out of it: all of the /data/aer aerosols data and all of /data/barrow (greenhouse gas sensor in Barrow, AK). Will back off and try again later.
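For anyone scripting a retry, "back off and try again later" can be sketched in Python, assuming the transfer runs through the standard library's `ftplib`: each failed attempt doubles the wait before the next one, up to a cap. The function name, the default delays and the `operation` callable are hypothetical choices for illustration. Nothing in this thread says what rate or pattern triggered the refusals, so slower retries are not guaranteed to avoid a ban.

```python
import ftplib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    max_delay: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, sleeping base_delay, 2*base_delay, 4*base_delay, ...
    (capped at max_delay) between failed attempts.

    Hypothetical helper: the defaults are illustrative, not values from the thread.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ftplib.all_errors:
            # ftplib.all_errors covers FTP protocol errors, socket errors
            # (OSError, e.g. connection refused or reset) and EOFError.
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise AssertionError("unreachable")
```

For example, `operation` could wrap `ftp.retrbinary("RETR " + name, out.write)`, where `ftp`, `name` and `out` are placeholders. A real mirror would also need to reconnect inside `operation`, since a dropped control connection cannot be reused.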

emf commented 7 years ago

This one has gone entirely sideways. They've banned my mirror server.

I've posted partial torrents for: /data/aer (18GB), /data/barrow (112M), /data/campaign (7M)

Assistance from other IP addresses would be appreciated for the following: /data/ccgg, /data/greenhouse_gases, /data/meteorology, /data/ozone, /data/ozwv, /data/radiation, /data/trace_gases

as-com commented 7 years ago

Well, the FTP server is less overloaded, but still overloaded enough to make mirroring virtually impossible (it's throwing all sorts of errors).

I'll download the torrents, though.

NickIAm commented 7 years ago

I believe I have /data/radiation and /data/trace_gases downloaded. I'll have to double-check that they are the complete data sets. I'll go ahead and try to post a torrent for these two later today.

I think I also have some of the /data/ozone as well.

NickIAm commented 7 years ago

Here are the magnet links for what I have so far: /data/ozone (70M) - magnet:?xt=urn:btih:933bc3dfa0bf4560f2508eb131ab80a9f7efc653&dn=ozone&

/data/radiation (8.75G) (might not be complete; I can't get on the server right now to check) - magnet:?xt=urn:btih:408dd0761724eb7ebb69de4b4204d31cbd87659e&dn=radiation&

emf commented 7 years ago

@NickIAm I was able to get /data/ozone from your magnet link, but /data/radiation isn't working.

emf commented 7 years ago

Radiation is downloading now.

emf commented 7 years ago

I've also got trace_gases now (2.6GB): magnet:?xt=urn:btih:28a6b5409e6493d1be8efed09e581d46f9c2c500&dn=trace%5Fgases&
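The hashes in these links are easy to mangle when they are copied around; an email client wrapping a long line is enough. A small Python sketch, using only the standard library's `urllib.parse`, that pulls the info-hash and display name (`dn`) out of a magnet link and rejects anything that is not a 40-digit hex BitTorrent v1 hash. The function name is hypothetical, and the hex-only check is an assumption that matches the links in this thread; magnet links can also carry 32-character base32 hashes, which this sketch would reject.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

HEX_DIGITS = set("0123456789abcdef")


def parse_magnet(uri: str) -> Tuple[str, Optional[str]]:
    """Return (info_hash, display_name) from a BitTorrent v1 magnet link."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError(f"not a magnet link: {uri!r}")
    params = parse_qs(parts.query)  # also percent-decodes, e.g. %5F -> "_"
    prefix = "urn:btih:"
    for topic in params.get("xt", []):
        if topic.lower().startswith(prefix):
            info_hash = topic[len(prefix):].lower()
            break
    else:
        raise ValueError("no urn:btih exact topic (xt) parameter")
    # A v1 info-hash is a 20-byte SHA-1 digest, written here as 40 hex digits.
    if len(info_hash) != 40 or not set(info_hash) <= HEX_DIGITS:
        raise ValueError(f"malformed info-hash: {info_hash!r}")
    names = params.get("dn")
    return info_hash, names[0] if names else None
```

Run on the trace_gases link above, this returns the 40-character hash and the decoded name `trace_gases`; a hash with a stray space in the middle raises `ValueError` instead of silently producing a torrent nobody can find.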

siennathesane commented 7 years ago

It might be worth reaching out to the webmaster and asking to be whitelisted based on the project's goals.

Mike Lloyd 719.766.1923

On 27 January 2017 at 18:14, Erik Fichtner wrote:

i've also got trace_gases now. 2.6GB magnet:?xt=urn:btih:28a6b5409e6493d1be8efed09e581d46f9c2c500&dn=trace%5Fgases&


donbright commented 7 years ago

I would like to propose breaking this into 10 separate issues, using lftp du size estimates, as follows. This would allow more people to participate and possibly reduce the load on all the machines involved.
If this is acceptable, I can create the issues myself.

302 NOAA ESRL GMD Aerosols

FTP link:
Size estimate: 142G
Data description:

303 NOAA ESRL GMD Trace Gases

FTP link:
Size estimate: 2.7G
Data description:

304 NOAA ESRL GMD Halocarbons and other Atmospheric Trace Species

FTP link:
Size estimate: 1.3G
Data description:

305 NOAA ESRL GMD Global Greenhouse Gas Reference Network (carbon cycle greenhouse gases)

FTP link:
Size estimate: 106G
Data description:

306 NOAA ESRL GMD Ozone and Water Vapor Group

FTP link:
Size estimate: 19G
Data description:

307 NOAA ESRL GMD Station Meteorology

FTP link:
Size estimate: 4.4G
Data description:

308 NOAA ESRL GMD CarbonTracker CH4 (methane)

FTP link:
Size estimate: 68G
Data description:

309 NOAA ESRL GMD CarbonTracker CO2 (carbon dioxide)

FTP link:
Size estimate: 3.4T
Data description:

310 NOAA ESRL GMD CarbonTracker Lagrange

FTP link:
Size estimate: 1.3T
Data description:

311 NOAA ESRL GMD Radiation

FTP link:
Size estimate: 196G

What do you think? Thanks.

Update: created the issues after the discussion below and added the issue links.
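Adding up the estimates shows why splitting the work matters. A Python sketch, assuming the figures above are du-style human-readable sizes in binary units (1G = 1024^3 bytes); the issue-to-size mapping is copied from the list above, while the helper names are hypothetical.

```python
import re
from typing import Dict

# Binary multipliers; assumes du-style human-readable output (K = 1024).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Issue number -> size estimate, as listed in the proposal above.
ESTIMATES: Dict[int, str] = {
    302: "142G", 303: "2.7G", 304: "1.3G", 305: "106G", 306: "19G",
    307: "4.4G", 308: "68G", 309: "3.4T", 310: "1.3T", 311: "196G",
}


def parse_size(text: str) -> int:
    """Convert a size such as '142G' or '3.4 T' to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * UNITS[unit])


def summarise(estimates: Dict[int, str]) -> None:
    """Print the combined size and the issues of 1 TiB or more."""
    tib = UNITS["T"]
    sizes = {issue: parse_size(s) for issue, s in estimates.items()}
    total = sum(sizes.values())
    big = sorted(issue for issue, size in sizes.items() if size >= tib)
    print(f"total: {total / UNITS['G']:.1f} GiB ({total / tib:.2f} TiB)")
    print(f"issues of 1 TiB or more: {big}")


if __name__ == "__main__":
    summarise(ESTIMATES)
```

Under that assumption the total is about 5,352 GiB (roughly 5.2 TiB), and the two terabyte-scale CarbonTracker issues (#309, #310) account for 4,812.8 GiB of it, which is why skipping "the ones with a T in the size" leaves a much more manageable job.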

sveeke commented 7 years ago

@donbright Seems like a good idea. If the issues are created, I will mirror all but the ones with a T in the size ;).

donbright commented 7 years ago

Thanks @sveeke. To be clear, anyone could create the issues if they wanted, but I can do it tonight when I get some time.

nickrsan commented 7 years ago

I think it's a good idea. Let me know when you do it and I'll close this one. Thank you!

Nick Santos UC Davis Center for Watershed Sciences, 530-754-9362

On Thu, Feb 2, 2017 at 5:31 AM, don bright wrote:

thanks @sveeke ... to be clear, of course anyone could create the issues if they wanted, but i can do it tonight when i get some time.


sveeke commented 7 years ago

@donbright I started mirroring and making issues, but this server bans people quite fast. I have no interest in mirroring data if the maintaining party doesn't want it to be mirrored ;), so I'm letting this one go.

emf commented 7 years ago

I've finally got /data/ccgg! 106GB in total. This one gets new observation data added daily in a couple of places, so it will need a resync if there's an imminent threat. I don't know if that's being tracked anywhere.

Here's the magnet link, but I'm not sure exactly when this torrent will be available: transmission-daemon is giving my little VM server memory problems, and the Linux kernel keeps killing it. I've got some sysadmin junk to do here.


emf commented 7 years ago

/data/hats is done. 1.35GB.


emf commented 7 years ago

/data/meteorology is done. 4.7GB


ivanstegic commented 7 years ago

I'm also grabbing this.

NickIAm commented 7 years ago

Is it useful to post BitTorrent downloads of the different directories once someone has a complete copy? Is there another way we should be making the data available? I can't contribute too much because I'm stuck with Comcast and their 1TB data cap, but I'll help where I can.

emf commented 7 years ago

@NickIAm, that's a fine question, and one that I think can only be answered after we collectively figure out where this data will live for the foreseeable future. My cloud server is nearly full, and I can't take the CarbonTracker data (#308, #309, #310), but I'm very close to completing everything else from this issue (minus the ISO images of sensor code, see #302). I've been posting them as torrents because it's easy, but I can also make this available via an FTP site, and I can commit to keeping what I've got available via both methods for at least the next few years. Beyond that, I don't have any good ideas or answers regarding data custodianship.

JeremiahCurtis commented 7 years ago

Working on greenhouse gases

It appears that the zip files found at and at contain the information found in the respective /brw, /mlo, /smo, and /spo folders. If anyone else can confirm this, it would reduce the download time considerably. I have several other wget and IDM projects running and a slow connection overall, so this would help.

emf commented 7 years ago

@JeremiahCurtis When I looked, /data/greenhouse_gases contained only symlinks into trace_gases, which has already been mirrored.

ftp> cd /data/greenhouse_gases
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
lrwxrwxrwx    1 0        0              18 Dec 07  2015 ch4 -> ../trace_gases/ch4
lrwxrwxrwx    1 0        0              18 Dec 07  2015 co2 -> ../trace_gases/co2
226 Directory send OK.

I have finished /data/ozwv. 20GB.


This concludes everything under /data on this ftp server.
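A mirror script can avoid fetching the same data twice by spotting symlinks like the ones in the /data/greenhouse_gases listing above. A Python sketch that parses Unix-style `ls -l` lines (as collected, for example, with `ftplib.FTP.retrlines("LIST", lines.append)`) and resolves each link relative to the directory it was listed from. The helper names are hypothetical, and the parser assumes the common nine-column Unix listing format; FTP servers are not required to use it, and filenames with leading spaces would be mangled.

```python
import posixpath
from typing import Dict, List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    is_dir: bool
    link_target: Optional[str]  # raw symlink target, or None for non-links


def parse_unix_listing(lines: List[str]) -> List[Entry]:
    """Parse `ls -l` style lines; lines pasted from an interactive session
    that are not listing entries (e.g. '226 Directory send OK.') are skipped."""
    entries = []
    for line in lines:
        fields = line.split(None, 8)  # 8 metadata columns, then the name
        if len(fields) < 9 or fields[0][0] not in "-dl":
            continue
        perms, rest = fields[0], fields[8]
        if perms.startswith("l") and " -> " in rest:
            name, target = rest.split(" -> ", 1)
        else:
            name, target = rest, None
        entries.append(Entry(name, perms.startswith("d"), target))
    return entries


def resolve_links(directory: str, entries: List[Entry]) -> Dict[str, str]:
    """Map each symlink name to the absolute server path it points at."""
    return {
        e.name: posixpath.normpath(posixpath.join(directory, e.link_target))
        for e in entries
        if e.link_target is not None
    }
```

Fed the listing above with `directory="/data/greenhouse_gases"`, this maps `ch4` and `co2` to `/data/trace_gases/ch4` and `/data/trace_gases/co2`, so a mirror loop could skip any link whose target lies under a tree it already has.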

emf commented 7 years ago

Ugh. /data/radiation from @NickIAm is incomplete. I only have 8.8GB of the estimated 196GB in #311, and I don't have enough space to mirror that. So /data from this server is NOT complete. Sorry.

JeremiahCurtis commented 7 years ago

What folders do we still need under /radiation, and is that the only incomplete folder left here?

emf commented 7 years ago

@JeremiahCurtis Almost everything under /data/radiation is still needed. At this point, I'd say just grab the 8GB from the torrent to get started and then let wget work from there; that saves the poor NOAA server a few GB of outbound traffic.

Also, I'd like to stop publishing the bad torrent for /data/radiation, so if you're going to do that, please let me know when you've got it all so I can stop announcing it.
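The "seed from the partial torrent, then let wget fill in the rest" plan comes down to working out which files are still missing or short. A Python sketch, assuming you already have a mapping of remote paths (relative to /data/radiation) to their sizes, for instance from a parsed directory listing. Comparing sizes only is an assumption: a file of the right size could still differ in content. The function name and inputs are hypothetical.

```python
import os
from typing import Dict, List


def files_to_fetch(remote_sizes: Dict[str, int], local_root: str) -> List[str]:
    """Return the remote relative paths that are missing locally or whose
    local size differs from the remote size (e.g. truncated by a torrent)."""
    todo = []
    for rel_path, size in sorted(remote_sizes.items()):
        # Remote paths use '/', so rebuild them with the local separator.
        local_path = os.path.join(local_root, *rel_path.split("/"))
        try:
            if os.path.getsize(local_path) == size:
                continue  # already complete; leave it alone
        except OSError:
            pass  # not present locally
        todo.append(rel_path)
    return todo
```

The resulting list could then be handed to wget one URL at a time; wget's `--continue` (`-c`) option resumes a partially downloaded file rather than starting it over.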