Open Serubin opened 7 years ago
Looks like http://newftp.epa.edu/ is down
@Plazmaz it's ftp://newftp.epa.gov/
Updated current download status. If anyone wants to start downloading other parts of this, feel free - it's rate limited at 500kb/s, so this is a pretty slow process.
tried wget but it stopped because of login issues
--11:42:18-- ftp://newftp.epa.gov/EPADataCommons/ (try:20) => `C:/Users/user/Music/newftp.epa.gov/EPADataCommons/.listing'
Connecting to newftp.epa.gov|134.67.100.58|:21... connected.
Logging in as anonymous ... The server refuses login.
Giving up.
unlink: No such file or directory
FINISHED --11:42:18--
Downloaded: 0 bytes in 0 files
@JeremiahCurtis Give it another try. That happens every so often.
These still need to be downloaded. The RSEI directory looks daunting - might split that up a bit.
1.0T ./RSEI
7.5G ./RTPGIS
62M ./STANDARD_MINE
1.0K ./TESTAREA
ftp://newftp.epa.gov/RSEI/Version233_RY2012/Aggregated_Grid_Cell_Data/
working on the above csv files; since wget is having problems, i am doing direct downloads
this may take awhile
direct download not working either...not sure what's up
It appears the server is gone. ftp://ftp.epa.gov is still up
Final data count:
15G ./newftp.epa.gov
899M ./AIR_QUALITY_DATA
2.1G ./GKM_DOCUMENTS
2.2G ./EJSCREEN
2.3G ./CERCLA108B
4.0K ./CAM_HRA
36G ./COMPTOX
57G .
now the direct download is working again...
there are 3 massive csv files at ftp://newftp.epa.gov/RSEI/Version233_RY2012/Disaggregated_Microdata/
each is about 110 GB
@JeremiahCurtis Pull down whatever you can - I'm unable to access
working on it
direct download is kind of ineffective for a 110 GB file, though. If my browser crashes, I have to start all over....any ideas?
I'm also running downthemall on thousands of files from a lot of the directories at http://cdiac.ornl.gov/ftp/ This doesn't help direct download speeds, but if someone can confirm that the above ftp has been completely mirrored, I will end the dta session and that should speed up direct download.....thanks
Using wget might be a good idea.
The download rates are limited to about 500kb/s
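For a rough sense of what that rate limit means for the ~110 GB files mentioned above, here is a back-of-the-envelope estimate in Python. It assumes "500kb/s" means 500 kilobytes per second in decimal units and that the rate stays constant; the function name is made up for illustration.

```python
def transfer_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the number of days needed to move size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 86_400  # 86,400 seconds per day


# Assumption: 110 GB = 110e9 bytes and 500 kB/s = 500e3 bytes/s.
one_file = transfer_days(110e9, 500e3)
print(f"one 110 GB file: {one_file:.1f} days")
print(f"three such files in sequence: {3 * one_file:.1f} days")
```

Even at the full rate that is about two and a half days per file, so an interrupted transfer that cannot be resumed is expensive.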
ftp://newftp.epa.gov/RSEI/Version233_RY2012/Disaggregated_Microdata/ is anyone else able to access?
I'm looking at this - it looks like the server is reaching its connection limits. Aria2 might be a good option for fast downloads.
I think I've hit my connection limits - I've got to bow out. I've got some amount of data that I can pass off to anyone - or I am happy to grab data from someone who downloaded to try and host the data somewhere.
While it's not DOWN for me, it's requiring a username and password to connect.
The server is responding with
421 Maximum login limit has been reached
Various clients give different messages when the server cannot be reached with the default anonymous credentials. Chrome asks for a username and password when in fact the anonymous credentials are still valid; the server is just overwhelmed.
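Since the 421 reply is temporary, a script can simply wait and try again. Below is a minimal sketch using Python's standard `ftplib`, which raises `ftplib.error_temp` for 4xx replies. The function name, retry count, and delays are assumptions chosen for illustration, not anything the server documents.

```python
import ftplib
import time


def login_with_retry(host, attempts=5, base_delay=30.0,
                     connect=ftplib.FTP, sleep=time.sleep):
    """Try an anonymous FTP login, backing off while the server is busy.

    ftplib raises error_temp for 4xx replies such as
    "421 Maximum login limit has been reached".
    connect and sleep are injectable so the logic can be exercised offline.
    """
    for attempt in range(attempts):
        ftp = None
        try:
            ftp = connect(host, timeout=60)
            ftp.login()  # no arguments means anonymous login
            return ftp
        except ftplib.error_temp as exc:
            if ftp is not None:
                ftp.close()  # drop the half-open connection before retrying
            if attempt == attempts - 1:
                raise  # out of attempts, surface the server's reply
            delay = base_delay * 2 ** attempt  # 30s, 60s, 120s, ...
            print(f"server busy ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Example (needs network access):
# ftp = login_with_retry("newftp.epa.gov")
```

Backing off exponentially keeps retries from adding to the load on a server that is already at its login limit.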
OK, didn't know that. Thanks!
We have that subdirectory mirrored along with cdiac.ornl.gov. That subdirectory by itself has about 87 Gb. This is tracked as The Azimuth Backup Project Issue #3. It was one of the first we did.
To everyone, I would not, however, rely upon single copies. It would be good to know someone else has it, too, or could replicate ours elsewhere.
what is the cdiac ftp mirror address? i followed the link on the main cdiac issue page here, and could not actually find any data.....maybe i'm missing something....thanks
Given that this data source could be taken down at any time (and that the source is crazy slow), I think priority one should be downloading it - even if it's spread across multiple people. We can consolidate and duplicate later.
are we talking about cdiac or the epa ftp?
I'm talking about epa ftp - that's what this issue is for.
if someone else can get the 3 large files at ftp://newftp.epa.gov/RSEI/Version233_RY2012/Disaggregated_Microdata/, go for it......i have the first file under download but it would take weeks at my current download rate....i am having mixed success at ftp://newftp.epa.gov/RSEI/Version233_RY2012/Aggregated_Grid_Cell_Data/
Started a sync of ftp://newftp.epa.gov/RSEI/Version233_RY2012/Disaggregated_Microdata/ at about 500KB/s
I'm curious if it would be worthwhile to try to make a FOIA request for this information as I'm having the same issue with slow downloads and we could get it on a hard drive or similar, albeit with a fee. The entire dataset could be sent on a 2 TB external HD.
@gofrogs2013 Good idea
Can someone volunteer to coordinate this issue? It's great that so many people are dividing it up to get it done! If one of you could track who has what that would be really helpful. Thanks!
I've suffered an untimely hard drive failure, I gotta back out. Sorry.
I went ahead and made a FOIA request for all data in the newftp folder. You can check the progress here: https://foiaonline.regulations.gov/foia/action/public/view/request?objectId=090004d281137e25
@bkirkbri Per the previous comment, I've made the FOIA request and added a link. I won't be able to coordinate it beyond that if we still want to try downloading the rest of it (which is probably the case) as I'll be working on NASA ERS files #289 for a while, but I'll post here if they approve the request.
@randomvariable How is the microdata folder moving? I am attempting a grab of the following RSEI subfolders: temp and shapefiles
fyi for anyone trying to look at @empirical-bayesian issue links, they actually refer to https://bitbucket.org/azimuth-backup/azimuth-inventory/issues/89 not the automatically generated github issues (like this #89)
I'm trying to get those Microdata files. I started with the last one in alphabetical order (Micro2012_2012...) and will go backwards from that. ETA for the first file is in 9 days...
@BauerPiepenbrink Is your download still going, and if so do you have the same ETA? Hopefully it will be possible to download these large files, but if not I will try getting them from the agency via FOIA as I mention above.
I just checked and ./AIR_QUALITY_DATA only has 58M of data in a single .zip file, which is far less than what @serubin reported above.
does anyone have a public mirror up for cross-checking data?
@gofrogs2013 steady as a rock. ETA 6d 23h with an average of 130 K/s. It's not fast but reliable so far.
A friend and I used to try to calculate which has better bandwidth from Europe to China: a gigabit Internet uplink or a sea container full of hard drives. Getting a physical backup seems the way to go if possible. Anyway, I keep on nibbling. 27% already done :)
I shouldn't have jinxed it. Got interrupted by the server half an hour ago. Continuing now. Make sure to use a download client with the ability to resume after disconnect.
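For anyone scripting this rather than using a download client, resuming can be sketched with Python's standard `ftplib`: the `rest` argument of `retrbinary` asks the server to start at a byte offset. This assumes the server supports the FTP `REST` command, which is not confirmed here; the function names are hypothetical.

```python
import ftplib
import os


def resume_offset(local_path):
    """Return how many bytes of the file are already on disk (0 if none)."""
    return os.path.getsize(local_path) if os.path.exists(local_path) else 0


def resume_download(ftp, remote_path, local_path, blocksize=1 << 20):
    """Continue a binary download from where the local file ends.

    ftp is an already logged-in ftplib.FTP object. Passing rest makes ftplib
    send a REST command so the server skips the bytes we already have.
    Assumption: the server honours REST; not every server does.
    """
    offset = resume_offset(local_path)
    with open(local_path, "ab") as out:  # append to the partial file
        ftp.retrbinary(f"RETR {remote_path}", out.write,
                       blocksize=blocksize, rest=offset or None)
    return offset


# Example (needs network access and a logged-in connection):
# ftp = ftplib.FTP("newftp.epa.gov"); ftp.login()
# resume_download(ftp, "RSEI/Version233_RY2012/Disaggregated_Microdata/Micro2012_2012.csv",
#                 "Micro2012_2012.csv")
```

Appending to the partial file only makes sense if the bytes already on disk are intact, so a checksum comparison after the download finishes is still worthwhile.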
@Serubin hope your hard drive failure doesn't mean your download is irretrievable :)
So, first file from the Disaggregated_Microdata folder is finally downloaded. It's Micro2012_2012.csv
http://176.9.83.61/InProgress_279/Disaggregated_Microdata/ This link will change later on.
Hashdeep Checksum for that single file:
110831639138,1d94bea31fe0bd03d732e01b7e7d6ab8,9087314828d9736e275d395f749b354676f7f4164a003319c3501257053b8366,Micro2012_2012.csv
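Anyone holding their own copy can check it against that line without hashdeep installed. The sketch below reads the line as `size,md5,sha256,filename`, which is how the fields above appear to be laid out, and streams the file in chunks so a 110 GB file never has to fit in memory. The function names are made up for illustration.

```python
import hashlib
import os


def hashdeep_line(path, chunk_size=1 << 20):
    """Build a 'size,md5,sha256,filename' line by streaming the file."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    size = os.path.getsize(path)
    return f"{size},{md5.hexdigest()},{sha256.hexdigest()},{os.path.basename(path)}"


def matches(path, expected_line):
    """Compare size and both digests; the filename field is ignored."""
    got = hashdeep_line(path).split(",", 3)[:3]
    want = expected_line.strip().lower().split(",", 3)[:3]
    return got == want


# Example (needs the downloaded file on disk):
# matches("Micro2012_2012.csv",
#         "110831639138,1d94bea31fe0bd03d732e01b7e7d6ab8,"
#         "9087314828d9736e275d395f749b354676f7f4164a003319c3501257053b8366,"
#         "Micro2012_2012.csv")
```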
Disregard the referenced issue above. @BauerPiepenbrink Congrats on the download! Were you able to actually open the file considering its size?
@gofrogs2013 well, I won't try to open it in whole :)
If I run
tail -n 40 Micro2012_2012.csv
I get the last 40 lines of that file which look like this:
14,1275,2277,5231336,318,1204704,6,3.29918E-08,5.93853E-06,2.18259E-07,0.00000E+00,2.18259E-07,1.55910E+02
14,1275,2278,5231336,318,1204704,6,3.01574E-08,5.42833E-06,4.37626E-09,0.00000E+00,4.37626E-09,3.69841E+00
14,1275,2279,5231336,318,1204704,6,2.75008E-08,4.95015E-06,4.11220E-08,0.00000E+00,4.11220E-08,3.59820E+01
So, as the file extension promised, comma-separated values. If someone really wants to dig into that, there seems to be a tool called Microdata_Extractor for filtering the csv files.
I will try to download that too if I stumble upon it.
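Until that tool turns up, a file this size can still be inspected by streaming it one row at a time. This is a minimal sketch with Python's `csv` module; it assumes the file has no header row (the `tail` output cannot tell us), and since the column meanings are not documented in this thread, column 0 is picked purely for illustration.

```python
import csv
from collections import Counter


def count_by_column(path, column=0):
    """Stream a headerless CSV and count rows per value of one column.

    Only one row is held in memory at a time, so file size is not a limit.
    """
    counts = Counter()
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if row:  # skip blank lines
                counts[row[column]] += 1
    return counts


# Example (needs the downloaded file on disk):
# print(count_by_column("Micro2012_2012.csv").most_common(10))
```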
just wondering what we're still missing on the RSEI folder;
I have finished:
Version233_RY2012/Public_Release_Data/CSV version/ Version233_RY2012/Aggregated_Grid_Cell_Data/ Census_XWalks/ Shapefiles/
@JeremiahCurtis Still working on retrieval.
Picked up another 4TB drive so I should be able to get back to data pulling soon.
Current ftp contents:
899M ./AIR_QUALITY_DATA
0 ./CAM_HRA
2.3G ./CERCLA108B
406G ./COMPTOX
406G ./Computational_Toxicology_Data (Looks like a duplicate of the above)
2.2G ./EJSCREEN
33G ./EPADataCommons
44G ./GKM_DOCUMENTS
1.0T ./RSEI
7.5G ./RTPGIS
62M ./STANDARD_MINE
1.0K ./TESTAREA
1.9T .
Currently pulled down on my machine:
899M ./AIR_QUALITY_DATA
31M ./GKM_DOCUMENTS
2.2G ./EJSCREEN
14G ./EPADataCommons
2.3G ./CERCLA108B
4.0K ./CAM_HRA
32G ./COMPTOX
52G .
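To track what is still missing, the two listings can be compared mechanically. The sketch below assumes the sizes are `du -h` style values with binary suffixes (K, M, G, T as powers of 1024) and one `SIZE PATH` entry per line; the function names are hypothetical, and the sample values are copied from the listings above.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du -h style size such as '7.5G' or '0' into bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)


def parse_listing(listing):
    """Map each path in a 'SIZE PATH' listing (one entry per line) to bytes."""
    sizes = {}
    for line in listing.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            sizes[parts[1].strip()] = parse_size(parts[0])
    return sizes


def missing_bytes(remote, local):
    """Per-path shortfall, counting paths absent locally as fully missing."""
    return {path: size - local.get(path, 0.0)
            for path, size in remote.items()
            if size > local.get(path, 0.0)}


remote = parse_listing("44G ./GKM_DOCUMENTS\n33G ./EPADataCommons\n2.2G ./EJSCREEN")
local = parse_listing("31M ./GKM_DOCUMENTS\n14G ./EPADataCommons\n2.2G ./EJSCREEN")
for path, gap in missing_bytes(remote, local).items():
    print(f"{path}: {gap / 1024**3:.1f} GiB still to fetch")
```

Matching sizes only suggest a directory is complete; they do not replace per-file checksums.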
I intend to make my mirror public, but that may have to wait until the weekend.