climate-mirror / datasets

For tracking data mirroring progress

STORET #56

Open nickrsan opened 7 years ago

nickrsan commented 7 years ago

Name: STORET
Organization: EPA
Description URL:
Download URL: https://www.epa.gov/waterdata/storage-and-retrieval-and-water-quality-exchange
File Types:
Size:
Status:

johncronan commented 7 years ago

This is an interesting case. For Legacy STORET, there is ftp://ftp.epa.gov/storet/exports/ although the files are packaged as self-extracting .exe files, groan!

For Modern STORET, the only way to download data is by generating a report. A unique URL is returned, but it is not necessarily valid until report processing is complete (the user gets an email notice). There may also be complications involving retrieving the list of available data elements that corresponds to the queried stations.

Multiple stations can be queried for data in a single request, which is good, because there are over 800k stations. However, if the number of data points (= number of stations * avg. number of readings per station) is greater than about 10M, the request goes into a special overnight queue, which we probably want to avoid.
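To make the batching idea concrete, here is a minimal sketch of grouping stations so that each request's estimated data-point total stays below the overnight cutoff. The threshold value comes from the "about 10M" figure above; the function name, station IDs, and reading counts are hypothetical, and real per-station counts would have to come from somewhere else.

```python
from typing import Dict, List

# Assumed cutoff, taken from the "about 10M" estimate in this comment.
OVERNIGHT_THRESHOLD = 10_000_000


def batch_stations(readings_per_station: Dict[str, int],
                   limit: int = OVERNIGHT_THRESHOLD) -> List[List[str]]:
    """Greedily group station IDs so each batch's estimated total stays below limit.

    A single station whose own count reaches the limit still gets a batch
    to itself; it cannot be split any further at the station level.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_total = 0
    for station_id, count in readings_per_station.items():
        # Close the current batch if adding this station would reach the limit.
        if current and current_total + count >= limit:
            batches.append(current)
            current, current_total = [], 0
        current.append(station_id)
        current_total += count
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical station IDs and reading counts, for illustration only.
    estimates = {"A": 6_000_000, "B": 5_000_000, "C": 3_000_000, "D": 12_000_000}
    for batch in batch_stations(estimates):
        print(batch)
```

With the hypothetical numbers above, the stations end up in three batches: A alone, B with C, and D alone (D exceeds the cutoff by itself, so it would still land in the overnight queue).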

Just wanted to document. I'm going to punt on this one.

lrehmann commented 7 years ago

Request was submitted at 9:45 AM EST 25 Jan 2017
Submitted at: https://ofmpub.epa.gov/storpubl/dw_pages.querycriteria
Used default generic search and "Result Download" to download all data.

Response received at 12:20 PM EST 25 Jan 2017:
Your request for STORET Station Description download is completed via Overnight batch processing. The Request_ID is 950317. You can download your file (size : 57302.3 KB) using the hyperlink https://www3.epa.gov/storet/modern/downloads/LSR_20170125_094524.zip
MIRROR: https://rehmann.co/downloads/LSR_20170125_094524.zip

Files included in the zip are:
StationAliases.txt 4 MB | 64,138 Lines
StationWeights.txt 44.2 MB | 702,574 Lines
Stations.txt 358.5 MB | 798,964 Lines

johncronan commented 7 years ago

Does it contain the water quality data, or just station information?

lrehmann commented 7 years ago

Here are the headers on each file. Looks like I may have clicked "station download" instead of result download.

StationAliases.txt 4 MB | 64,138 Lines
Organization ID, Station ID, Station Name, Station Alias, Station Alias Type

StationWeights.txt 44.2 MB | 702,574 Lines
Organization ID, Station ID, Station Name, Project ID, Location Weighting Factor, Location Weighting Factor Unit, Location Statistical Stratum Text, Location Category, Location Status, Reference Location Type, Reference Location Start Date, Reference Location End Date, Citation ID

Stations.txt 358.5 MB | 798,964 Lines
Org ID Beach ID/Project ID Station ID Station Name Org Name Primary Type Secondary Type S/G/O Indicator Well Number Well Name Pipe Number NAICS Code Spring Type Improvement Permanence USGS Geologic Unit Code-Name Spring Other Name USGS Lithologic Unit Code-Name Location Point Type Point Sequence Number Point Name Latitude Longitude Horizontal Datum Converted Latitude Converted Longitude Converted Horizontal Datum Geopositioning Method Horizontal Accuracy Map Scale Elevation Elevation Unit Elevation Datum Elevation Method Country Name State County Hydrologic Unit Code Hydrologic Unit Name Generated Hydrologic Unit Code Generated Hydrologic Unit Name RF1 Segment Code RF1 Segment Name RF1 Mileage On Reach Ind NRCS Watershed ID Primary Estuary Secondary Estuary Other Estuary Name Great Lake Name Ocean Name Tribal Land Indicator Natv American Land Name FRS Key Identifier Description Text Well Type Aquifer Name Formation Type Well Hole Depth Measure Well Hole Depth Measure Unit Station Document/Graphic Name Station Document/Graphic URL HUC Twelve Digit Code Generated HUC Twelve Digit Code Last Change Date User ID Last Change Last Trans

The issue with gathering all of the result data:

Number of Results Returned: 220,835,112
The number of Results that match your search criteria has exceeded the allowable Result Report size limit of 13,000,000. Please select 'Back' to modify your search.

I'll see if I can submit requests for each state.
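For a rough sense of scale, the two figures in the error message give a lower bound on how many requests a full download would take. The sketch below also flags any slice that would still be too large on its own. The per-state counts are made up for illustration, and treating 13,000,000 as an inclusive cap is an assumption.

```python
from typing import Dict, List

TOTAL_RESULTS = 220_835_112  # from the error message above
REPORT_LIMIT = 13_000_000    # from the error message above


def min_requests(total: int, limit: int) -> int:
    """Lower bound on request count, assuming every request is filled to the limit."""
    return (total + limit - 1) // limit  # integer ceiling division


def oversized(counts: Dict[str, int], limit: int) -> List[str]:
    """Return the keys whose result count alone exceeds the limit."""
    return sorted(key for key, n in counts.items() if n > limit)


print(min_requests(TOTAL_RESULTS, REPORT_LIMIT))  # 17

# Hypothetical per-state result counts, for illustration only.
hypothetical_counts = {"XX": 15_000_000, "YY": 2_000_000}
print(oversized(hypothetical_counts, REPORT_LIMIT))  # ['XX']
```

So even a perfectly packed split needs at least 17 requests, and a per-state split only works if no single state exceeds the limit; any that do would need a further split, for example by date range.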

stephanne-t commented 7 years ago

I've been downloading them state-by-state from waterqualitydata.us (better interface) -- have got station data, physical / chemical data, and biological data. Should be done this afternoon, so don't worry about downloading them by brute force @lrehmann
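A state-by-state pull like this could be scripted along the following lines. This is only a sketch: the base URL path and the `statecode`, `mimeType`, and `zip` parameter names are assumptions about the portal's web services and should be checked against its documentation before use. The function only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode

# Assumed base URL and parameter names; verify against the portal's documentation.
BASE_URL = "https://www.waterqualitydata.us/data"


def state_url(fips: str, kind: str = "Result") -> str:
    """Build a download URL for one state, identified by its two-digit FIPS code.

    `kind` is the assumed endpoint name, e.g. "Station" or "Result".
    """
    query = urlencode({"statecode": f"US:{fips}", "mimeType": "csv", "zip": "yes"})
    return f"{BASE_URL}/{kind}/search?{query}"


if __name__ == "__main__":
    # 06 and 55 are the FIPS codes for California and Wisconsin.
    for fips in ("06", "55"):
        print(state_url(fips, "Station"))
        print(state_url(fips, "Result"))
```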

lrehmann commented 7 years ago

I've made data requests for each state from https://ofmpub.epa.gov/storpubl/dw_pages.querycriteria and will download the copies once they are done processing. Shouldn't hurt to have 2 copies @stephanne-t 👍🏻

stephanne-t commented 7 years ago

I finished the bulk download -- will hold on to it until there's news re: semi-permanent storage space.