A-Sepulveda / WaterQuality


Data processing update #1

Open gagecarto opened 5 years ago

gagecarto commented 5 years ago

:sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny:

@A-Sepulveda - Below is an update on the data processing in Python. This will also be a good intro to GitHub issues, which are a really awesome platform for setting goals, keeping track of progress, recording bugs, etc.

I wrote some simple Python code to process raw results data from the Water Quality Portal (waterqualitydata.us). For the most part, it accomplishes the same things your R code in the RMD file did. Currently I am running it locally, but what's cool is that it could run on a server if we want to scale this to a larger geography. The Python file is linked below. https://github.com/A-Sepulveda/WaterQuality/blob/master/Scripts/processData.py

The script above does a few things:

1. Downloads 10 years of data from waterqualitydata.us by HUC for calcium, pH, and temperature.
2. For each of these variables, standardizes the units, strips whitespace, and omits pH observations outside a predefined range (keeping only observations with pH > 0 and < 14).
3. Drops any observation that does not have a corresponding site, as linked by the MonitoringLocationIdentifier attribute.
4. Reshapes the dataset so that each row represents a unique day at a unique site, and calculates the min, max, mean, and SD of each variable for that row. The new table also lists the number of observations of each variable for each day.
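A minimal, standard-library sketch of steps 2 through 4 might look like the following. It is not the code in processData.py: the WQP column names other than MonitoringLocationIdentifier, the unit conversions in `UNIT_CONVERTERS`, the helper `rec`, and the SITE-A/SITE-B toy records are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical conversions to one target unit per variable
# (deg F -> deg C, ug/L -> mg/L); extend for whatever units appear in the data.
UNIT_CONVERTERS = {
    ("Temperature, water", "deg F"): lambda v: (v - 32) * 5 / 9,
    ("Calcium", "ug/l"): lambda v: v / 1000,
}


def clean(records, site_ids):
    """Yield (site, date, variable, value) for records that pass every filter."""
    for rec in records:
        # Strip stray whitespace from every text field.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        site = rec["MonitoringLocationIdentifier"]
        if site not in site_ids:
            continue  # no matching site record, so it cannot be mapped
        try:
            value = float(rec["ResultMeasureValue"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric result
        name = rec["CharacteristicName"]
        convert = UNIT_CONVERTERS.get((name, rec["ResultMeasure/MeasureUnitCode"]))
        if convert is not None:
            value = convert(value)
        if name == "pH" and not 0 < value < 14:
            continue  # keep only 0 < pH < 14
        yield site, rec["ActivityStartDate"], name, value


def summarize_daily(rows):
    """Collapse observations to one row per (site, day) with stats per variable."""
    groups = defaultdict(lambda: defaultdict(list))
    for site, day, name, value in rows:
        groups[(site, day)][name].append(value)
    table = []
    for (site, day), by_var in sorted(groups.items(), key=lambda kv: kv[0]):
        row = {"site": site, "date": day}
        for name, vals in by_var.items():
            row[f"{name}_min"] = min(vals)
            row[f"{name}_max"] = max(vals)
            row[f"{name}_mean"] = statistics.mean(vals)
            # The sample SD is undefined for a single observation.
            row[f"{name}_sd"] = statistics.stdev(vals) if len(vals) > 1 else None
            row[f"{name}_n"] = len(vals)
        table.append(row)
    return table


def rec(site, day, name, value, unit):
    """Build a toy record using the (assumed) WQP column names."""
    return {"MonitoringLocationIdentifier": site, "ActivityStartDate": day,
            "CharacteristicName": name, "ResultMeasureValue": value,
            "ResultMeasure/MeasureUnitCode": unit}


raw = [
    rec("SITE-A ", "2018-06-01", "pH", "7.9", "std units"),  # trailing space stripped
    rec("SITE-A", "2018-06-01", "pH", "8.1", "std units"),
    rec("SITE-A", "2018-06-01", "pH", "15.2", "std units"),  # dropped: out of range
    rec("SITE-A", "2018-06-01", "Temperature, water", "59", "deg F"),
    rec("SITE-B", "2018-06-01", "pH", "7.0", "std units"),   # dropped: unknown site
    rec("SITE-A", "2018-06-02", "Calcium", "25000", "ug/l"),
]
table = summarize_daily(clean(raw, site_ids={"SITE-A"}))
```

Here the SD is left empty for single-observation days, which is one reason keeping the per-day observation counts in the table is useful.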

You can review the cleaned, simplified version of the results data below. https://github.com/A-Sepulveda/WaterQuality/blob/master/sampleData/170401_since_01012009_simplified.csv
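The file name above appears to encode the query: HUC 170401 and results since 01/01/2009. Assuming the portal's Result web service lives at https://www.waterqualitydata.us/data/Result/search and accepts the `huc`, `characteristicName`, `startDateLo`, `mimeType`, and `zip` parameters, and assuming "Temperature, water" is the characteristic name used for temperature, a request URL for each variable could be built like this. The helper `build_result_query` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Water Quality Portal Result service.
WQP_RESULT_URL = "https://www.waterqualitydata.us/data/Result/search"


def build_result_query(huc, characteristic, start_date):
    """Return a Result-service URL for one HUC and one characteristic.

    start_date is in MM-DD-YYYY form, e.g. "01-01-2009" (assumed portal format).
    """
    params = {
        "huc": huc,
        "characteristicName": characteristic,
        "startDateLo": start_date,
        "mimeType": "csv",
        "zip": "no",
    }
    return WQP_RESULT_URL + "?" + urlencode(params)


# Characteristic names are assumptions; check them against the portal's lookup lists.
urls = [build_result_query("170401", name, "01-01-2009")
        for name in ("Calcium", "pH", "Temperature, water")]
```

No request is sent here; each URL could then be fetched with `urllib.request.urlopen` or passed straight to `pandas.read_csv`.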

My thinking is that this is how a new table would look in PostgreSQL. We would map the point data and link it to this new table of cleaned water quality info. Take a look and let me know what you think. For now, I'll start plugging these data into the Upper Snake example app below. https://github.com/A-Sepulveda/WaterQuality/tree/master/upperSnakeMapApp
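To illustrate the point-to-table link, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `sites` and `daily_summary` tables, their columns, the `site_history` helper, and the coordinates are all hypothetical; the only idea taken from the thread is that each mapped point's MonitoringLocationIdentifier keys into the site-day table.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id TEXT PRIMARY KEY,   -- MonitoringLocationIdentifier
    lat REAL NOT NULL,
    lon REAL NOT NULL
);
CREATE TABLE daily_summary (
    site_id TEXT NOT NULL REFERENCES sites(site_id),
    obs_date TEXT NOT NULL,
    ph_mean REAL,
    ph_n INTEGER,
    PRIMARY KEY (site_id, obs_date)   -- one row per site per day
);
""")
conn.executemany("INSERT INTO sites VALUES (?, ?, ?)",
                 [("SITE-A", 43.5, -111.9), ("SITE-B", 43.7, -111.5)])
conn.executemany("INSERT INTO daily_summary VALUES (?, ?, ?, ?)",
                 [("SITE-A", "2018-06-01", 8.0, 2), ("SITE-A", "2018-06-02", 7.6, 1)])


def site_history(conn, site_id):
    """Return (lat, lon, date, pH mean) rows for one mapped point, oldest first."""
    return conn.execute(
        """SELECT s.lat, s.lon, d.obs_date, d.ph_mean
           FROM sites AS s JOIN daily_summary AS d ON d.site_id = s.site_id
           WHERE s.site_id = ?
           ORDER BY d.obs_date""",
        (site_id,),
    ).fetchall()


rows = site_history(conn, "SITE-A")
```

In PostgreSQL, `obs_date` would more naturally be a `DATE` column, and if PostGIS is installed the coordinates could be stored as a geometry column for mapping.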

:sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny::sunny:

gagecarto commented 5 years ago

Oh yeah, the parts where you ranked things based on quality, etc., can all be done in the app. My idea was to use the Python piece for the heavy lifting.

A-Sepulveda commented 5 years ago

Going through this right now. How are you running Python... via R? Or are you using a different interface?

