Closed ericnost closed 3 years ago
For the agg_type, agg_col and unit we might have a dictionary for different columns so we can specify different columns as needed. I'll look at the other notebooks and see how that would work for them.
I think that's a fantastic idea though it may only apply to this DataSet object and it may require changes to the class (to handle a dictionary instead of self.agg_type as string) and to the notebooks?
make_data_sets.py creates objects for storing important info from database queries. We have objects for CWA inspections, violations, greenhouse gas emissions, etc. For the watershed notebook, we need an object to handle the Discharge Monitoring Reports (DMRs) for FY 2020. This will allow us to report trends in actual water pollutant discharges, as reported by regulated facilities (not just whether facilities were in violation or how much they exceeded their permit....we will be able to report ALL discharges). Note: this is maybe the biggest table in our database, so performance may be an issue. Also I think we may need to create a materialized view for it.
Here is an example of an existing DataSet object:
I suspect the object for the DMRs will look something like this:
To see what the DMR table looks like, you can view output from this notebook: https://colab.research.google.com/drive/1GmvDCpbDmBERymLlxaVrBkgLzG7t4M90#scrollTo=cfV3jjvmbt7n
As noted, the biggest challenge is how to implement standard aggregation of the info in the table. I suspect what we'll want to do is roll-up by
PARAMETER_CODE
and then sumDMR_VALUE_NMBR
to report on specific pollutants. But I haven't investigated the NPDES_DMRS_FY2020 table enough to know if that's right. Plus, it will depend on what the team decides as the goal of the notebook. Perhaps we want to show the total amount of pollutants permitted in the watershed (which, looking at the spottiness of these DMR records, may be more feasible than showing the total actually released).