Open ericnost opened 4 years ago
Here is one way to do it:
Add an "agency_field" property to the data class. Add the appropriate agency_field for each of the tables (using the above doc). Then, for the aggregation and charts:
program_data['State'] = np.where(program_data[program.agency_field]=="S", 1,0)
program_data['Federal']= np.where(program_data[program.agency_field]=="E", 1,0)
g = program_data.groupby([pd.to_datetime(program_data[program.date_field], format=program.date_format)])[["State", "Federal"]].sum() #for each date, take the sum of state inspections and federal inspections
g = g.resample("Y").sum() #sum of inspections for the year
g.index = g.index.strftime('%Y') #pretty format
g
ax = g.plot(kind='bar', stacked=True, title = program.name, figsize=(20, 10), fontsize=16)
ax
Example output:
These feels rather hacky, creating new columns to count State inspections and Federal inspections. One thing to note here - and to always pay attention to - is using sum vs count to aggregate. In this case, I think we want to sum? This shows us the number of inspections, NOT the number of facilities inspected. If we wanted that, I think we would use count.
@shansen5 One thing I just noticed in the Cross-Program notebook. An error I probably made!
For the charts, we currently do this:
d = program_data.groupby(pd.to_datetime(program_data[program.date_field], format=program.date_format))[[program.date_field]].count()
d = d.resample("Y").count()
d.index = d.index.strftime('%Y')
ax = d.plot(kind='bar', title = chart_title, figsize=(20, 10), legend=False, fontsize=16)
ax
But I think d = d.resample("Y").count()
should be d = d.resample("Y").sum()
We want to sum the number of inspections, enforcements, violations that occurred within the year.
Might have some relation to #22 though I think there we are seeing the problem before grouping and charting?
^ I believe this separate issue has been addressed
Challenges: for some things state/federal actions make sense to chart (inspections, enforcement actions, penalties), but for others (violations), not so much. This nuance just adds another layer of logic required for the notebook (if inspections/actions/penalties, chart state vs federal...else don't).
With respect to not all the info being in the same place for each program, it would be helpful for someone to list out for each of the following nine data tables:
Some if not all of this is here: https://docs.google.com/document/d/1EWVy52R1Eqy5dc5mjXK7m5EOdVyBVXbI9hkL-ps4BlI/edit But it would be helpful to compile it based on my bullet points above.
Another thing to keep in mind is that it's more than just state vs federal - there are often local actions and tribal actions listed. How do we handle those? Do we just not count them?
What do we do about SDWA? That's another 3 or 4 tables. Maybe we should disable that because we don't have capacity to support it??
Specifically, we would need to modify at least the show_chart()
function here: https://github.com/edgi-govdata-archiving/ECHO_modules/blob/b0a288e4cca2d3a46e5378fb51986c034d455c2c/DataSetResults.py#L22
If we wanted to be able to export CSVs that distinguish between state and federal actions, that would require something else....
See here for relevant fields: https://docs.google.com/document/d/1EWVy52R1Eqy5dc5mjXK7m5EOdVyBVXbI9hkL-ps4BlI/edit
Related to edgi-govdata-archiving/EEW_Planning/issues/98