Carceral-Ecologies / Carceral-ECHO-data

In this repo we are building tools to assess environmental compliance and enforcement in US prisons, jails, and detention centers.
GNU General Public License v3.0

ECHO Study Design #3

Open shapironick opened 4 years ago

shapironick commented 4 years ago

To get more granular information than the violation counts visible in the exporter data, we will have to scrape violation data. That will give us a better understanding of which specific pollutants were found, through what method, and how the issues were remediated (I'm sure there are many more interesting things we can learn from this data).

To do this:

and then

We'll want to keep an eye on how this data can or can't be joined with the main ECHO data set we are working with.
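One way to keep that eye on joinability is to check, for every scraped violation record, whether its facility key appears in the exporter rows. The sketch below assumes both sources carry a shared identifier and uses `REGISTRY_ID` as that key; the key name and the dict-per-row layout are assumptions, and `check_join` is a hypothetical helper, not part of any ECHO tool.

```python
from collections.abc import Iterable


def check_join(
    scraped: Iterable[dict],
    exporter: Iterable[dict],
    key: str = "REGISTRY_ID",  # assumed shared key; verify both sources carry it
) -> dict:
    """Split scraped records into matched, unmatched, and missing-key groups."""
    # Collect every non-empty key value present in the exporter data.
    exporter_ids = {row[key] for row in exporter if row.get(key)}
    matched, unmatched, missing_key = [], [], []
    for record in scraped:
        value = record.get(key)
        if not value:
            missing_key.append(record)  # cannot be joined at all
        elif value in exporter_ids:
            matched.append(record)
        else:
            unmatched.append(record)  # has a key, but no exporter row
    return {"matched": matched, "unmatched": unmatched, "missing_key": missing_key}
```

The three buckets keep the "can't be joined" cases visible instead of silently dropping them, which matters if unmatched facilities turn out to cluster in particular states or facility types.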

lindsaypoirier commented 4 years ago

Here is a link to the workflow to access the data: https://ucla.box.com/s/ma0v4dh6bwy8wbe0t8hzfvi48l4e9u9z

shapironick commented 4 years ago

Thank you!

Would it also be possible to access this via the API? I'm looking at https://echo.epa.gov/tools/web-services/enforcement-cases with the /case_rest_services.get_case_info or /case_rest_services.get_download endpoints.
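A minimal sketch of assembling a request URL for those two endpoints with the standard library's `urllib.parse.urlencode`. The service host in `BASE_URL` and the parameter names in the usage (`output`, `p_id`) are assumptions for illustration only; the real host and required parameters should be taken from the enforcement-cases web-services page. `build_case_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# ASSUMPTION: service host; confirm against
# https://echo.epa.gov/tools/web-services/enforcement-cases before use.
BASE_URL = "https://echodata.epa.gov/echo"

# The two endpoints named in this thread.
CASE_ENDPOINTS = {
    "case_rest_services.get_case_info",
    "case_rest_services.get_download",
}


def build_case_url(endpoint: str, params: dict[str, str]) -> str:
    """Join the base URL, an endpoint name, and URL-encoded query parameters."""
    if endpoint not in CASE_ENDPOINTS:
        raise ValueError(f"unexpected endpoint: {endpoint}")
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# Usage with hypothetical parameter names and a made-up case ID:
example_url = build_case_url(
    "case_rest_services.get_case_info", {"output": "JSON", "p_id": "CASE-123"}
)
```

Keeping URL construction separate from the HTTP call makes it easy to log exactly which queries were sent, which helps when some responses come back unusable.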

benmillam commented 4 years ago

Update: I've downloaded the ECHO Facility Detail Report data via that API in Python for ~1,500 carceral facilities, but I haven't gotten around to digging through it much.
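Pulling reports for ~1,500 facilities is easiest to manage as a resumable loop that saves one raw response per facility. The sketch below is a hypothetical outline, not the code used for this download: `fetch` stands in for whatever function performs the actual HTTP request, and the `<facility_id>.json` file naming is an assumption. Responses are saved as raw text, unparsed, so any malformed ones are preserved for later inspection.

```python
from pathlib import Path
from typing import Callable, Iterable


def download_reports(
    facility_ids: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical: returns raw response text for one ID
    out_dir: Path,
) -> list[Path]:
    """Save each facility's raw response as <facility_id>.json, skipping existing files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for facility_id in facility_ids:
        path = out_dir / f"{facility_id}.json"
        if path.exists():  # already downloaded on an earlier run
            continue
        path.write_text(fetch(facility_id), encoding="utf-8")
        written.append(path)
    return written
```

Skipping files that already exist means an interrupted run can simply be restarted without re-querying facilities that were already fetched.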

I'll be working on getting the API query code in R vs Python, and I can post the resulting JSON files -- maybe we can start doing some analyses on the ECHO facility details, even before settling the matching issues.

shapironick commented 4 years ago

Excellent! This is great and probably where we will find some of the most valuable data. Are you saying that you would compare querying with Python against querying with R, to test whether the malformed JSON comes back from both? Sorry if I'm misunderstanding!

Maybe we could do some parameter setting together on a video chat sometime to start thinking about the most valuable ways to analyze this data, even if, as you say, we haven't settled the matching issues. Unfortunately I'll be in Mississippi this Tuesday :/

benmillam commented 4 years ago

Hi all, here's a summary of what data are returned by the ECHO 'Facility Detail Report' API.

@shapironick, to your question on R vs Python: I just want to move the API query code into R because we have more R users in the group. We'll have to handle the malformed JSON issue either way. Fortunately it seems to affect only a small number of queries, so for the moment it's a bridge to cross later.
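Until the malformed JSON is dealt with, one low-effort approach is to parse what decodes cleanly and set the rest aside for re-querying. This sketch assumes one raw response per `<facility_id>.json` file in a directory (an assumed layout); `load_reports` is a hypothetical helper, and it does not attempt to repair bad responses, only to list them.

```python
import json
from pathlib import Path


def load_reports(report_dir: Path) -> tuple[dict[str, object], list[str]]:
    """Parse every <facility_id>.json file, separating good and malformed ones.

    Returns (parsed, malformed): parsed maps facility ID to decoded JSON;
    malformed lists facility IDs whose text failed to decode.
    """
    parsed: dict[str, object] = {}
    malformed: list[str] = []
    for path in sorted(report_dir.glob("*.json")):
        try:
            parsed[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            malformed.append(path.stem)  # keep the ID so it can be re-queried later
    return parsed, malformed
```

Because the malformed list holds facility IDs rather than discarded data, it also gives a direct count of how "small" the affected number of queries actually is.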

shapironick commented 4 years ago

This is great and very detailed! Thank you! You are now our ECHO guru!

I added one point. I'm not sure about the second half, but since the ZIP code centroids are basically useless, it felt important to document this: "The ECHO exporter lat/long data was often ZIP code centroids rather than the facility itself. The API data supplies proper facility lat/long data." Please let me know if I'm wrong about the second half.
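One way to check the second half of that statement is to compare the exporter and API coordinates for each facility and flag the ones that sit far apart. The sketch below uses the haversine great-circle distance. The input layout (a dict of `"exporter"` and `"api"` (lat, lon) pairs per facility ID) and the 1 km `threshold_km` default are hypothetical choices. A large gap only shows that the two sources disagree; it does not by itself prove which one is the ZIP code centroid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_shifted_coordinates(
    facilities: dict[str, dict[str, tuple[float, float]]],
    threshold_km: float = 1.0,  # hypothetical cut-off; tune after inspecting the data
) -> list[str]:
    """Return IDs whose exporter and API coordinates differ by more than threshold_km."""
    flagged = []
    for facility_id, coords in facilities.items():
        (lat1, lon1), (lat2, lon2) = coords["exporter"], coords["api"]
        if haversine_km(lat1, lon1, lat2, lon2) > threshold_km:
            flagged.append(facility_id)
    return flagged
```

Checking a handful of flagged facilities against satellite imagery would then show which source actually points at the facility.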

Two small detail questions:

I'm also excited for the next steps!

In terms of thinking through study design and analysis, we might start by thinking about:

Variables:

  1. variables that help us understand the environmental hazards to people inside the facility (violations of SDWA)
  2. variables that help us understand the environmental hazards to people inside and outside the facility (violations of CAA and CWA)
  3. environmental variables external to the prison that might affect people both inside and outside the facility (EJScreen indexes)
  4. variables that help us understand who, outside the facility, is affected by the variables in 2 (demographics + tribal land proximity indicators)
  5. variables that help us understand state and EPA enforcement
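To make the five variable groups concrete, here is a hypothetical per-facility record with one block of fields per group. Every field name is an illustrative placeholder, not an ECHO, EJScreen, or Census column name, and the violation fields are simple counts by assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FacilityRecord:
    """One carceral facility; all field names are hypothetical placeholders."""

    facility_id: str
    state: str
    epa_region: Optional[int] = None
    # 1. hazards to people inside the facility
    sdwa_violations: int = 0
    # 2. hazards to people inside and outside the facility
    caa_violations: int = 0
    cwa_violations: int = 0
    # 3. environmental context around the facility (EJScreen index name -> value)
    ejscreen_indexes: dict[str, float] = field(default_factory=dict)
    # 4. who outside the facility is affected
    demographics: dict[str, float] = field(default_factory=dict)
    near_tribal_land: Optional[bool] = None
    # 5. state and EPA enforcement
    inspections: int = 0
    enforcement_actions: int = 0

    def total_violations(self) -> int:
        """Sum of the SDWA, CAA, and CWA violation counts."""
        return self.sdwa_violations + self.caa_violations + self.cwa_violations
```

Using `None` rather than `0` or `False` for unknown values keeps "not reported" distinct from "none found", which the null-response question in the analyses depends on.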

Analyses:

  1. analyses that help us find high priority prisons to investigate
  2. analyses that show us which regulations are being violated most often
  3. analyses that show us which regulations are being inspected + enforced most often
  4. analyses that show us how state and EPA region affect data collection (can we see this in null responses?), inspection, and enforcement.
  5. analyses that show us what percent of prisons have either violations of their own or are sited in a location with high violations. (this is key to the toxicity is inseparable from mass incarceration argument).
  6. analyses that show us the limitations of the database/api
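A few of these analyses reduce to simple counts once the data is in a flat form. The sketch below covers analysis 2 (violations per statute), the own-violations half of analysis 5 (share of facilities with at least one violation), and the null-response part of analysis 4 (share of facilities per state with a missing field). The record layout, a dict with a `"state"` key and a `"violations"` dict keyed by statute, is a hypothetical assumption, and the "sited in a location with high violations" half of analysis 5 is not covered.

```python
from collections import Counter, defaultdict


def violations_by_statute(records: list[dict]) -> Counter:
    """Analysis 2: total violation counts per statute across all facilities."""
    totals: Counter = Counter()
    for record in records:
        for statute, count in record.get("violations", {}).items():
            totals[statute] += count
    return totals


def share_with_violations(records: list[dict]) -> float:
    """Analysis 5 (own violations only): fraction of facilities with >= 1 violation."""
    if not records:
        return 0.0
    with_any = sum(1 for r in records if sum(r.get("violations", {}).values()) > 0)
    return with_any / len(records)


def missing_rate_by_state(records: list[dict], field_name: str) -> dict[str, float]:
    """Analysis 4: per state, the share of facilities where field_name is None or absent."""
    seen: defaultdict = defaultdict(int)
    missing: defaultdict = defaultdict(int)
    for r in records:
        seen[r["state"]] += 1
        if r.get(field_name) is None:
            missing[r["state"]] += 1
    return {state: missing[state] / seen[state] for state in seen}
```

Comparing missing-field rates across states is a rough first look at analysis 4; differences could reflect reporting practices, but they could also reflect differences in which facilities are regulated under each statute.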

Additional variables: I think we'll also want to keep track of what additional variables we would like to find that are not available in this data set.

melchimwaza commented 4 years ago

Hi Ben,

Just assigning this to you at Nick's request as you will be working on this later.

shapironick commented 4 years ago

:-) was just showing Mel how to assign issues on GH.

Zero rush and many thanks for your spreadsheet work! It went well today!

(Sent by email in reply to melchimwaza's comment of Wed, Jan 8, 2020 at 12:34 PM.)

--
Nicholas Shapiro
Assistant Professor
UCLA Institute for Society and Genetics
Office: (310) 206-2366