maxachis closed this issue 3 years ago
The best way to do this is to switch as many source datasets as possible from CSVs to APIs.
The good news is our major datasets can pull from a regularly updating API!
The Just Harvest sites (Food Bucks, Fresh Access, Fresh Corners) are currently a Google Sheet that we convert to a .xlsx. Possibly we could connect directly to the Google Sheet for this info?
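As a hedged sketch of the direct-connection idea: if the sheet is shared publicly, it can be read straight from its CSV export URL instead of going through a manual .xlsx download. The sheet ID below is a placeholder, not the real document ID.

```python
# Placeholder ID -- substitute the real Just Harvest sheet's document ID.
SHEET_ID = "PLACEHOLDER_SHEET_ID"

# Public Google Sheets expose a CSV export endpoint at this URL pattern.
csv_url = f"https://docs.google.com/spreadsheets/d/{SHEET_ID}/export?format=csv"

# Then e.g. pandas.read_csv(csv_url) would load it as a DataFrame.
```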
Lastly, I just realized the WIC sites appear to update weekly, but it actually looks like stale data from 2015. I'm trying to find a good alternative. The state's "Find WIC Stores" website would have to be scraped to get a table of sites: https://www.pawic.com/FindWICStores.aspx
Another thought: since our data workflow now exports a timestamped CSV, does that create a problem for our Heroku workflow? Like, is Heroku going to be looking for the plain old merged_data.csv file?
Conor + Ellie will work on the SNAP site API.
The FindWICStores page sends an AJAX request that returns JSON data. See view-source:https://www.pawic.com/FindWICStores.aspx, "function getStores() {" on line 491.
@cgmoreno has my code for hitting the SNAP API
I've got Matt's API call code...will try to whip it into shape for next meeting! I'll be doing this in R using the jsonlite package.
I looked into the WIC request a little; the data you send should probably be: `{county: 2, zip: 15212, miles: 20}`. Here 2 means Allegheny (probably because it's second in the list), and 15212 is pretty central. 20 miles is the max the drop-down goes to, which might just barely cover the whole county if you started from a point dead center. If the API lets you use 30 miles, that might be better.
In Chrome at least, you can go to the dev console, open the Network tab, and filter by "method:POST" to see the outgoing POST request when you hit the button. As far as I know, you probably want to copy some/most of the request header values, but I don't have a lot of experience with this.
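To sketch what that POST might look like in Python with the standard library (untested against the live endpoint; the headers and payload here are just the values visible in the Network tab, and the server may require more of them):

```python
import json
import urllib.request

# Request body as discussed above; values are strings, matching the form's POST.
payload = {"county": "2", "zip": "", "miles": "5"}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "https://www.pawic.com/FindWICStores.aspx/FindStores",
    data=body,
    headers={
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Content-Type": "application/json; charset=UTF-8",
    },
    method="POST",
)

# Uncomment to actually hit the endpoint:
# with urllib.request.urlopen(req) as resp:
#     stores = json.loads(json.loads(resp.read())["d"])["Result"]
```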
Here is some (working) Python code:

```python
import json
import shlex
from subprocess import run

wgetCommand = """wget -q -O - \
    --method POST \
    --header 'Accept: application/json, text/javascript, */*; q=0.01' \
    --header 'Content-Type: application/json; charset=UTF-8' \
    --body-data '{"county":"2","zip":"","miles":"5"}' \
    'https://www.pawic.com/FindWICStores.aspx/FindStores'"""

args = shlex.split(wgetCommand)
results = json.loads(json.loads(run(args, capture_output=True).stdout)['d'])['Result']
```
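For anyone puzzled by the double `json.loads`: ASP.NET WebMethods wrap the response in an outer object whose `"d"` value is itself a JSON-encoded string. A toy example with a made-up payload shaped like the FindStores response:

```python
import json

# Made-up sample mimicking the response shape: the outer object's "d" key
# holds a JSON *string*, not a JSON object, so it must be decoded twice.
sample = '{"d": "{\\"Result\\": [{\\"Name\\": \\"Example Store\\"}]}"}'

outer = json.loads(sample)       # first decode  -> {'d': '{"Result": ...}'}
inner = json.loads(outer["d"])   # second decode -> {'Result': [...]}
print(inner["Result"][0]["Name"])  # -> Example Store
```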
It's actually a shell script, and I'm ashamed that it isn't working in requests, but this should work well in the meantime:
```shell
wget -q -O - --method POST \
  --header 'Accept: application/json, text/javascript, */*; q=0.01' \
  --header 'Content-Type: application/json; charset=UTF-8' \
  --body-data '{"county":"2","zip":"","miles":"5"}' \
  'https://www.pawic.com/FindWICStores.aspx/FindStores' | jq -r '.d' | jq > wicresults.json
```
Run that in the terminal (no need for Python), or let me know if you want me to wrap it up in an actual shell script to run as a command. This will output the results to a JSON file. If we turn it into a shell script for real for real, we can supply some params like county, zip, miles, and output filename.
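A rough sketch of what that parameterized script could look like (the function and argument names here are my own invention; the defaults match the command above):

```shell
#!/bin/sh
# Hypothetical wrapper around the wget command above.
# Usage: fetch_wic_stores [county] [zip] [miles] [outfile]

build_body() {
  # Build the JSON request body from county ($1), zip ($2), and miles ($3).
  printf '{"county":"%s","zip":"%s","miles":"%s"}' "$1" "$2" "$3"
}

fetch_wic_stores() {
  county="${1:-2}"
  zip="${2:-}"
  miles="${3:-5}"
  outfile="${4:-wicresults.json}"
  wget -q -O - --method POST \
    --header 'Accept: application/json, text/javascript, */*; q=0.01' \
    --header 'Content-Type: application/json; charset=UTF-8' \
    --body-data "$(build_body "$county" "$zip" "$miles")" \
    'https://www.pawic.com/FindWICStores.aspx/FindStores' \
    | jq -r '.d' | jq > "$outfile"
}
```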
@mishugana ok I was able to run this but am not getting any results? It creates a wicresults.json file with nothing in it.
I feel like I am holding up this whole thing with my inexperience! Would you be able to save a json or csv to our repo for me to at least get the data processing work done? Then we can figure out how to better integrate actually running the wget request and pulling the most recent data.
If @mishugana is not able to get into that before the meeting on Wednesday, I propose we look into that as part of the agenda of our meeting. I may also want to learn more about this!
WE ARE DONE!
Come up with a workflow for how to keep things updated so that if two years from now the original people aren't around, we can resolve that stuff.