code4sac / wicit

A simple node/express app for finding locations that accept WIC in California, using data from the new California Department of Public Health open data portal.
http://findwic.com/
MIT License

Search for Approved Foods #13

Status: Open
jesserosato opened this issue 10 years ago

jesserosato commented 10 years ago

WIC is super specific about what it covers; their site has PDFs full of approved foods. It would be great to get that data into a usable format and provide a search utility so WIC users can check whether foods qualify before they go to the store. Ideally, they'd be able to scan UPCs in the store to see if a food qualifies, but that may be overly ambitious...
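
A minimal sketch of the lookup such a search utility could perform, written in Python for illustration (the app itself is Node/Express). It assumes the approved-foods PDFs have already been turned into records with hypothetical `upc`, `name`, and `category` fields; the helper names are placeholders too, not part of the wicit codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ApprovedFood:
    # Hypothetical record shape for one row of the approved-foods list.
    upc: str       # kept as a string so leading zeros survive
    name: str      # product description as printed in the PDF
    category: str  # WIC food category


def build_upc_index(foods: List[ApprovedFood]) -> Dict[str, ApprovedFood]:
    """Index foods by UPC so a barcode scan becomes a single dict lookup."""
    return {food.upc: food for food in foods}


def lookup_upc(index: Dict[str, ApprovedFood], scanned: str) -> Optional[ApprovedFood]:
    """Return the matching approved food, or None if the UPC is not on the list."""
    return index.get(scanned.strip())


def search_by_name(foods: List[ApprovedFood], query: str) -> List[ApprovedFood]:
    """Case-insensitive substring search over product names."""
    needle = query.casefold()
    return [food for food in foods if needle in food.name.casefold()]
```

A scan that returns `None` simply isn't on the list. This exact-match lookup only works if the stored UPCs still have their leading zeros.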

marcfarley commented 10 years ago

Would something like this help? http://www.pdfonline.com/convert-pdf-to-html/

jesserosato commented 10 years ago

@marcfarley Thanks for the tip! It looks like it could be really helpful. I'll flag it for more research at our next Hack Night (Wednesday at 6pm at Sacramento Hacker Lab, if you're interested and in the Sacramento area).

Elizabethcase commented 10 years ago

I used tabula.nerdpower.org to extract the tables. We still need someone to go through by hand and add the foods and details that are listed in the main food file.

Elizabethcase commented 10 years ago

Hey, Joseph had a great idea: see if there's an API that can look up UPCs so we can grab more info about each food. Also, we still need a search, woo hoo!

jesserosato commented 10 years ago

@Elizabethcase That's a great idea. There seem to be a few UPC APIs out there, but Amazon's looks like the best free one. I'll add food search to the issues queue once I get the back end pushed up to the repo (hopefully by Hack Night next week).

Elizabethcase commented 10 years ago

Great! One issue to note here: I need to fix the UPCs, because Excel automatically removed the leading zeros.
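
One way to repair the stripped UPCs is to re-pad them to a fixed width. A minimal sketch, assuming the codes are 12-digit UPC-A (the width is an assumption; adjust it if the CDPH data uses another length). `restore_upc` and `upc_check_digit_ok` are hypothetical helper names. The UPC-A check digit gives a cheap sanity check that a padded code is plausible.

```python
def restore_upc(raw) -> str:
    """Re-pad a UPC-A code whose leading zeros were stripped (e.g. by Excel).

    Assumes 12-digit UPC-A codes. Spreadsheet exports can also turn a code
    into a float such as 36000291452.0, so a trailing ".0" is dropped first.
    """
    digits = str(raw).strip()
    if digits.endswith(".0"):
        digits = digits[:-2]
    if not digits.isdigit() or len(digits) > 12:
        raise ValueError(f"not a UPC-A candidate: {raw!r}")
    return digits.zfill(12)


def upc_check_digit_ok(upc: str) -> bool:
    """Check the 12th digit of a UPC-A code against the standard checksum."""
    if len(upc) != 12 or not upc.isdigit():
        return False
    digits = [int(c) for c in upc]
    odd_sum = sum(digits[0:11:2])   # 1st, 3rd, ..., 11th digits (weight 3)
    even_sum = sum(digits[1:11:2])  # 2nd, 4th, ..., 10th digits (weight 1)
    expected = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return expected == digits[11]
```

For example, `restore_upc(36000291452)` returns `"036000291452"`, which passes `upc_check_digit_ok`. Storing the result as text keeps the zeros from being stripped again on the next round trip through a spreadsheet.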

jesserosato commented 10 years ago

@Elizabethcase I think we may have to do the food data in batches. Looking at the CSV, it seems like the different PDFs may have had different column orders? Or some are just missing columns? I'll bring it up at Hack Night and see if anyone has good ideas on how to clean this data up.

Elizabethcase commented 10 years ago

Yep. Some have an organic column, some have packaging, and some have neither. I can set all the blanks to nulls, but it's definitely not super clean data.
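
Because Python's `csv.DictReader` keys each value by its header name, batches with different column orders can be mapped onto one shared field list, with missing columns and blank cells both becoming `None` (the "nulls" idea above). A minimal sketch; `normalize_batch` is a hypothetical helper, and the field list passed to it would have to match the real CSV headers.

```python
import csv
import io
from typing import Dict, List, Optional


def normalize_batch(csv_text: str, fields: List[str]) -> List[Dict[str, Optional[str]]]:
    """Map every row of one CSV batch onto the same ordered field list.

    Values are looked up by header name, so column order does not matter.
    Columns the batch lacks, and blank cells, both come out as None.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row: Dict[str, Optional[str]] = {}
        for field in fields:
            # raw.get() is None for a missing column (or a short row).
            value = (raw.get(field) or "").strip()
            row[field] = value or None
        rows.append(row)
    return rows
```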

jesserosato commented 10 years ago

Yeah, I mean, it is scraped from PDFs, so it's actually pretty great. I think I'm gonna just use the common fields for now, and we'll have to ask CDPH if they can get us clean data at some point.
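
The "common fields" approach can be sketched by intersecting the header rows of each batch and projecting every row onto that shared set. A minimal sketch with hypothetical helper names; unlike filling blanks with nulls, it drops columns such as organic or packaging that only some batches have.

```python
import csv
import io
from typing import Dict, List


def common_fields(csv_batches: List[str]) -> List[str]:
    """Column names present in every batch, kept in the first batch's order."""
    if not csv_batches:
        return []
    # The first row of each CSV batch is its header.
    headers = [next(csv.reader(io.StringIO(text))) for text in csv_batches]
    shared = set(headers[0]).intersection(*headers[1:])
    return [name for name in headers[0] if name in shared]


def project(csv_text: str, fields: List[str]) -> List[Dict[str, str]]:
    """Keep only the given columns from every row of one batch."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: row[field] for field in fields} for row in reader]
```

The trade-off is simplicity versus completeness: every projected row has the same shape, but any batch-specific detail is lost until cleaner data is available.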

civicissuebot commented 9 years ago

Hello! This issue looks like it still needs help! It's been clicked on 1 time through the Civic Issue Finder on http://www.codeforamerica.org/. Can this issue be closed, or does it still need some assistance?

If you wrote this issue, you can always update the labels for specifying tasks, add more info in the description to make it easier to contribute, or re-write the title to make more contributors interested in helping out. If you are an open source contributor, ask and see how you can help by commenting or check out more open issues in this repo at https://github.com/code4sac/wicit/issues.

Just doing a little :seedling: open source gardening :seedling: of Brigade projects! For more info/tools for creating civic issues, check out Got Issues. Thank you!

josephlei commented 8 years ago

I was in touch with the WIC team at CDPH last year and then again yesterday. I have some contacts and will keep reaching out to them about getting their approved UPCs in a machine-readable format. I don't know if we'll end up using Excel, but if so, custom number formats like 000000000 (one 0 per digit of the code) will add leading zeros as needed.