georgelu / directory

sachacks food resources project

Investigate Data Cleaning Requirements #8

Open georgelu opened 8 years ago

georgelu commented 8 years ago

I'll take a look at the data and sketch out some key fields by tomorrow evening.

For instance: structure recurring events, handling of one-off/special events, other requirements.

ashander commented 8 years ago

OK, I think this references #4

I understand the broad issue, but not the details. Cleaning the data in a one-off way makes sense for this prototype.

For longer-term sustainability of this idea, it'd be good to think about how folks from the non-profit community (or similar folks on the ground) will be able to maintain/alter the data that feeds into the app.

Perhaps if they provided a little more structure in their data on operating hours, we could provide a reusable solution to pipe data from CSV to the schema?
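A minimal sketch of what that CSV-to-schema pipe could look like in Python. The column names (`name`, `address`, `hours`) and output fields are assumptions for illustration, not the project's actual schema:

```python
import csv
import io
import json

# Hypothetical sample export; real column names would come from the
# spreadsheet the non-profits maintain.
CSV_DATA = """name,address,hours
Food Bank A,123 Main St,Mon-Fri 9-5
Pantry B,456 Oak Ave,Sat 10-2
"""

def csv_to_schema(text):
    """Convert rows from a CSV export into a list of JSON-ready records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "name": row["name"].strip(),
            "address": row["address"].strip(),
            # Keep the raw hours string for now; structured parsing of
            # recurring hours would be a later cleaning step.
            "hours_raw": row["hours"].strip(),
        })
    return records

print(json.dumps(csv_to_schema(CSV_DATA), indent=2))
```

If the source data had a consistent structure, this script could be rerun whenever the spreadsheet changes, rather than cleaning by hand each time.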

ashander commented 8 years ago

It seems like this also involves a discussion of overall design. @georgelu maybe this issue could serve as a stub to have that discussion?

(I had just created another issue #23 for discussing the data flow, but I think it's better to just have that discussion here.)

Main goals:

ashander commented 8 years ago

As an alternative to cleaning and preprocessing, it might be possible to do everything in-browser from a CSV.

For example, using http://papaparse.com/, but this would require writing JavaScript to do all the steps outlined in the data cleaning section of the README.

georgelu commented 8 years ago

Typical cases:

Atypical cases:

Hypothetical cases: (which are likely not worth supporting right now)

Desirable Data: (all should be both human and machine readable, and one field may require two JSON rows)

Optional data:
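On the point that one field may require two JSON rows, here's a hypothetical sketch of one logical field ("hours") stored in two forms: a machine-readable rule that the app can evaluate, and a human-readable string for display. The field names (`hours_rule`, `hours_display`) are invented, not the project's schema:

```python
from datetime import datetime

# Hypothetical record: the same logical field stored twice.
site = {
    "name": "Food Bank A",
    "hours_rule": {"weekdays": [0, 1, 2, 3, 4],  # Mon-Fri
                   "open": "09:00", "close": "17:00"},
    "hours_display": "Monday-Friday, 9 AM to 5 PM",
}

def is_open(site, when):
    """Evaluate the machine-readable rule; the display string is for the UI."""
    rule = site["hours_rule"]
    if when.weekday() not in rule["weekdays"]:
        return False
    # Zero-padded HH:MM strings compare correctly as plain strings.
    return rule["open"] <= when.strftime("%H:%M") < rule["close"]

print(is_open(site, datetime(2017, 3, 6, 10, 0)))  # a Monday morning
```

The duplication costs a little space but means neither the UI nor the query logic has to parse the other side's format.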

georgelu commented 8 years ago

To clarify, by pre-processing, do you mean manual cleaning/processing?

One thing to keep in mind is that data must be both human- and machine-readable. For instance, we want to cleanly display a site's address, but the Google Maps API may work best with coordinates or other non-trivial conversions. Another example: dates need to be easily comparable and sortable, while sometimes-complex recurrence logic needs to be clearly explained to users.
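On the comparable/sortable point, one option is to store dates as ISO-8601 strings, which sort chronologically as plain strings, alongside a separate human-facing note. The field names here are illustrative, not the project's schema:

```python
# Hypothetical event records: an ISO-8601 date for sorting/comparison
# plus a human-readable note explaining the recurrence.
events = [
    {"date": "2017-03-18", "note": "Third Saturday distribution"},
    {"date": "2017-03-04", "note": "First Saturday distribution"},
]

# ISO-8601 strings ("YYYY-MM-DD") sort chronologically with no parsing.
events.sort(key=lambda e: e["date"])
```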

Based on my relative skill with JS/Python, I'd favor adding more fields to the JSON output rather than doing further processing on the browser side. I'm not sure about the precise technical tradeoffs and can happily try to work either way.

ashander commented 8 years ago

Great stuff. Yes, I meant avoiding manual processing. Overall, I agree it makes sense to do the processing with Python to clean the JSON. Pushing forward on that will help us see if there's some additional structure on the human-readable side (be that wiki, Google Doc, or spreadsheet) that could make our task of using it programmatically easier. More tomorrow, - Jaime

ashander commented 8 years ago

The wxDateTime classes have some pretty powerful parsers capable of what we need. E.g., http://docs.wxwidgets.org/trunk/classwx_date_time.html#a4687372ebe55a6aded83de6a639cde95
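wxDateTime is a C++ class from wxWidgets, so it wouldn't drop straight into a Python cleaning script. As a rough stand-in for fixed-format dates, the stdlib's `datetime.strptime` works; a free-form parser like `wxDateTime::ParseDateTime` would need a third-party library such as dateutil. A sketch with an assumed input format:

```python
from datetime import datetime

# Assumes the cleaned data uses a fixed "YYYY-MM-DD HH:MM" format;
# free-form strings like "first Saturday" would need custom handling.
parsed = datetime.strptime("2017-03-04 10:00", "%Y-%m-%d %H:%M")

# Once parsed, dates compare and sort natively, and weekday() gives
# the day of week (Monday == 0) for recurrence checks.
print(parsed.weekday())
```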