Open georgelu opened 8 years ago
OK, I think this references #4
I understand the broad issue, but not the details. Cleaning the data in a one-off way makes sense for this prototype.
For longer-term sustainability of this idea, it'd be good to think about how folks from the non-profit community or similar on-the-ground partners will be able to maintain/alter the data that feeds into the app.
Perhaps if they provided a little more structure in their data on operating hours, we could provide a reusable solution to pipe data from the CSV into the schema?
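To make the idea concrete, here's a minimal sketch of such a pipeline in Python (stdlib only). The column names (`name`, `address`, `hours`) are assumptions for illustration, not the real spreadsheet headers, and the schema fields are hypothetical.

```python
# Hypothetical sketch: pipe a sites CSV into a JSON schema.
# Column names ("name", "address", "hours") are assumed, not the
# project's actual headers.
import csv
import io
import json

def csv_to_schema(csv_text):
    """Convert CSV text into a list of site records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    sites = []
    for row in reader:
        sites.append({
            "name": row["name"].strip(),
            "address": row["address"].strip(),
            # kept verbatim for a later, smarter parsing pass
            "hours_raw": row["hours"].strip(),
        })
    return sites

sample = "name,address,hours\nMain Library,123 Elm St,Mon-Fri 9-5\n"
print(json.dumps(csv_to_schema(sample), indent=2))
```

If the source spreadsheet adopted consistent headers, this kind of loop could be reused whenever the data changes, instead of redoing the cleaning by hand.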
It seems like this also involves a discussion of overall design. @georgelu maybe this issue could serve as a stub to have that discussion?
(I had just created another issue #23 for discussing the data flow, but I think it's better to have that discussion here.)
Main goals:
As an alternative to cleaning and preprocessing, it might be possible to do everything in-browser from a CSV.
For example, http://papaparse.com/ could parse the CSV client-side, but this would require writing JavaScript to perform all the steps outlined in the data-cleaning section of the README.
Typical cases:
Atypical cases:
Hypothetical cases: (which are likely not worth supporting right now)
Desirable data: (all should be both human- and machine-readable, and one field may require two JSON rows)
Optional data:
To clarify, by pre-processing, do you mean manual cleaning/processing?
One thing to keep in mind is that data must be both human- and machine-readable. For instance, we want to cleanly display a site's address, but the Google Maps API may work best with coordinates or other non-trivial conversions. Another example: dates need to be easily comparable and sortable, while complex recurrence logic sometimes needs to be clearly explained to users.
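One way to satisfy both audiences is to store each date field twice: a machine-sortable ISO-8601 string alongside a human-readable display string. A minimal Python sketch, with illustrative field names that are not the project's actual schema:

```python
# Hypothetical record shape: an ISO-8601 timestamp for machine
# sorting plus a display string shown to users as-is.
events = [
    {"start": "2016-03-14T18:00:00", "display": "Every 2nd Monday, 6 PM"},
    {"start": "2016-03-07T09:30:00", "display": "Mar 7, 9:30 AM (one-off)"},
]

# ISO-8601 strings sort chronologically as plain strings,
# so machine-side ordering stays trivial...
events.sort(key=lambda e: e["start"])

# ...while the human-readable field is rendered untouched in the UI.
for e in events:
    print(e["display"])
```

The duplication is the point: neither representation has to serve both purposes, which is roughly what "one field may require two JSON rows" implies above.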
Based on my relative skill with JS/Python, I'd favor adding more fields to the JSON output rather than doing further processing on the browser side. I'm not sure about the precise technical tradeoffs and can happily try to work either way.
Great stuff. Yes, I meant avoiding manual processing. Overall, I agree it makes sense to do the processing in Python to produce clean JSON. Pushing forward on that will help us see whether there's some additional structure on the human-readable side (be that a wiki, Google Doc, or spreadsheet) that could make our task of using it programmatically easier. More tomorrow, - Jaime
The wxDateTime classes have some pretty powerful parsers capable of what we need, e.g. http://docs.wxwidgets.org/trunk/classwx_date_time.html#a4687372ebe55a6aded83de6a639cde95
I'll take a look at the data and sketch out some key fields by tomorrow evening.
For instance: structuring recurring events, handling one-off/special events, and other requirements.