Build an automated processing pipeline

Gorcenski commented 6 years ago

The data processing pipeline should perform the following steps:

[x] Take source data and export it into a more convenient format
- Process the geodata to extract street names and export them in a usable format;
- Filter the exported data and break it down by gender;
- Output the results in a format that clearly separates gendered from non-gendered location names and is suitable for further development.
[ ] Generate human-editable content from this data
[ ] Be able to push edits back into data store
[ ] Have a strategy for handling conflicts
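
For illustration only, the filter-and-split step could be sketched roughly as below. This is not the repo's actual script: the `KNOWN_FIRST_NAMES` lookup, the function names, and the output layout are all assumptions made for this example, and matching on first names is just one possible heuristic.

```python
import re

# Hypothetical lookup table (an assumption for this sketch); a real pipeline
# would need a much larger, curated source of names.
KNOWN_FIRST_NAMES = {"rosa": "female", "clara": "female", "karl": "male"}


def classify_street(name):
    """Return 'female', 'male', or None for a single street name.

    The name is lowercased and split on whitespace and hyphens; the first
    token found in the lookup table decides the result.
    """
    tokens = re.split(r"[\s\-]+", name.lower())
    for token in tokens:
        if token in KNOWN_FIRST_NAMES:
            return KNOWN_FIRST_NAMES[token]
    return None


def split_by_gender(street_names):
    """Group street names into gendered (by gender) and non-gendered lists."""
    result = {"gendered": {"female": [], "male": []}, "non_gendered": []}
    for name in street_names:
        gender = classify_street(name)
        if gender is None:
            result["non_gendered"].append(name)
        else:
            result["gendered"][gender].append(name)
    return result
```

Keeping the gendered and non-gendered names in separate keys of one structure is one way to make the distinction explicit in the output; the real format is still undecided.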

This can probably be accomplished with a shell script and a basic Python script. I haven't decided on the output format yet, so that remains to be determined.

Gorcenski commented 6 years ago

I've added a script that extracts the data from the OSM files and writes it out in a tab-delimited format. The next step is to write a Python script that does some data handling and outputs JSON or some other slightly more workable format, and then to integrate that script into the extraction pipeline.
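
As a rough sketch of what that tab-delimited-to-JSON step might look like: the column layout (`osm_id`, `name`) and the function name are assumptions for illustration, since the actual columns produced by the extraction script aren't described here.

```python
import csv
import json

# Hypothetical column layout (an assumption); adjust to the real extraction output.
FIELDS = ["osm_id", "name"]


def tsv_to_json(tsv_file, json_file, fields=FIELDS):
    """Read headerless tab-delimited rows and write them as a JSON array.

    Both arguments are open text-mode file objects. Returns the row count.
    """
    reader = csv.DictReader(
        tsv_file,
        fieldnames=fields,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    rows = [dict(row) for row in reader]
    # ensure_ascii=False keeps characters such as "ß" readable in the output.
    json.dump(rows, json_file, ensure_ascii=False, indent=2)
    return len(rows)
```

Because the function takes file objects rather than paths, it could be called from a shell-driven pipeline via stdin/stdout or pointed at files in the data folder.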

I've also decided to put both source and processed files into the data folder in the repo. By including the source files, users can run the pipeline themselves, and by including the processed files, they won't have to.

Gorcenski commented 6 years ago

Updated this with a more comprehensive checklist of what this pipeline entails.

Gorcenski / women-streets-berlin

Build an automated processing pipeline #6