Gorcenski closed this 6 years ago
Ok, this step took a lot of digging into OSM data formats, which turned out to be necessary, but I was able to put together a Jupyter notebook that can do this. I'll put that code into a script.
Right now I'm just extracting street names and correlating them with gender info. Per-street geo data will be a separate task and slightly more complex, as the OSM data requires a bit more JB Weld to get it into the shape I want.
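As a rough illustration of the street-name extraction, here is a minimal sketch that assumes the OSM data is available in OSM XML form, where streets are `way` elements carrying a `highway` tag and a `name` tag. The sample document and the `extract_street_names` helper are hypothetical and are not taken from the notebook or the commit; a real extract may be in a different format (PBF, for example) and need a dedicated reader.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML form (illustrative only, not real extract data).
SAMPLE_OSM = """<osm version="0.6">
  <way id="1">
    <nd ref="10"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Rosa-Luxemburg-Straße"/>
  </way>
  <way id="2">
    <tag k="highway" v="residential"/>
    <tag k="name" v="Rosa-Luxemburg-Straße"/>
  </way>
  <way id="3">
    <tag k="building" v="yes"/>
    <tag k="name" v="Rathaus"/>
  </way>
  <way id="4">
    <tag k="highway" v="footway"/>
  </way>
</osm>"""


def extract_street_names(osm_xml: str) -> list[str]:
    """Return the sorted, de-duplicated names of all named highway ways."""
    root = ET.fromstring(osm_xml)
    names = set()
    for way in root.iter("way"):
        # Collapse the <tag k="..." v="..."/> children into a plain dict.
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        # Keep only ways that are streets (have a highway tag) and have a name.
        if "highway" in tags and "name" in tags:
            names.add(tags["name"])
    return sorted(names)


street_names = extract_street_names(SAMPLE_OSM)
```

A street is usually split into several ways in OSM, which is why the names are collected into a set before sorting.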
For the initial work, this is completed in https://github.com/Gorcenski/women-streets-berlin/commit/7955c6df0d990e0feb644ff459f89e0e1964816e
The pipeline is basically:
Download source Geo and name-gender data > process Geo data > merge with name-gender data > place-gender correlation data.
For this last step, a Python script will do the place-gender correlation and write the result in a more general data model: in this case, a JSON file that can then be used to generate a more suitable format, such as a markdown file.
This will be the final step in the automated extraction pipeline.
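The correlation-and-export step could look roughly like the sketch below. It assumes a simple heuristic: split each street name into tokens and look each token up in a name-gender table. The `NAME_GENDER` table, the `guess_gender` and `correlate` helpers, and the record layout are all hypothetical; the actual script may match names differently and use another JSON schema.

```python
import json

# Hypothetical name-gender lookup; the real table would come from the
# downloaded name-gender data source.
NAME_GENDER = {"rosa": "F", "karl": "M"}


def guess_gender(street_name, name_gender):
    """Return the gender of the first token found in the lookup, else None."""
    # Street names such as "Rosa-Luxemburg-Straße" join their parts with hyphens.
    tokens = street_name.replace("-", " ").split()
    for token in tokens:
        gender = name_gender.get(token.lower())
        if gender is not None:
            return gender
    return None


def correlate(street_names, name_gender):
    """Build one record per street: its name and the guessed gender."""
    return [
        {"street": name, "gender": guess_gender(name, name_gender)}
        for name in street_names
    ]


streets = ["Rosa-Luxemburg-Straße", "Karl-Marx-Allee", "Unter den Linden"]
records = correlate(streets, NAME_GENDER)

# ensure_ascii=False keeps characters such as "ß" readable in the output.
output = json.dumps(records, ensure_ascii=False, indent=2)
```

Streets with no matching token get `null` as their gender in the JSON, so a later step can tell "no match" apart from an actual classification.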