gazetteerhk / census_explorer

Explore Hong Kong's neighborhoods through visualizations of census data

http://gazetteer.hk

MIT License

Upgraded Extract #14

Closed hupili closed 10 years ago

hupili commented 10 years ago

This addresses multiple related issues. I'll ping back in follow-up comments. A major refactoring was done on this branch.

data_preparation.py is the current entry point. The intermediate and final output is at http://hupili.net/projects/hk_census/data/ and I think it is what we need, except for some details. Two come to mind:

Identifier (for the database).
Raw name to canonical name mappings, created by @clacanzo. More people need to review the mappings.

I suggest following up on these two in separate issues.

hxu commented 10 years ago

@hupili Do you want to merge these? Feel free to go ahead and do so on master if it is ready to go, no need to issue a pull request.

hupili commented 10 years ago

A PR serves as a call for code review: not only for quality per se, but also so that everyone knows where we are. It's just a habit of mine from before, though, so we can omit it.

For this particular case, it's better to have someone else try running data_preparation.py, because it's a major refactoring.

hxu commented 10 years ago

Ok I'll take a look later tonight.

Agreed we should use pull requests once we get stable.

(In reply to the comment by "HU, Pili" of Jan 30, 2014, 3:11 PM, above.)

hxu commented 10 years ago

@hupili at what point is the spreadsheet for raw to canonical name being pulled in? Are you planning on using that spreadsheet to update translation_fix.py and table_meta_data.py once it is finalized?

hxu commented 10 years ago

I've also reviewed the spreadsheet. Thanks @clacanzo for your help with that. Most of my changes were formatting (there were some characters that looked like spaces but weren't actually spaces).

Some notes on the style I tried to implement:

Uppercase only for first word ("Household income" not "Household Income") unless proper noun
Spaces between dividing characters (dashes, slashes, etc.)
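
For anyone who wants to apply these style rules in code rather than by hand, here is a rough Python sketch. It is not what the repository does: `normalize_label`, `sentence_case` and the `PROPER_NOUNS` allowlist are hypothetical names made up for illustration. It also assumes that a hyphen inside a word (as in "Self-employed") is not a dividing character, and that the space look-alikes are Unicode compatibility spaces such as U+00A0.

```python
import re
import unicodedata

# Illustrative allowlist only; the real set of proper nouns would have to be
# collected from the reviewed spreadsheet.
PROPER_NOUNS = {"Hong", "Kong", "China"}


def sentence_case(text):
    """Capitalize the first word; lowercase the rest unless allowlisted or all caps."""
    words = text.split(" ")
    out = []
    for i, word in enumerate(words):
        if i == 0:
            out.append(word[:1].upper() + word[1:])
        elif word in PROPER_NOUNS or word.isupper():
            out.append(word)
        else:
            out.append(word.lower())
    return " ".join(out)


def normalize_label(raw):
    """Hypothetical helper applying the style notes above to one label."""
    # NFKC folds compatibility characters (no-break space, ideographic space,
    # fullwidth slash, ...) into their plain ASCII counterparts.
    text = unicodedata.normalize("NFKC", raw)
    # Zero-width characters are not touched by NFKC, so drop them explicitly.
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)
    # Slashes and en/em dashes always get one space on each side.
    text = re.sub(r"\s*([/\u2013\u2014])\s*", r" \1 ", text)
    # A hyphen is treated as a divider only between digits ("15-24") ...
    text = re.sub(r"(?<=\d)\s*-\s*(?=\d)", " - ", text)
    # ... or when it already has whitespace on at least one side.
    text = re.sub(r"\s+-\s*|\s*-\s+", " - ", text)
    # Collapse any runs of whitespace left over from the steps above.
    text = re.sub(r"\s+", " ", text).strip()
    return sentence_case(text)


print(normalize_label("Household\u00a0Income"))  # Household income
print(normalize_label("15-24"))                  # 15 - 24
```

Proper nouns cannot be detected reliably by rule, which is why the sketch falls back to an allowlist; anything not on it would still need a manual pass.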

I also added a column F and wrote "REMOVE ROW" if the row was an empty row that should not be included in the final results, usually caused by line breaks.
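
A minimal sketch of how the extractor could honor that flag, assuming the spreadsheet is exported to CSV and column F is the sixth field of each row. `keep_rows`, `REMOVE_FLAG` and `FLAG_COLUMN` are hypothetical names, not existing code in the repository.

```python
import csv
import io

REMOVE_FLAG = "REMOVE ROW"
FLAG_COLUMN = 5  # column F, zero-based (assumed layout of the CSV export)


def keep_rows(csv_text):
    """Return all rows of a CSV export except those flagged in column F."""
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Short rows simply have no flag.
        flag = row[FLAG_COLUMN].strip() if len(row) > FLAG_COLUMN else ""
        if flag.upper() == REMOVE_FLAG:
            continue
        kept.append(row)
    return kept


sample = "a,b,c,d,e,\nx,,,,,REMOVE ROW\nf,g,h,i,j,\n"
for row in keep_rows(sample):
    print(row[0])  # prints "a", then "f"
```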

There are still some items in the mappings that I am unsure of, so it would be good to get another pair of eyes on them, if anyone wants to review again.

@hupili back to you?

hxu commented 10 years ago

I added a sheet to the spreadsheet that lists special aggregate cells that I found, and estimates the number of data points that we should have in the final dataset.

hupili commented 10 years ago

Immediate problems are fixed. Longer-term problems are redirected. Merging to master as the extractor baseline.

hxu commented 10 years ago

:+1: