Open adieyal opened 6 years ago
Now that we have descriptions of the data tables inside Django models, we can probably add a field which describes the sort order. eg. "strip all non-numbers and sort numerically" vs alphabetical.
Don't you think a canonical order would be a less hacky solution, i.e. we could add ordering to the model?
On Wed, Oct 3, 2018 at 9:34 AM Greg Kempe wrote:
> Now that we have descriptions of the data tables inside Django models, we can probably add a field which describes the sort order. eg. "strip all non-numbers and sort numerically" vs alphabetical.
Maybe I'm misunderstanding what you mean by canonical ordering. Assuming you mean "any ordering, as long as we have one and it's consistently used", I think we need to be a bit smarter, in particular with labels that contain numbers. Here's what an alphabetical ordering of this sort of dataset looks like:
No income
Not applicable
R 1 228 801 - R 2 457 600
R 1228801 - R 2457600
R 153601 - R 307200
R 153 801 - R 307 600
R 19201 - R 38400
R 19 601 - R 38 200
R 1 - R 4800
R 2 457 601 or more
R2457601 or more
R 307201 - R 614400
R 307 601 - R 614 400
R 38 201 - R 76 400
R 38401 - R 76800
R 4801 - R 9600
R 614 001 - R 1 228 800
R 614401- R 1228800
R 76 401 - R 153 800
R 76801 - R 153600
R 9601 - R 19200
R 9601 - R 19 600
Unspecified
I don't think that's useful to the user; it looks arbitrary. We're also at the mercy of the original source's categories, so if they have inconsistent spacing, we'll get inconsistent results.
I suggest that for each model we default to an alphabetical ordering, with an option of switching to a numerical ordering (which we can use on this dataset). Here's an example of what that could result in:
R 1 - R 4800
R 4801 - R 9600
R 9601 - R 19200
R 9601 - R 19 600
R 19201 - R 38400
R 19 601 - R 38 200
R 38 201 - R 76 400
R 38401 - R 76800
R 76 401 - R 153 800
R 76801 - R 153600
R 153601 - R 307200
R 153 801 - R 307 600
R 307201 - R 614400
R 307 601 - R 614 400
R 614 001 - R 1 228 800
R 614401- R 1228800
R 1 228 801 - R 2 457 600
R 1228801 - R 2457600
R 2 457 601 or more
R2457601 or more
No income
Not applicable
Unspecified
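A minimal sketch of what the numerical option could look like, assuming we sort on the lower bound of each range and push labels without digits to the end. `numeric_sort_key` is a hypothetical helper, not existing wazimap code, and ties between labels with the same lower bound are broken alphabetically here, which may not match the listing above exactly.

```python
import re


def numeric_sort_key(label):
    """Hypothetical sort key: order by the first number in the label.

    Labels without any digits (e.g. "No income") sort after all numeric
    labels, alphabetically among themselves.
    """
    # Only look at the text before the first hyphen, i.e. the lower bound
    # of a range such as "R 1 228 801 - R 2 457 600".
    lower_bound = label.split("-", 1)[0]
    # Strip everything that is not a digit, so "R 1 228 801" and
    # "R 1228801" both become 1228801 regardless of spacing.
    digits = re.sub(r"\D", "", lower_bound)
    if digits:
        return (0, int(digits), label)
    return (1, 0, label)


labels = [
    "No income",
    "R 1 228 801 - R 2 457 600",
    "R 19201 - R 38400",
    "R 1 - R 4800",
    "R2457601 or more",
    "R 614401- R 1228800",
    "Unspecified",
    "R 9601 - R 19200",
]
ordered = sorted(labels, key=numeric_sort_key)
```

Because the key ignores spacing inside the numbers, inconsistent formatting in the source categories no longer affects the order.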
If we add any sort of server-side ordering, we'll need to adjust the API to allow the client to benefit from it (e.g. by including an ordering list or something). Currently the API returns columns in a JSON object, which has unspecified ordering.
Currently there is no explicit ordering at all.
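One way the API could expose an ordering is sketched below: the columns stay in a JSON object, and a separate list of keys carries the order. The `build_payload` function and the `columns` / `ordering` field names are assumptions for illustration only, not the current API.

```python
import json


def build_payload(columns, sort_key=None):
    """Hypothetical response builder: columns plus an explicit key order.

    `sort_key` is passed to sorted(); None gives plain alphabetical order.
    """
    ordering = sorted(columns, key=sort_key)
    return {"columns": columns, "ordering": ordering}


# Made-up values, only to show the shape of the response.
columns = {"R 9601 - R 19200": 12, "No income": 5, "R 1 - R 4800": 7}
payload = build_payload(columns)

# JSON arrays keep their order, so the list survives serialisation
# even though the order of keys in the object is not guaranteed.
decoded = json.loads(json.dumps(payload))
values_in_order = [decoded["columns"][key] for key in decoded["ordering"]]
```

The client then iterates over `ordering` and looks each key up in `columns`, rather than relying on the order of keys in the object.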
https://wazimap.co.za/data/distribution/?table=ANNUALHOUSEHOLDINCOME_GENDEROFHOUSEHOLDHEAD&primary_geo_id=ward-79900061&geo_ids=ward|municipality-TSH&release=2011
Currently the values for a particular indicator are sorted alphabetically. This is annoying when the categories are numeric, e.g. household income. There should be a canonical sort order for every indicator.