Closed by james-westwood 3 years ago.
@jwestw to speak to SDG data team about which age brackets to use.
Decided to use 5-year age groups.
Explore other groupings to see if anything stands out as interesting.
I found some strange data after isolating the counts of people (from census data) by age. It stood out because, out of many thousands of values, it is the only one containing a comma, so it wouldn't convert to an integer.
The problem is in column 19 (age 19), row 3091. The value is 1021.
It is much higher than the values around it and than other typical values I've seen.
So I forced it to convert and plotted the counts as integers.
Here is the plot, with the cursor hovering over the problem value:
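A minimal sketch of that forced conversion, assuming the counts were loaded as strings into a pandas DataFrame with one column per single year of age (the helper name `to_int_counts` and the column labels are hypothetical). It strips thousands separators before converting:

```python
import pandas as pd


def to_int_counts(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with the given columns converted to int64.

    Commas used as thousands separators are stripped first, so a value
    such as "1,021" becomes 1021 instead of failing to convert.
    """
    out = df.copy()
    for col in columns:
        cleaned = out[col].astype(str).str.replace(",", "", regex=False)
        # errors="raise" so any other unparseable value is still reported
        out[col] = pd.to_numeric(cleaned, errors="raise").astype("int64")
    return out


df = pd.DataFrame({"18": ["980", "1002"], "19": ["1,021", "995"]})
converted = to_int_counts(df, ["18", "19"])
```

If the data comes from a CSV file, `pd.read_csv(path, thousands=",")` should parse such values as numbers at load time instead.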
This outlier looks like it might be erroneous.
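One way to check for other values like this is a per-column outlier test. The sketch below uses Tukey's fences (flag anything above Q3 + k * IQR of its column); the function name and the default k=3 are assumptions, not what was done here:

```python
import pandas as pd


def flag_high_outliers(counts: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """List every count above Q3 + k * IQR of its own column.

    Returns one row per flagged cell with its row label, column label
    and value. k=3.0 keeps only "far out" values.
    """
    q1 = counts.quantile(0.25)
    q3 = counts.quantile(0.75)
    upper = q3 + k * (q3 - q1)  # one threshold per column
    mask = counts.gt(upper, axis=1)  # compare each column to its threshold
    rows, cols = mask.to_numpy().nonzero()
    return pd.DataFrame({
        "row": counts.index[rows],
        "column": counts.columns[cols],
        "value": counts.to_numpy()[rows, cols],
    })


counts = pd.DataFrame({
    "18": [100, 105, 98, 102, 101],
    "19": [99, 103, 1021, 100, 97],
})
flagged = flag_high_outliers(counts)
```

This only flags candidates for inspection; whether 1021 is an error still needs checking against the source data.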
I tested the method I created and am quite happy with its efficiency.
Dropping the original columns then gives the dataframe I want.
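A hypothetical sketch of a sum-then-drop approach, assuming each single-year count sits in a column named by its age as a string ("0", "1", ...) and that 5-year groups are wanted. It is not necessarily the method tested above:

```python
import pandas as pd


def sum_into_age_groups(df: pd.DataFrame, ages, width: int = 5) -> pd.DataFrame:
    """Add one column per `width`-year age group, then drop the single-year columns.

    Counts for each age are assumed to be in a column named str(age).
    Group labels look like "15-19".
    """
    out = df.copy()
    ages = sorted(ages)
    groups = {}
    for age in ages:
        start = (age // width) * width
        label = f"{start}-{start + width - 1}"
        groups.setdefault(label, []).append(str(age))
    for label, cols in groups.items():
        # Row-wise total of the single-year columns in this group
        out[label] = out[cols].sum(axis=1)
    # Drop the original single-year columns once they have been summed
    return out.drop(columns=[str(a) for a in ages])


df = pd.DataFrame({"area": ["A", "B"], **{str(a): [1, 2] for a in range(10)}})
grouped = sum_into_age_groups(df, range(10))
```

An open-ended top age column, if the data has one, would need its own label rather than being folded into a 5-year group.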
Check whether functions exist to bucketize values. If none exists, write a function to bucketize the age values. Looking at other SDG indicators, the buckets should be:
From Indicator 3.6.1
And from Indicator 3.4.2
These do not agree with each other. Need to ask.
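pandas already has a function for this, `pd.cut`. A minimal sketch wrapping it, with the band edges passed in as a parameter because the two indicators disagree (the function name `bucketize_ages` and the open-ended top band are assumptions):

```python
import pandas as pd


def bucketize_ages(ages: pd.Series, edges, open_top: bool = True) -> pd.Series:
    """Assign each age to a left-inclusive band defined by `edges`.

    edges=[0, 5, 10] gives bands "0-4" and "5-9"; with open_top=True,
    ages >= the last edge fall into a final "10+" band.
    """
    edges = list(edges)
    labels = [f"{lo}-{hi - 1}" for lo, hi in zip(edges[:-1], edges[1:])]
    if open_top:
        edges = edges + [float("inf")]
        labels.append(f"{edges[-2]}+")  # edges[-2] is the original last edge
    # right=False makes each band [lo, hi), so age 5 goes into "5-9"
    return pd.cut(ages, bins=edges, right=False, labels=labels)


ages = pd.Series([0, 4, 5, 19, 20, 87])
bands = bucketize_ages(ages, range(0, 25, 5))
```

Switching to the bands from Indicator 3.6.1 or 3.4.2 would then only mean passing different `edges` once the team confirms which to use.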