Open lethabo24 opened 4 years ago
Thanks -- I see digits transposed in the Northwest figures, but can't see the problem in Gauteng: 15176115 corresponds to Figure 1 on page vi. Could you elaborate please.
Good Day
The SA Stats document in the appendix notes the Gauteng Population as 15176116.
Kind Regards Lethabo Maluleke
-------- Original message -------- From: Scott Hazelhurst Date: Thu, Jun 11, 2020, 10:55 AM To: dsfsi/covid19za Subject: Re: [dsfsi/covid19za] [DATA] (#436)
Thanks -- I see digits transposed in the Northwest figures, but can't see the problem in Gauteng: 15176115 corresponds to Figure 1 on page vi. Could you elaborate please.
Thanks @18306063
@shaze we also have the statssa midyear estimates now in the staging area folder. We might want to just make a choice on where to put that, maybe
data/official_statistics/
OK -- the one in data/district_data has been there longer, so there may be scripts dependent on it. But that is easy to change, and it is more important to have it in the right logical place, so I have no objection to moving or replacing it.
But if we use the new file, I think it needs to be made program friendly -- if you read it in with Pandas, the columns seem to come in as text by default, and it is even harder to handle without Pandas.
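The thread does not say why the columns load as text. One common cause, assumed here, is figures written with spaces or commas as thousands separators. A minimal sketch of cleaning such values with only the standard library follows; the helper name `to_int` and the sample rows are hypothetical, and the numbers in the sample are placeholders rather than Stats SA figures. When using Pandas, the `thousands` argument of `read_csv` may achieve the same thing at load time.

```python
import csv
import io


def to_int(text):
    """Convert a figure stored as text, e.g. '15 176 115', to an int.

    Assumes the only non-digit characters are spaces, non-breaking
    spaces, or commas used as thousands separators.
    """
    cleaned = text.replace("\u00a0", "").replace(" ", "").replace(",", "")
    return int(cleaned)


# Placeholder rows for illustration only (not real population figures).
sample = 'province,total\nProvince A,1 234 567\nProvince B,"2,345,678"\n'

rows = list(csv.DictReader(io.StringIO(sample)))
totals = {row["province"]: to_int(row["total"]) for row in rows}
```

If the file turns out to use a different convention, `to_int` would raise a `ValueError`, which at least flags the problem instead of silently keeping text.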
@elolelo Can you comment.
Hi Lethabo
Thanks -- it seems that they have slightly contradictory figures in the same document. Fortunately they are only off by 1, so way below any margin of error (also, adding the provincial figures does not give the total figure, so we can't check that way to find which is correct).
The NW figure is definitely wrong. Will push with today's figures.
Will fix and push in a few minutes.
@elolelo Can you comment.
I am not sure to what extent these new files are program friendly. They can be changed if necessary.
Thanks. Ideally they must be computer-readable -- Pandas is the most flexible so readable by Pandas is essential.
Also, for the age breakdown file, I think having 5 provinces followed by 4 provinces is very difficult for a computer to follow.
Two possible formats are below. My preference would be for 1 though 2 is what we're doing in other places and may be more human friendly.
1. Have columns: province, age group, male, female, total. Province is repeated on each row.
2. Use the same format that we're using for keys: have 27 columns, 3 for each province:
Eastern Cape\tMales,Eastern Cape\tFemales,Eastern Cape\tTotal,Free State\tMales,......
Note this uses the same convention as we do for districts -- spaces separating words in the names of provinces, and tabs separating the name of the province from the category. This approach is very readable in GitHub, and programs can parse it easily: by convention a tab always separates the province name from the category.
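To show that the two formats carry the same information, here is a rough sketch that converts format 2 (one column per province and category) into format 1 rows by splitting each header on the tab. It assumes a first column holding the age group (called `Age` here, a hypothetical name), comma-separated fields, and plain integer values. The sample values are placeholders, not real figures.

```python
import csv
import io


def wide_to_long(wide_csv_text):
    """Turn format 2 (wide) text into a list of format 1 (long) rows."""
    reader = csv.reader(io.StringIO(wide_csv_text))
    header = next(reader)
    long_rows = []
    for record in reader:
        age_group = record[0]  # assumed: first column is the age group
        by_province = {}
        for name, value in zip(header[1:], record[1:]):
            # A tab separates the province name from the category.
            province, category = name.split("\t")
            by_province.setdefault(province, {})[category] = int(value)
        for province, counts in by_province.items():
            long_rows.append({
                "province": province,
                "age group": age_group,
                "male": counts["Males"],
                "female": counts["Females"],
                "total": counts["Total"],
            })
    return long_rows


# Two provinces only, with placeholder values.
sample = (
    "Age,Eastern Cape\tMales,Eastern Cape\tFemales,Eastern Cape\tTotal,"
    "Free State\tMales,Free State\tFemales,Free State\tTotal\n"
    "0-4,10,11,21,5,6,11\n"
)
long_rows = wide_to_long(sample)
```

Because province names contain spaces but never tabs, `name.split("\t")` recovers the two parts without any special handling for names like Eastern Cape.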
Final point -- I note that in several places the total is not equal to the sum of males and females. I doubt that these figures were compiled at a time when non-binary categories were allowed, so they are likely to be errors (in the source document). It might be worth pointing this out in the README. The discrepancy is so small as to be inconsequential for any work being done.
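If it helps with the README note, a check along these lines could list the affected rows. It is only a sketch: it assumes the data is already in the long format with integer `male`, `female` and `total` fields, the function name `find_total_mismatches` is hypothetical, and the sample rows are made up for illustration.

```python
def find_total_mismatches(rows):
    """Return the rows whose total differs from male + female.

    Each returned row gets an extra 'gap' key: total - (male + female).
    """
    mismatches = []
    for row in rows:
        gap = row["total"] - (row["male"] + row["female"])
        if gap != 0:
            mismatches.append({**row, "gap": gap})
    return mismatches


# Made-up rows: the second one is off by 1 on purpose.
rows = [
    {"province": "Province A", "age group": "0-4", "male": 100, "female": 105, "total": 205},
    {"province": "Province A", "age group": "5-9", "male": 98, "female": 101, "total": 200},
]
mismatches = find_total_mismatches(rows)
```

Reporting the size of each gap makes it easy to confirm in the README that the discrepancies are all tiny.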
Many thanks for all this work -- it is very helpful
Which Dataset
The za_province_pop
Error Description
The Gauteng and North West populations do not correspond to the National Statistics PDF document
Suggested fixes