javitonino closed this issue 5 years ago
This issue will also close CartoDB/Geographica-Product-Coordination/issues/24
In addition to this, we should review the names for the geometries (see CartoDB/Geographica-Product-Coordination/issues/52). For some levels, names are mixed. Example:
"SA1": {
    "name": "Statistical Area Level 1",
    "weight": 13,
    "region_col": "SA1_7DIGIT",
    "proper_name": "STATE_NAME"
},
Levels SA1, SA2, SA3, SA4 and MB should be reviewed. If a level has a meaningful name column, it should be used. If it doesn't, the id should be used as the name.
Here's an extract of the metadata and sample data in each file:
SA1_2016_AUST
SA1_7DIGIT: String (7.0)
SA1_MAIN16: String (11.0)
STATE_CODE: String (1.0)
STATE_NAME: String (50.0)
AREA_SQKM: Real (31.15)
OGRFeature(SA1_2016_AUST):0
SA1_7DIGIT (String) = 1100701
SA1_MAIN16 (String) = 10102100701
STATE_CODE (String) = 1
STATE_NAME (String) = New South Wales
AREA_SQKM (Real) = 362.872700000000009
SA2_2016_AUST
SA2_MAIN: String (9.0)
SA2_MAIN16: String (9.0)
SA2_NAME: String (50.0)
STATE_CODE: String (1.0)
STATE_NAME: String (50.0)
AREA_SQKM: Real (31.15)
OGRFeature(SA2_2016_AUST):0
SA2_MAIN (String) = 101021007
SA2_MAIN16 (String) = 101021007
SA2_NAME (String) = Braidwood
STATE_CODE (String) = 1
STATE_NAME (String) = New South Wales
AREA_SQKM (Real) = 3418.352499999999964
SA3_2016_AUST
SA3_CODE: String (5.0)
SA3_CODE16: String (5.0)
SA3_NAME: String (50.0)
STATE_CODE: String (1.0)
STATE_NAME: String (50.0)
AREA_SQKM: Real (31.15)
OGRFeature(SA3_2016_AUST):0
SA3_CODE (String) = 12401
SA3_CODE16 (String) = 12401
SA3_NAME (String) = Blue Mountains
STATE_CODE (String) = 1
STATE_NAME (String) = New South Wales
AREA_SQKM (Real) = 942.407900000000041
SA4_2016_AUST
SA4_CODE: String (3.0)
SA4_CODE16: String (3.0)
SA4_NAME: String (50.0)
STATE_CODE: String (1.0)
STATE_NAME: String (50.0)
AREA_SQKM: Real (31.15)
OGRFeature(SA4_2016_AUST):0
SA4_CODE (String) = 212
SA4_CODE16 (String) = 212
SA4_NAME (String) = Melbourne - South East
STATE_CODE (String) = 2
STATE_NAME (String) = Victoria
AREA_SQKM (Real) = 1922.280500000000075
MB_2016_NSW
MB_CODE16: String (11.0)
MB_CAT16: String (30.0)
SA1_MAIN16: String (11.0)
SA1_7DIG16: String (28.0)
SA2_MAIN16: String (9.0)
SA2_5DIG16: String (20.0)
SA2_NAME16: String (50.0)
SA3_CODE16: String (5.0)
SA3_NAME16: String (50.0)
SA4_CODE16: String (3.0)
SA4_NAME16: String (50.0)
GCC_CODE16: String (5.0)
GCC_NAME16: String (50.0)
STE_CODE16: String (3.0)
STE_NAME16: String (50.0)
AREASQKM16: Real (31.15)
OGRFeature(MB_2016_NSW):0
MB_CODE16 (String) = 10000009499
MB_CAT16 (String) = NOUSUALRESIDENCE
SA1_MAIN16 (String) = 19999949999
SA1_7DIG16 (String) = 1949999
SA2_MAIN16 (String) = 199999499
SA2_5DIG16 (String) = 19499
SA2_NAME16 (String) = No usual address (NSW)
SA3_CODE16 (String) = 19999
SA3_NAME16 (String) = No usual address (NSW)
SA4_CODE16 (String) = 199
SA4_NAME16 (String) = No usual address (NSW)
GCC_CODE16 (String) = 19499
GCC_NAME16 (String) = No usual address (NSW)
STE_CODE16 (String) = 1
STE_NAME16 (String) = New South Wales
AREASQKM16 (Real) = 0.000000000000000
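Going by the fields listed above, the rule proposed earlier (use the level's own name column if there is one, otherwise fall back to the id) could be sketched roughly as below. This is only an illustration, not the project's actual code: the `pick_proper_name` helper and the `LAYER_FIELDS` / `REGION_COLS` dictionaries are hypothetical, and only the SA1 `region_col` comes from the config example. The id columns for the other levels are assumptions read off the extract.

```python
# Field names copied from the metadata extract above (one list per layer).
LAYER_FIELDS = {
    "SA1": ["SA1_7DIGIT", "SA1_MAIN16", "STATE_CODE", "STATE_NAME", "AREA_SQKM"],
    "SA2": ["SA2_MAIN", "SA2_MAIN16", "SA2_NAME", "STATE_CODE", "STATE_NAME",
            "AREA_SQKM"],
    "SA3": ["SA3_CODE", "SA3_CODE16", "SA3_NAME", "STATE_CODE", "STATE_NAME",
            "AREA_SQKM"],
    "SA4": ["SA4_CODE", "SA4_CODE16", "SA4_NAME", "STATE_CODE", "STATE_NAME",
            "AREA_SQKM"],
    "MB": ["MB_CODE16", "MB_CAT16", "SA1_MAIN16", "SA1_7DIG16", "SA2_MAIN16",
           "SA2_5DIG16", "SA2_NAME16", "SA3_CODE16", "SA3_NAME16", "SA4_CODE16",
           "SA4_NAME16", "GCC_CODE16", "GCC_NAME16", "STE_CODE16", "STE_NAME16",
           "AREASQKM16"],
}

# ASSUMPTION: only the SA1 entry is taken from the config example in this
# issue; the other id columns are guesses based on the extract.
REGION_COLS = {
    "SA1": "SA1_7DIGIT",
    "SA2": "SA2_MAIN",
    "SA3": "SA3_CODE",
    "SA4": "SA4_CODE",
    "MB": "MB_CODE16",
}


def pick_proper_name(level, region_col, fields):
    """Return the level's own *_NAME column if present, else the id column."""
    prefix = f"{level}_NAME"
    for field in fields:
        if field.startswith(prefix):
            return field
    # No name column that belongs to this level: fall back to the id.
    return region_col


proper_names = {
    level: pick_proper_name(level, REGION_COLS[level], fields)
    for level, fields in LAYER_FIELDS.items()
}

for level, column in proper_names.items():
    print(level, "->", column)
```

With these inputs, SA1 and MB fall back to their id columns (neither file has a name column of its own), while SA2, SA3 and SA4 pick up `SA2_NAME`, `SA3_NAME` and `SA4_NAME`. Matching on the level prefix is what keeps a parent's name such as `STATE_NAME` or `SA2_NAME16` from being picked up for a lower level.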
Closing since this is already merged and deployed to staging. I've left the PR in the backend kanban in "pending deploy".
There are several tables with columns that have the same name but different data. Since we give them the same column id, we get inconsistent data. An example:
The column P_Tot_Tot (total population) is available in several tables. Let's look at three examples:
- G04 AGE BY SEX: it means the total population.
- G17 TOTAL PERSONAL INCOME (WEEKLY) BY AGE BY SEX: it means the total population 15 years or older.
- G46 NON-SCHOOL QUALIFICATION: LEVEL OF EDUCATION(a) BY AGE BY SEX: it means the total population 15 years or older with some qualification (aside from mandatory school).
As you can imagine, these numbers are very different. Since for us this column is the same, we have a mixup of data. In particular, the column that we offer depends on the database order of rows when generating obs_meta, so we are not even consistent. Denominators are also wrong because of this.
We need to review the generation data so: