DataUSA / datausa-tracker


B24010 config fix and rerun #255

Closed hwchen closed 4 years ago

hwchen commented 5 years ago

For the config B24010, a few labels were cut off and so didn't match what they should have been:

        "PreschoolAndKind",
        "ElementaryAndMid",
        "SecondarySchoolT",
        "SpecialEducation",
        "OtherHealthDiagnosingAndTreatingPractitionersAndTechn",

These labels were correct where they were labelled for Male, but cut off for Female.

This resulted in an incorrect number of members for occupation: it should be 114, but ended up as 119. And because some rows had an id over 114 while the dim only went up to 114, those rows were not available when drilling down on occupation.

The fix is to change these incorrect labels to match the correct ones and rerun the ETL. That will fix the id problem.
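To illustrate the mechanism, here is a minimal sketch of how a cut-off label inflates the member count and how mapping it back to the full label repairs the ids. It assumes ids are assigned per distinct label in first-seen order, which may not be how the actual ETL works; the occupation names, `assign_member_ids`, and `LABEL_FIXES` are all made up for the example.

```python
# Hypothetical labels: the Male list is complete, the Female list has one cut-off label.
MALE_LABELS = ["Bakers", "GlassBlowersAndFinishers", "Woodworkers"]
FEMALE_LABELS = ["Bakers", "GlassBlowersAndF", "Woodworkers"]

# Hypothetical fix table: cut-off label -> correct label.
LABEL_FIXES = {"GlassBlowersAndF": "GlassBlowersAndFinishers"}


def assign_member_ids(labels):
    """Give each distinct label the next integer id, in first-seen order (assumed scheme)."""
    ids = {}
    for label in labels:
        ids.setdefault(label, len(ids) + 1)
    return ids


all_labels = MALE_LABELS + FEMALE_LABELS

# Without the fix, the cut-off label becomes an extra member with an id past the real maximum.
broken_ids = assign_member_ids(all_labels)

# With the fix, the cut-off label is mapped to the correct one before ids are assigned.
fixed_ids = assign_member_ids(LABEL_FIXES.get(label, label) for label in all_labels)

print(len(broken_ids), len(fixed_ids))
```

In this toy case there are three real occupations, but the unfixed labels produce four members. That is the same shape as the 119 vs. 114 mismatch, with five cut-off labels instead of one.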

Then on the inline table side in the mondrian schema, no change is needed to fix the hidden-rows issue (since the etl fix will bring the ids in line with what's correct). Just keep an eye on labels that may need to be fixed manually, since some might be cut off (I checked manually before, but I may have missed some).
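One rough way to look for cut-off labels is to flag any label that is a strict prefix of another label. The sketch below is only a heuristic for narrowing down a manual review: it would miss a label that is cut off everywhere, and it can flag legitimate labels that happen to be prefixes of others. The function name `find_possible_truncations` and the sample labels are hypothetical.

```python
def find_possible_truncations(labels):
    """Return (short, full) pairs where `short` is a strict prefix of `full`."""
    unique = sorted(set(labels))
    pairs = []
    for short in unique:
        for full in unique:
            if full != short and full.startswith(short):
                pairs.append((short, full))
    return pairs


# Made-up sample labels, one of them cut off.
sample_labels = [
    "Bakers",
    "GlassBlowersAndFinishers",
    "GlassBlowersAndF",
    "Woodworkers",
]

suspects = find_possible_truncations(sample_labels)
for short, full in suspects:
    print(f"{short!r} may be a cut-off version of {full!r}")
```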

hwchen commented 5 years ago

Fixed on new flint-backend, in db and with restarted datausa-cube service.

curl "localhost:5000/cubes/acs_ygso_gender_by_occupation_1/aggregate.csv?drilldown%5B%5D=%5BACS+Occupation%5D.%5BOccupation%5D&cut%5B%5D=%5BGeography%5D.%5BCounty%5D.%5BCounty%5D.%26%5B05000US17031%5D&cut%5B%5D=%5BYear%5D.%5BYear%5D.%5BYear%5D.%26%5B2017%5D&measures%5B%5D=Workforce+by+Occupation+and+Gender&nonempty=true&distinct=false&parents=false&debug=true&sparse=true"

For the above call (drilldown on Occupation, cut by Cook County IL and 2017), I verified that the count was 2562770, which matches the count without drilling down on Occupation.

mochi:~ > xsv stats occupation-test.csv | xsv table
field                               type     sum      min                     max          min_length  max_length  mean                stddev
ID Occupation                       Integer  6555     0                       114          1           3           57                  33.19638534539567
Occupation                          Unicode           Accountants & Auditors  Woodworkers  8           132                             
Workforce by Occupation and Gender  Float    2562770  57                      153934       4           8           22284.956521739132  21932.51173997626
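The stats above can also be cross-checked arithmetically. The sketch below uses only the numbers reported by `xsv stats`: it derives the row count from the measure's sum and mean, then checks whether the id column's sum, mean, and stddev are what a contiguous run of ids from min to max would give. It assumes the reported stddev is a population standard deviation. Passing these checks is consistent with each id appearing exactly once, but does not strictly prove it.

```python
import math
import statistics

# Values copied from the xsv stats output above.
ID_SUM, ID_MIN, ID_MAX, ID_MEAN = 6555, 0, 114, 57
ID_STDDEV = 33.19638534539567
MEASURE_SUM = 2562770
MEASURE_MEAN = 22284.956521739132

# Number of rows implied by sum / mean of the measure column.
row_count = round(MEASURE_SUM / MEASURE_MEAN)

# The ids we would see if every id from min to max appeared exactly once.
expected_ids = range(ID_MIN, ID_MAX + 1)

ids_look_contiguous = (
    row_count == len(expected_ids)
    and sum(expected_ids) == ID_SUM
    and statistics.mean(expected_ids) == ID_MEAN
    # Assumption: xsv reports the population (not sample) standard deviation.
    and math.isclose(statistics.pstdev(expected_ids), ID_STDDEV, rel_tol=1e-9)
)

print(row_count, ids_look_contiguous)
```

The implied row count is 115, which is the number of ids from 0 through 114, so no row carries an id above the dimension's maximum.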

I also visually verified that the member ids in the table are correct. @jspeis @davelandry