Closed hwchen closed 4 years ago
Fixed on new flint-backend, in db and with restarted datausa-cube service.
curl "localhost:5000/cubes/acs_ygso_gender_by_occupation_1/aggregate.csv?drilldown%5B%5D=%5BACS+Occupation%5D.%5BOccupation%5D&cut%5B%5D=%5BGeography%5D.%5BCounty%5D.%5BCounty%5D.%26%5B05000US17031%5D&cut%5B%5D=%5BYear%5D.%5BYear%5D.%5BYear%5D.%26%5B2017%5D&measures%5B%5D=Workforce+by+Occupation+and+Gender&nonempty=true&distinct=false&parents=false&debug=true&sparse=true"
For the above call (drilldown on Occupation, cut by Cook County IL and 2017), I verified that the count was 2562770, which is the count without drilling down on Occupation.
mochi:~ > xsv stats occupation-test.csv | xsv table
field                               type     sum      min                     max          min_length  max_length  mean                stddev
ID Occupation                       Integer  6555     0                       114          1           3           57                  33.19638534539567
Occupation                          Unicode           Accountants & Auditors  Woodworkers  8           132
Workforce by Occupation and Gender  Float    2562770  57                      153934       4           8           22284.956521739132  21932.51173997626
I also visually verified that the member ids in the table are correct. @jspeis @davelandry
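The two checks above (the measure total and the member ids) could also be scripted. This is only a rough sketch: the column names are taken from the xsv output, the sample rows are made up for illustration, and the helper names `summarize_drilldown` and `missing_ids` are hypothetical.

```python
import csv
import io


def summarize_drilldown(csv_text,
                        id_col="ID Occupation",
                        measure_col="Workforce by Occupation and Gender"):
    """Return (measure total, sorted member ids) for an aggregate.csv response."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0.0
    ids = []
    for row in reader:
        total += float(row[measure_col])
        ids.append(int(row[id_col]))
    return total, sorted(ids)


def missing_ids(ids):
    """List ids absent between the smallest and largest id seen."""
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


# Made-up sample rows; in practice this would be the contents of occupation-test.csv.
sample = """ID Occupation,Occupation,Workforce by Occupation and Gender
0,Example occupation A,100
1,Example occupation B,250.5
3,Example occupation C,49.5
"""

total, ids = summarize_drilldown(sample)
gaps = missing_ids(ids)
print(total, gaps)
```

For the real file, the total would be compared against 2562770 (the count without the Occupation drilldown), and an empty gap list would suggest no rows are being hidden.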
For the config B24010, a few labels were cut off and thus didn't match what they should have been:
So, these had correct versions when labelled for Male, but not for Female.
This resulted in an incorrect number of members for occupation. It should be 114, but ended up as 119. And because some rows had an id over 114 while the dim only went up to 114, those rows were not available when drilling down on occupation.
The fix is to change these incorrect labels so they match the correct labels, and then rerun the etl. That will fix the id problem.
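To illustrate why a cut-off label inflates the member count, here is a minimal sketch. It assumes ids are assigned per distinct label in order of first appearance, which may not be how the actual etl does it; the labels and the `assign_member_ids` helper are hypothetical.

```python
def assign_member_ids(labels, corrections=None):
    """Assign one id per distinct label, after applying optional label corrections."""
    corrections = corrections or {}
    ids = {}
    for label in labels:
        canonical = corrections.get(label, label)
        if canonical not in ids:
            ids[canonical] = len(ids)
    return ids


# Hypothetical labels: the Female column has one label cut off.
male = ["Managers", "Installation, maintenance, and repair workers"]
female = ["Managers", "Installation, maintenance, and rep"]

# Without corrections the truncated label becomes an extra member.
before = assign_member_ids(male + female)

# Mapping the truncated label to the correct one removes the extra member.
corrections = {
    "Installation, maintenance, and rep":
        "Installation, maintenance, and repair workers",
}
after = assign_member_ids(male + female, corrections)

print(len(before), len(after))
```

In this toy case the uncorrected labels yield three members instead of two, the same kind of overcount as 119 versus 114.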
Then on the inline table side in the mondrian schema, no change is needed to fix the hidden rows issue (since the etl fix will bring the ids in line with what's correct). We should just keep an eye on labels that might need to be fixed manually, since some might be cut off (I checked manually before, but I may have missed some).
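One possible way to spot cut-off labels without checking by eye is to look for labels that are a strict prefix of another label. This is a heuristic sketch only: the helper `find_truncated_candidates` and the sample labels are hypothetical, and a legitimate label can also be a prefix of another, so hits still need a manual look.

```python
def find_truncated_candidates(labels):
    """Return (short, full) pairs where `short` is a strict prefix of `full`."""
    unique = sorted(set(labels))
    pairs = []
    for short in unique:
        for full in unique:
            if short != full and full.startswith(short):
                pairs.append((short, full))
    return pairs


# Hypothetical labels; the last one is a cut-off copy of the second.
labels = [
    "Legal occupations",
    "Food preparation and serving related occupations",
    "Food preparation and serving rel",
]

candidates = find_truncated_candidates(labels)
for short, full in candidates:
    print(f"possible truncation: {short!r} -> {full!r}")
```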