Some of the integer fields in the CHTS source data are coded as floats, which is problematic when they're identifiers or categorical variables.
I verified that the decimal parts of these fields weren't encoding any information, and added code to the data prep notebook that converts them to integers right away. Then I updated the other demo notebooks to use the new, cleaner data, and resolved some other outstanding int/float discrepancies.
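
For reviewers, the check-then-cast step looks roughly like the sketch below. It is illustrative rather than the exact notebook code: the helper name `floats_to_ints` and the column names in the usage note are hypothetical, and it assumes the fields are loaded into a pandas DataFrame.

```python
import numpy as np
import pandas as pd


def floats_to_ints(df, columns):
    """Cast float-coded columns to int64, refusing if any information would be lost.

    Hypothetical helper for illustration; not part of the notebooks' API.
    """
    out = df.copy()
    for col in columns:
        values = out[col]
        # Missing values cannot be represented in a plain int64 column.
        if values.isna().any():
            raise ValueError(f"{col!r} contains missing values")
        # Any nonzero fractional part means the decimals carry information.
        if (values != np.floor(values)).any():
            raise ValueError(f"{col!r} contains non-integer values")
        out[col] = values.astype("int64")
    return out
```

Usage would be something like `floats_to_ints(trips, ["hh_id", "mode"])`, where `trips`, `hh_id`, and `mode` are placeholder names. Raising instead of silently truncating keeps the verification and the conversion in one place.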
This PR also includes new sections in the prediction demo notebook that illustrate calculating probabilities and summed probabilities.
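
As a rough illustration of what those sections compute, here is a minimal sketch assuming a multinomial logit model and a long-format table with one row per observation/alternative pair. The function names and the `obs_id`, `alt_id`, and `utility` columns are assumptions made for this example, not the notebook's actual code or API.

```python
import numpy as np
import pandas as pd


def mnl_probabilities(df, obs_col="obs_id", utility_col="utility"):
    """Softmax of utilities within each observation's choice set."""
    # Subtract the per-observation max for numerical stability.
    shifted = df[utility_col] - df.groupby(obs_col)[utility_col].transform("max")
    exp_u = np.exp(shifted)
    return exp_u / exp_u.groupby(df[obs_col]).transform("sum")


def summed_probabilities(df, probs, alt_col="alt_id"):
    """Sum probabilities across observations to get expected counts per alternative."""
    return probs.groupby(df[alt_col]).sum()


# Two observations choosing between alternatives 10 and 20 (made-up values).
example = pd.DataFrame({
    "obs_id": [1, 1, 2, 2],
    "alt_id": [10, 20, 10, 20],
    "utility": [0.0, 0.0, np.log(3.0), 0.0],
})
probs = mnl_probabilities(example)
totals = summed_probabilities(example, probs)
```

Probabilities sum to 1 within each observation, so the summed probabilities across all alternatives add up to the number of observations.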
Coverage remained the same at 52.805% when pulling 052c413a52a166302dae39944110e2c819066ea1 on prediction-demo into e1e32cd41a679fa8cd2135a5a8f558972e634b67 on master.
(@arezoo-bz)