Leeds-MRG / Minos

SIPHER Microsimulation for estimating the effect of income policy on mental health.
MIT License

Find general solution to Vivarium/Pandas casting columns to unwanted types #352

Open paddy-r opened 10 months ago

paddy-r commented 10 months ago

Vivarium raises a very annoying error, e.g.

vivarium.framework.population.exceptions.PopulationError: A component is corrupting the population table by modifying the dtype of the low_income column from float64 to int64.

Encountered during development of the child poverty variables and metrics, and previously by Luke during PCS development. It was fixed there by manually casting to the correct types in two places (RunPipeline and generate_repl_pop). It was also fixed temporarily in the child poverty intervention by trial and error (urg).
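For illustration, the manual casting described above can be sketched roughly as follows. This is a minimal sketch in plain pandas, not the code used in RunPipeline or generate_repl_pop; the helper name restore_dtypes, the hh_income column and the 600 threshold are all hypothetical. It shows how recomputing a flag as an integer changes the column dtype, and how casting back to the recorded dtypes undoes that before the table is handed back to Vivarium.

```python
import pandas as pd


def restore_dtypes(updated: pd.DataFrame, reference_dtypes: pd.Series) -> pd.DataFrame:
    """Cast columns of `updated` back to the dtypes recorded in `reference_dtypes`.

    Hypothetical helper for illustration; columns missing from the
    reference are left untouched.
    """
    shared = [col for col in updated.columns if col in reference_dtypes.index]
    return updated.astype({col: reference_dtypes[col] for col in shared})


# Toy population table; hh_income is an assumed column name.
pop = pd.DataFrame(
    {"low_income": [0.0, 1.0, 0.0], "hh_income": [1200.0, 450.0, 900.0]}
)
original_dtypes = pop.dtypes  # record dtypes before any component touches the table

# Recomputing the flag as an integer changes the column from float64 to int64.
update = pop.copy()
update["low_income"] = (update["hh_income"] < 600).astype("int64")
drifted_dtype = str(update["low_income"].dtype)

# Cast back to the recorded dtypes before updating the population.
fixed = restore_dtypes(update, original_dtypes)
```

Recording the dtypes once and casting back avoids hard-coding a type per variable, but it only hides the drift; it does not say which type the column should have had in the first place.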

This issue is to review these methods once child poverty is merged with development (or is at least closer to being so), to make sure the functionality works for all purposes.

paddy-r commented 9 months ago

Imperfect fix for the poverty variables in e65cb0f8825ad45000b63ea3bfe581c93ad5ac75, but a general solution would be good. Also encountered when adding the heating module; casting to int in US_complete_case for now. Also copied type_check from a different branch via Luke (see 5fb605d3846c7f24523b2f0b53f25ea99b38c661), but it needs checking when branch 285 is merged with 283 and development. All places to watch:

RobertClay commented 9 months ago

The long-term solution is type checking of the input and replenishment populations on startup. They should be cast there and kept consistent. That makes it easier to determine which type is needed if something goes wrong in Minos.

https://www.tutorialspoint.com/how-to-check-the-data-type-in-pandas-dataframe

Check that data.dtypes matches list_of_required_dtypes.

It's tedious but very useful.
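A rough sketch of what such a startup check could look like, assuming the required types are kept in a plain dictionary. REQUIRED_DTYPES, find_dtype_mismatches and cast_to_required are hypothetical names made up for this example (this is not the existing type_check function), and the column names and types are placeholders rather than the real Minos schema.

```python
import pandas as pd

# Hypothetical schema: column name -> required dtype string.
REQUIRED_DTYPES = {"pidp": "int64", "low_income": "float64", "heating": "int64"}


def find_dtype_mismatches(data: pd.DataFrame, required: dict) -> dict:
    """Return {column: (actual_dtype, required_dtype)} for every mismatch.

    A column missing from `data` is reported with an actual dtype of None.
    """
    mismatches = {}
    for col, expected in required.items():
        if col not in data.columns:
            mismatches[col] = (None, expected)
        elif str(data[col].dtype) != expected:
            mismatches[col] = (str(data[col].dtype), expected)
    return mismatches


def cast_to_required(data: pd.DataFrame, required: dict) -> pd.DataFrame:
    """Cast every column that is present to its required dtype."""
    present = {col: dtype for col, dtype in required.items() if col in data.columns}
    return data.astype(present)


# Toy input population with two columns stored as the wrong type.
input_pop = pd.DataFrame(
    {"pidp": [1, 2, 3], "low_income": [0, 1, 0], "heating": [1.0, 0.0, 1.0]}
)

problems = find_dtype_mismatches(input_pop, REQUIRED_DTYPES)
clean_pop = cast_to_required(input_pop, REQUIRED_DTYPES)
remaining = find_dtype_mismatches(clean_pop, REQUIRED_DTYPES)
```

Running the same check on both the input and replenishment populations would report which columns disagree before the simulation starts. Note that casting a float column containing missing values to int64 raises an error in pandas, so missing data would have to be handled before the cast.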

paddy-r commented 9 months ago

It was indeed the input population with heating.