ActivitySim / populationsim

An Open Platform for Population Synthesis
https://activitysim.github.io/populationsim

incidence_table with nan values #185

Open LotteNotelaers opened 2 weeks ago

LotteNotelaers commented 2 weeks ago

Dear,

I get an error when running the setup_data_structures.py step, specifically at line 87 in build_incidence_table.

This is the incidence dataframe (the result of line 85): (image)

When the next line (line 87) is run:

incidence_table[control_row.target] = incidence

the result is: (image)

So it looks like the numbers are not transferred to the incidence_table dataframe.

Can you help me with this issue?

Kind regards, Lotte

bettinardi commented 2 weeks ago

I'm guessing there's either a configuration file inconsistency or a crosswalk file inconsistency. Would you like to share your configuration yaml and/or your geography crosswalk file?

LotteNotelaers commented 2 weeks ago

Hi bettinardi,

Thanks for your help. Here are the files: geo_cross_walk.csv and settings.zip. I had to zip settings.yaml because GitHub does not support that file type.

Kind regards, Lotte

LotteNotelaers commented 1 week ago

Hi,

I think it has to do with the household_df index being strings while the hh_id column in person_df is of mixed type.

In the input seed data, the SERIALNO (= hh_id) column is of mixed type: it contains both int and str values. I found that you can specify the dtypes in settings.yaml, which makes sure the values are consistently read as strings from the csv files. This resolved the problem.
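For anyone hitting the same symptom, here is a minimal sketch of the mechanism in plain pandas. The toy data and the column name `persons` are made up for illustration, and the `dtype` argument to `read_csv` stands in for the dtypes setting in settings.yaml (I am assuming the setting ends up doing something equivalent). Assigning a Series to a DataFrame column aligns on the index, so an int label such as 1001 does not match the string label "1001" and the row is filled with NaN.

```python
import io
import pandas as pd

# Household table indexed by string IDs (hypothetical toy data).
households = pd.DataFrame(index=pd.Index(["1001", "1002", "A003"], name="hh_id"))

# Incidence keyed by mixed-type IDs: two ints and one str.
incidence = pd.Series([2, 1, 1], index=[1001, 1002, "A003"])

# Column assignment aligns on index labels; 1001 (int) != "1001" (str).
incidence_table = pd.DataFrame(index=households.index)
incidence_table["persons"] = incidence
n_missing = int(incidence_table["persons"].isna().sum())

# Reading the ID column explicitly as str makes the labels consistent.
csv_text = "hh_id,persons\n1001,2\n1002,1\nA003,1\n"
fixed = pd.read_csv(io.StringIO(csv_text), dtype={"hh_id": str})
fixed = fixed.set_index("hh_id")["persons"]
incidence_table["persons_fixed"] = fixed
n_missing_fixed = int(incidence_table["persons_fixed"].isna().sum())

print(n_missing, n_missing_fixed)
```

Only the row whose label was already a string survives the first assignment; once every ID is read as a string, all rows line up.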

image

Thanks for the help!

LotteNotelaers commented 1 week ago

Hi,

In line 242 of setup_data_structures.py I now get an error because it tries to convert the hh_id to int:

household_groups[household_id_col] = household_groups.index.astype(int)

OverflowError: Python int too large to convert to C long

This doesn't work because the hh_id contains numbers but sometimes also letters.

What would be the best way to resolve this?

Kind regards, Lotte

bettinardi commented 1 week ago

Add new household IDs to the seed data and make sure the new IDs contain only numbers.

HH_ID is not the same thing as a PUMS serial number.
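One way to do this, sketched in pandas under some assumptions: the seed households and persons are both keyed by SERIALNO, the serial numbers are unique per household, and the toy values below are invented for illustration. The idea is to number the households sequentially and carry the new ID over to the person records through a lookup on SERIALNO.

```python
import pandas as pd

# Hypothetical seed data keyed by alphanumeric serial numbers.
households = pd.DataFrame(
    {"SERIALNO": ["2019GQ0000012", "2019HU0000345", "77"], "NP": [1, 3, 2]}
)
persons = pd.DataFrame(
    {"SERIALNO": ["2019GQ0000012", "2019HU0000345", "2019HU0000345",
                  "2019HU0000345", "77", "77"]}
)

# Assign a small sequential integer ID to each household.
households["hh_id"] = range(1, len(households) + 1)

# Map each person's serial number to the new household ID.
id_map = households.set_index("SERIALNO")["hh_id"]
persons["hh_id"] = persons["SERIALNO"].map(id_map)

# Every person should now point at a valid integer household ID.
assert persons["hh_id"].notna().all()
print(persons["hh_id"].tolist())
```

SERIALNO can stay in both tables as an ordinary column, so the synthetic output can still be traced back to the original PUMS records.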