ActivitySim / populationsim

An Open Platform for Population Synthesis
https://activitysim.github.io/populationsim

Cannot convert non-finite values to integer #104

Closed gregmacfarlane closed 4 years ago

gregmacfarlane commented 4 years ago

I'm trying to get an implementation of populationsim up and running. I'm using the populationsim Anaconda environment (on macOS 10.14.6), with the data files and configuration in this repository. I've been able to grind through many of the errors, but this traceback is something I can't figure out.

Traceback (most recent call last):
  File "run_populationsim.py", line 63, in <module>
    pipeline.run(models=steps, resume_after=resume_after)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/activitysim/core/pipeline.py", line 594, in run
    run_model(model)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/activitysim/core/pipeline.py", line 471, in run_model
    orca.run([step_name])
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/activitysim/core/orca.py", line 2034, in run
    step()
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/activitysim/core/orca.py", line 843, in __call__
    return self._func(**kwargs)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/populationsim/steps/setup_data_structures.py", line 340, in setup_data_structures
    = build_grouped_incidence_table(incidence_table, control_spec, seed_geography)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/populationsim/steps/setup_data_structures.py", line 230, in build_grouped_incidence_table
    how='left').group_id.astype(int).values
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
  File "/Users/gregmacfarlane/opt/anaconda3/envs/popsim/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 700, in astype_nansafe
    "Cannot convert non-finite values (NA or inf) to " "integer"
ValueError: Cannot convert non-finite values (NA or inf) to integer
Closing remaining open files:output/pipeline.h5...done

It's not clear if these NA values are coming from the controls (unlikely) or the seed table (I suppose very likely) or the geographic crosswalk, and if so from which column. Am I on the right track, or is it a different issue entirely?
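One way to narrow this down: the traceback ends in a left merge followed by `.group_id.astype(int)`, and a left merge leaves NaN in `group_id` for any row that finds no match, which is exactly what `astype(int)` rejects. The sketch below reproduces that on toy data and shows two checks that could be run on the real inputs. The table contents and the names `hh_id`, `key`, and `columns_with_na` are hypothetical and are not taken from populationsim.

```python
import pandas as pd


def columns_with_na(df):
    """Return the NA count for every column that has at least one NA."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Hypothetical toy tables; the real inputs would be the seed, controls, and crosswalk.
left = pd.DataFrame({"hh_id": [1, 2, 3], "key": ["a", "b", "c"]})
right = pd.DataFrame({"key": ["a", "b"], "group_id": [10, 20]})

# indicator=True adds a _merge column that flags rows with no match on the right.
merged = left.merge(right, on="key", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

# Unmatched rows carry NaN in group_id, so the integer cast fails.
try:
    merged["group_id"].astype(int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

na_counts = columns_with_na(merged)
```

Running `columns_with_na` on each input file would show which columns contain NA values, and the `unmatched` rows would show which records fail to join.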

christian-hunter commented 4 years ago

Running the same repository on Windows causes the program to break earlier with a different error message:

Traceback (most recent call last):
  File "run_populationsim.py", line 63, in <module>
    pipeline.run(models=steps, resume_after=resume_after)
  File "C:\Users\cbh1996\.conda\envs\popsim2\lib\site-packages\activitysim\core\pipeline.py", line 594, in run
    run_model(model)
  File "C:\Users\cbh1996\.conda\envs\popsim2\lib\site-packages\activitysim\core\pipeline.py", line 471, in run_model
    orca.run([step_name])
  File "C:\Users\cbh1996\.conda\envs\popsim2\lib\site-packages\activitysim\core\orca.py", line 2034, in run
    step()
  File "C:\Users\cbh1996\.conda\envs\popsim2\lib\site-packages\activitysim\core\orca.py", line 843, in __call__
    return self._func(**kwargs)
  File "C:\Users\cbh1996\.conda\envs\popsim2\lib\site-packages\populationsim\steps\setup_data_structures.py", line 336, in setup_data_structures
    incidence_table['sample_weight'] = households_df[hh_weight_col]
TypeError: 'NoneType' object does not support item assignment
Closing remaining open files:output\pipeline.h5...done
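
For reference, this `TypeError` is the message Python raises when item assignment is attempted on `None`, which suggests that `incidence_table` itself was `None` at that line rather than a table containing blank cells. A minimal sketch reproducing the message (the variable here is only a stand-in, not populationsim code):

```python
# Hypothetical stand-in for a table that was never created.
incidence_table = None

try:
    # Item assignment on None fails before any data is inspected.
    incidence_table["sample_weight"] = [1.0, 2.0]
    message = None
except TypeError as exc:
    message = str(exc)
```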

bettinardi commented 4 years ago

Does 'NoneType' refer to blanks in the seed? In our seed processing, we always fill blanks with negative numbers. Does that help? (Happy to provide a full seed processing example if that is helpful.)

Example:

# remove blanks from the household table too
# clean up NA fields
hh$HINCP[is.na(hh$HINCP)] <- -8
hh$TEN[is.na(hh$TEN)] <- -8
hh$BLD[is.na(hh$BLD)] <- -8
hh$VEH[is.na(hh$VEH)] <- -8
hh$HHT[is.na(hh$HHT)] <- -8
hh$NPF[is.na(hh$NPF)] <- -8
hh$HUPAC[is.na(hh$HUPAC)] <- -8
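
For seed tables prepared in Python rather than R, the same fill could be sketched with pandas roughly as follows. The column list and the `-8` code come from the R example above; the function name `fill_blanks` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Columns and fill code taken from the R example; adjust to the fields actually used.
FILL_COLUMNS = ["HINCP", "TEN", "BLD", "VEH", "HHT", "NPF", "HUPAC"]


def fill_blanks(hh, columns=FILL_COLUMNS, fill_value=-8):
    """Return a copy of hh with NA replaced by fill_value in the listed columns."""
    out = hh.copy()
    # Skip any listed column that is absent from this table.
    present = [c for c in columns if c in out.columns]
    out[present] = out[present].fillna(fill_value)
    return out


# Hypothetical toy household table.
hh = pd.DataFrame(
    {"HINCP": [50000.0, np.nan], "TEN": [1.0, np.nan], "OTHER": [np.nan, 2.0]}
)
cleaned = fill_blanks(hh)
```

Columns outside the list (here `OTHER`) are left untouched, so their NA values remain.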

gregmacfarlane commented 4 years ago

Such service! And so quick!

So does this mean the seed table can have no NA fields in any column? Or only in the fields we are grabbing?

bettinardi commented 4 years ago

Looking at our current seed input (after processing), it looks like we have lots of NAs, so I'm thinking only the fields that are actually used need to be populated.

Again, happy to provide examples of inputs or processing steps by email if that would be helpful.

binnympaul commented 4 years ago

Hi @gregmacfarlane, I looked at your seed_households.csv. It looks like you have empty households in there (meaning NP==0). Empty households with no persons must be excluded.
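
A minimal pandas sketch of that filter, assuming the household table has the PUMS person-count column `NP` (the function name `drop_empty_households`, the `hh_id` column, and the toy data are hypothetical):

```python
import pandas as pd


def drop_empty_households(hh, persons_col="NP"):
    """Keep only households with at least one person."""
    # Rows where the count is 0 or missing fail the comparison and are dropped.
    return hh[hh[persons_col] > 0].copy()


# Hypothetical toy seed table; household 2 is empty.
seed = pd.DataFrame({"hh_id": [1, 2, 3], "NP": [2, 0, 1]})
kept = drop_empty_households(seed)
```

The filtered table could then be written back out (for example with `kept.to_csv(...)`) before running populationsim.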

gregmacfarlane commented 4 years ago

Given that most people will be building their seed data from PUMS, do you have a script that converts and cleans PUMS data into the format you are using? Alex suggested above that he has such a script, but I'm wondering if we can just put it straight into the documentation. The documentation doesn't say, for example, that you have to filter out zero-person households rather than the program just knowing to skip over them.