PolicyEngine / synthimpute

Python package for data synthesis and imputation using parametric and nonparametric methods, and evaluation of these methods.
MIT License

Warn when records are dropped from block_cdist because they don't have the same values of blocked variables #11

Open MaxGhenis opened 5 years ago

MaxGhenis commented 5 years ago

In the mpg example:

b2 = si.block_cdist(synth, mpg, ['cylinders', 'model_year'], metric='euclidean')

/home/maxghenis/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/merge.py:963: UserWarning: You are merging on int and float columns where the float values are not equal to their int representation
  'representation', UserWarning)

This is probably because rf_synth synthesizes model_year as a float (that's its dtype), whereas it's an int in the original data. The pandas warning should become redundant once block_cdist itself warns when records are dropped because they're not part of a block.
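A minimal sketch of what such a warning could look like, assuming blocks are matched by exact equality on the blocking columns. The helper name `warn_if_blocks_unmatched` and the toy data are hypothetical and not part of synthimpute. It uses the `indicator=True` option of pandas `merge` to count records whose block has no counterpart in the other dataset:

```python
import warnings

import pandas as pd


def warn_if_blocks_unmatched(df1, df2, block_vars):
    """Hypothetical helper: warn about records in df1 whose combination of
    block_vars values does not appear in df2. Returns the number of such records.
    """
    # Unique blocks present in df2.
    keys2 = df2[block_vars].drop_duplicates()
    # Left merge keeps every df1 record; `_merge` flags those with no match.
    merged = df1[block_vars].merge(
        keys2, on=block_vars, how="left", indicator=True
    )
    n_dropped = int((merged["_merge"] == "left_only").sum())
    if n_dropped > 0:
        warnings.warn(
            f"{n_dropped} of {len(df1)} records have no matching block on "
            f"{block_vars} and would be dropped.",
            UserWarning,
        )
    return n_dropped


# Toy data (made up): model_year is a float in the synthetic set
# and an int in the original, as described in this issue.
synth = pd.DataFrame(
    {"cylinders": [4, 4, 8], "model_year": [70.0, 70.4, 71.0]}
)
mpg = pd.DataFrame({"cylinders": [4, 8], "model_year": [70, 71]})

n_dropped = warn_if_blocks_unmatched(synth, mpg, ["cylinders", "model_year"])
```

One possible workaround on the caller's side, under the same assumption, is to round the synthesized blocking column and cast it to int before blocking, e.g. `synth["model_year"].round().astype(int)`.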

MaxGhenis commented 5 years ago

Renaming this issue to get at the root problem.