PSLmodels / Tax-Brain

Tax-Brain is an integrator model for PSL tax models
http://taxbrain.pslmodels.org/
MIT License

C/S update error #188

Open · hdoupe opened this issue 2 years ago

hdoupe commented 2 years ago

I'm getting this error when trying to update Tax-Brain for Compute Studio:

ValueError: Must have equal len keys and value when setting with an iterable
Full stack trace:

```
_______________________________ TestFunctions1.test_run_model _______________________________

self =

    def test_run_model(self):
        self.test_all_data_specified()
        inputs = self.get_inputs({})
        check_get_inputs(inputs)

        class MetaParams(Parameters):
            defaults = inputs["meta_parameters"]

        mp_spec = MetaParams().specification(serializable=True)
>       result = self.run_model(mp_spec, self.ok_adjustment)

../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/cs_kit/validate.py:221:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cs-config/cs_config/functions.py:150: in run_model
    results = compute(*delayed_list)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/base.py:568: in compute
    results = schedule(dsk, keys, **kwargs)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/threaded.py:79: in get
    results = get_async(
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:517: in get_async
    raise_exception(exc, tb)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:325: in reraise
    raise exc
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/local.py:223: in execute_task
    result = _execute_task(task, data)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/dask/core.py:121: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
cs-config/cs_config/helpers.py:187: in nth_year_results
    agg1, agg2 = fuzzed(dv1, dv2, reform_affected, 'aggr')
cs-config/cs_config/helpers.py:158: in fuzzed
    group2.iloc[idx] = group1.iloc[idx]
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:723: in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:1730: in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , indexer = (223671, slice(None, None, None))
value = c62100                 9009.470531
aftertax_income       10535.143557
payrolltax             1378.448991
benefit_va...              1.000000
benefit_cost_total         0.000000
s006                     231.800000
Name: 223671, dtype: float64
name = 'iloc'

    def _setitem_with_indexer_split_path(self, indexer, value, name: str):
        """
        Setitem column-wise.
        """
        # Above we only set take_split_path to True for 2D cases
        assert self.ndim == 2

        if not isinstance(indexer, tuple):
            indexer = _tuplify(self.ndim, indexer)

        if len(indexer) > self.ndim:
            raise IndexError("too many indices for array")
        if isinstance(indexer[0], np.ndarray) and indexer[0].ndim > 2:
            raise ValueError(r"Cannot set values with ndim > 2")

        if (isinstance(value, ABCSeries) and name != "iloc") or isinstance(value, dict):
            from pandas import Series

            value = self._align_series(indexer, Series(value))

        # Ensure we have something we can iterate over
        info_axis = indexer[1]
        ilocs = self._ensure_iterable_column_indexer(info_axis)

        pi = indexer[0]
        lplane_indexer = length_of_indexer(pi, self.obj.index)
        # lplane_indexer gives the expected length of obj[indexer[0]]

        # we need an iterable, with a ndim of at least 1
        # eg. don't pass through np.array(0)
        if is_list_like_indexer(value) and getattr(value, "ndim", 1) > 0:

            if isinstance(value, ABCDataFrame):
                self._setitem_with_indexer_frame_value(indexer, value, name)

            elif np.ndim(value) == 2:
                self._setitem_with_indexer_2d_value(indexer, value)

            elif len(ilocs) == 1 and lplane_indexer == len(value) and not is_scalar(pi):
                # We are setting multiple rows in a single column.
                self._setitem_single_column(ilocs[0], value, pi)

            elif len(ilocs) == 1 and 0 != lplane_indexer != len(value):
                # We are trying to set N values into M entries of a single
                #  column, which is invalid for N != M
                # Exclude zero-len for e.g. boolean masking that is all-false

                if len(value) == 1 and not is_integer(info_axis):
                    # This is a case like df.iloc[:3, [1]] = [0]
                    #  where we treat as df.iloc[:3, 1] = 0
                    return self._setitem_with_indexer((pi, info_axis[0]), value[0])

                raise ValueError(
                    "Must have equal len keys and value "
                    "when setting with an iterable"
                )

            elif lplane_indexer == 0 and len(value) == len(self.obj.index):
                # We get here in one case via .loc with a all-False mask
                pass

            elif len(ilocs) == len(value):
                # We are setting multiple columns in a single row.
                for loc, v in zip(ilocs, value):
                    self._setitem_single_column(loc, v, pi)

            elif len(ilocs) == 1 and com.is_null_slice(pi) and len(self.obj) == 0:
                # This is a setitem-with-expansion, see
                #  test_loc_setitem_empty_append_expands_rows_mixed_dtype
                # e.g. df = DataFrame(columns=["x", "y"])
                #  df["x"] = df["x"].astype(np.int64)
                #  df.loc[:, "x"] = [1, 2, 3]
                self._setitem_single_column(ilocs[0], value, pi)

            else:
>               raise ValueError(
                    "Must have equal len keys and value "
                    "when setting with an iterable"
                )
E               ValueError: Must have equal len keys and value when setting with an iterable

../miniconda3/envs/taxbrain-dev/lib/python3.9/site-packages/pandas/core/indexing.py:1808: ValueError
```

It looks like the error is thrown when assigning the values of one dataframe to another (helpers.py, line 158):

https://github.com/PSLmodels/Tax-Brain/blob/2c63597d8132e79edff64c62f481fdf8d5e60149/cs-config/cs_config/helpers.py#L150-L159
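For anyone digging into this: the pandas branch that raises here can be reproduced without Tax-Brain. With `.iloc` there is no index alignment, so assigning a row Series whose length doesn't match the target frame's column count hits the same `ValueError`. A minimal sketch (the frames and column names here are illustrative stand-ins for `group1`/`group2`, not the real fuzzed data):

```python
import pandas as pd

# group1 stands in for the frame without the extra column;
# group2 has the additional reform_affected column.
group1 = pd.DataFrame({"c62100": [1.0, 2.0], "s006": [3.0, 4.0]})
group2 = pd.DataFrame(
    {"c62100": [0.0, 0.0], "s006": [0.0, 0.0], "reform_affected": [True, False]}
)

# Same shape of assignment as helpers.py line 158:
#   group2.iloc[idx] = group1.iloc[idx]
# Two values are being set into a three-column row, so pandas raises.
try:
    group2.iloc[0] = group1.iloc[0]
    error = None
except ValueError as exc:
    error = exc

print(error)
```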

I verified that all other tests are passing locally, too.

I verified that the correct PUF file is being downloaded by fetching a fresh copy of the PUF and comparing it against the copy in the S3 bucket:

[screenshot: comparison of the freshly downloaded PUF and the S3 copy]

hdoupe commented 2 years ago

It looks like the group2 dataframe has an extra column that's not in group1: reform_affected. If we drop this, I think we should be good to go. @andersonfrailey thoughts?

[screenshot: group2 columns showing the extra reform_affected column]
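A minimal sketch of that fix, using the same illustrative frames as above (not the real fuzzed data): once `reform_affected` is dropped from `group2`, the two frames have matching columns and the same `.iloc` row assignment goes through.

```python
import pandas as pd

group1 = pd.DataFrame({"c62100": [1.0, 2.0], "s006": [3.0, 4.0]})
group2 = pd.DataFrame(
    {"c62100": [0.0, 0.0], "s006": [0.0, 0.0], "reform_affected": [True, False]}
)

# Keep the extra column aside and assign on matching columns only.
reform_affected = group2["reform_affected"]
g2 = group2.drop(columns="reform_affected")

g2.iloc[0] = group1.iloc[0]  # no longer raises

print(g2.iloc[0].tolist())  # → [1.0, 3.0]
```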

andersonfrailey commented 2 years ago

@hdoupe

If we drop this, I think we should be good to go

I agree. I went back through the history a bit, and it looks like that column was added three years ago. It's unclear why it's only throwing an error now, but dropping it should fix the problem.

hdoupe commented 2 years ago

Ok, cool. Thanks for taking a look. I'll open a PR