bernardodionisi / differences

difference-in-differences in Python
https://bernardodionisi.github.io/differences/latest/
GNU General Public License v3.0

Having error when clustering standard errors #4

Open assamidanov opened 1 year ago

assamidanov commented 1 year ago

Hi. I get the following error when I use cluster_var. The variable is a str. What could be the issue? I would greatly appreciate your help. Thanks in advance.


AttributeError                            Traceback (most recent call last)

in ()
----> 1 att_g.fit(
      2     formula='####', control_group = 'never_treated', cluster_var = 'user')

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/differences/attgt/attgt.py in fit(self, formula, weights_name, control_group, base_delta, est_method, as_repeated_cross_section, boot_iterations, random_state, alpha, cluster_var, split_sample_by, n_jobs, backend, progress_bar)
    674         cluster_groups = None
    675         if cluster_var:
--> 676             cluster_groups = get_cluster_groups(
    677                 data=(
    678                     self._data_matrix[cluster_var]

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/differences/attgt/mboot.py in get_cluster_groups(data, cluster_var)
    178         raise ValueError("can't have more than 2 cluster variables")
    179
--> 180     if find_time_varying_covars(data=data, covariates=cluster_var):
    181         raise ValueError("can't have time-varying cluster variables")
    182

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/differences/tools/panel_utility.py in find_time_varying_covars(data, covariates, rtol, atol)
    346
    347     if rtol is None and atol is None:
--> 348         varying = data.groupby([entity_name])[covariates].nunique().max(axis=0)
    349         return list(varying[varying > 1].index)
    350

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/pandas/core/base.py in __getitem__(self, key)
    236
    237         if isinstance(key, (list, tuple, ABCSeries, ABCIndex, np.ndarray)):
--> 238             if len(self.obj.columns.intersection(key)) != len(set(key)):
    239                 bad_keys = list(set(key).difference(self.obj.columns))
    240                 raise KeyError(f"Columns not found: {str(bad_keys)[1:-1]}")

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5573         ):
   5574             return self[name]
-> 5575         return object.__getattribute__(self, name)
   5576
   5577     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'columns'

bernardodionisi commented 1 year ago

Hi Anuar, thanks for reporting this! I'll check and get back to you asap.

bernardodionisi commented 1 year ago

I may need to fix something for a second level of clustering; at the moment I am not sure when I'll have the time to do that. I am guessing you are trying to cluster by a variable that is not your entity, right? If it is your entity, then the bootstrap already clusters on the entity by default.

assamidanov commented 1 year ago

Yes, I am clustering by the entity. Thanks for the prompt response.
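
For anyone who lands here with the same traceback: one plausible reading (an assumption, not a confirmed diagnosis) is that selecting a single column with a string, as in self._data_matrix[cluster_var], yields a pandas Series, and the grouped selection that follows expects a DataFrame. The sketch below uses plain pandas only. The function time_varying_columns and the panel data are hypothetical, written to illustrate the kind of check named in the traceback; they are not the code in differences and not the maintainer's fix.

```python
import pandas as pd


def time_varying_columns(data, entity_name, covariates):
    """Return the covariates that take more than one value within an entity.

    Hypothetical helper for illustration; not part of the differences package.
    """
    # Normalize a single column name to a list so the selection below
    # always produces a DataFrame, never a Series.
    if isinstance(covariates, str):
        covariates = [covariates]

    # Count distinct values of each covariate per entity, then keep the
    # largest count per covariate across entities.
    per_entity = data.groupby(entity_name)[covariates].nunique()
    varying = per_entity.max(axis=0)
    return list(varying[varying > 1].index)


# Made-up panel: "region" is constant within each user, "plan" is not.
panel = pd.DataFrame(
    {
        "user": ["a", "a", "b", "b"],
        "time": [1, 2, 1, 2],
        "region": ["north", "north", "south", "south"],
        "plan": ["free", "paid", "free", "free"],
    }
)

# A string key returns a Series; a list key returns a DataFrame.
print(type(panel["region"]).__name__)    # Series
print(type(panel[["region"]]).__name__)  # DataFrame

print(time_varying_columns(panel, "user", "region"))            # []
print(time_varying_columns(panel, "user", ["region", "plan"]))  # ['plan']
```

A cluster variable that changes within an entity, like "plan" above, is what the ValueError("can't have time-varying cluster variables") branch in the traceback would reject.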