bernardodionisi / differences

difference-in-differences in Python
https://bernardodionisi.github.io/differences/latest/
GNU General Public License v3.0

Package cannot handle NAs in design matrix when utilizing 'split_sample_by' #14

Open johnkohler00 opened 10 months ago

johnkohler00 commented 10 months ago

There is a bug in the following line: https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/attgt.py#L339 that causes a KeyError when using the 'split_sample_by' feature on a dataset with NA values in columns of the design matrix. In that case the code passes only the data[split_sample_by] column rather than the full data object, which is what gets passed when there are no NA values. This causes an error later on, when parse_split_sample() attempts to index the column again here: https://github.com/bernardodionisi/differences/blob/86b15a2cd3a8235a3287f0cb5dc963a04b504f6f/src/differences/attgt/difference.py#L450

The line should instead read: else self.data.loc[
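The failure mode described above can be reproduced with plain pandas. This is a minimal sketch, assuming the downstream split-sample step looks up the split column by name on whatever object it receives. parse_split_sample_sketch is a hypothetical stand-in, not the package's actual parse_split_sample(), and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd


def parse_split_sample_sketch(data, split_sample_by):
    """Hypothetical stand-in for the step that re-indexes the split column.

    It expects the *full* DataFrame and looks up `split_sample_by` itself.
    """
    return sorted(data[split_sample_by].unique())


df = pd.DataFrame(
    {
        "group": ["a", "b", "a", "b"],
        "x": [1.0, np.nan, 3.0, 4.0],  # NA in a design-matrix column
        "y": [0.5, 0.7, 0.2, 0.9],
    }
)

# Rows that survive dropping NAs from the (hypothetical) design-matrix columns.
complete = df[["x", "y"]].notna().all(axis=1)

# Buggy pattern: only the split column is passed along, so the second
# lookup of "group" hits a Series with an integer index and raises KeyError.
try:
    parse_split_sample_sketch(df.loc[complete, "group"], "group")
except KeyError as err:
    print("KeyError:", err)

# Fixed pattern: pass the filtered DataFrame, mirroring the NA-free path.
print(parse_split_sample_sketch(df.loc[complete], "group"))
```

The first call fails because a Series has no column labels, so "group" is treated as a row label that does not exist; the second call works because the filtered object is still a DataFrame.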

johnkohler00 commented 10 months ago

I have submitted a pull request. Please let me know if you agree with this diagnosis of the issue.

bernardodionisi commented 6 months ago

Hi, I apologize for missing this issue; I have not checked in for a while. I will try to take a look at the PR soonish.