almost-matching-exactly / DAME-FLAME-Python-Package

A Python Package providing two algorithms, DAME and FLAME, for fast and interpretable treatment-control matches of categorical data
https://almost-matching-exactly.github.io/DAME-FLAME-Python-Package/
MIT License

Error in ATE estimation #30

Status: Closed (closed by xlim7 2 years ago)

xlim7 commented 3 years ago

Hi, I'm trying FLAME for the first time and encountered an error during post-processing of the ATE:

Code snippet:

import dame_flame

# df is the pandas DataFrame holding the covariates, "treated", and "outcome".
model = dame_flame.matching.FLAME(
    repeats=True,
    verbose=3,
    adaptive_weights="decisiontree",
    stop_unmatched_t=True,
    early_stop_un_t_frac=0.005,
    missing_holdout_replace=0,
    want_pe=True,
    want_bf=True,
)
model.fit(holdout_data=df, treatment_column_name="treated", outcome_column_name="outcome")
result = model.predict(df)
# The ZeroDivisionError below is raised by this call.
dame_flame.utils.post_processing.ATE(model)

Error message:

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-10-b4ec8bd0f432> in <module>
----> 1 dame_flame.utils.post_processing.ATE(model)

~/Library/Caches/pypoetry/virtualenvs/pandata-ml-deal-optimisation--RFgPKiW-py3.7/lib/python3.7/site-packages/dame_flame/utils/post_processing.py in ATE(matching_object, mice_iter)
    161         treated = group_data.loc[group_data[matching_object.treatment_column_name] == 1]
    162         control = group_data.loc[group_data[matching_object.treatment_column_name] == 0]
--> 163         avg_treated = sum(treated[matching_object.outcome_column_name]) / len(treated.index)
    164         avg_control = sum(control[matching_object.outcome_column_name]) / len(control.index)
    165         cates[group_id] = avg_treated - avg_control

ZeroDivisionError: division by zero

Is it possible that some matched_groups do not contain treatment units?
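
One way to check this directly is to count treated and control units in each matched group before calling ATE. The sketch below is only an illustration on toy data: the helper `find_one_sided_groups` and the `toy`/`toy_groups` names are hypothetical, and it assumes the matched groups are available as lists of row labels into the input DataFrame.

```python
import pandas as pd


def find_one_sided_groups(data, groups, treatment_col):
    """Return (group position, n_treated, n_control) for every group
    that is missing either treated or control units."""
    one_sided = []
    for group_id, unit_ids in enumerate(groups):
        treatment = data.loc[unit_ids, treatment_col]
        n_treated = int((treatment == 1).sum())
        n_control = int((treatment == 0).sum())
        if n_treated == 0 or n_control == 0:
            one_sided.append((group_id, n_treated, n_control))
    return one_sided


# Hypothetical toy data, not the dataset from this issue.
toy = pd.DataFrame(
    {"treated": [1, 0, 0, 0, 1], "outcome": [2.0, 1.0, 3.0, 4.0, 5.0]}
)
toy_groups = [[0, 1], [2, 3], [4]]  # assumed layout: one list of row labels per group

one_sided = find_one_sided_groups(toy, toy_groups, "treated")
print(one_sided)
```

Any group reported here would make the per-group average in the traceback divide by zero. With a fitted model, the groups would presumably come from the matching object itself (the package documentation describes a `units_per_group` attribute, but treat that as an assumption here).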

nehargupta commented 3 years ago

Hi @xlim7, thank you for raising this important issue. I hope to resolve it quickly for your use.

Could you give me any additional information on your dataset, i.e., the number of rows/columns or the maximum value of each column? If you can attach the dataset or a small sample of it, that would be even better.

I ask because I have seen this issue before, and it was previously caused by overflow in the calculation of matched groups. All groups should have at least one treated and at least one control unit, and the overflow error was causing this to fail. I would like to calibrate an error-checking step for overflow accordingly. It's also possible that this is caused by something other than overflow, and I would like to eliminate that possibility.
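
For anyone wanting to gather the requested numbers, here is a rough sketch. It reports the row count, covariate count, and per-column maximum, and adds a back-of-the-envelope overflow check. The check rests on an assumption that is not confirmed in this thread: that each unit's covariates are packed into a single 64-bit integer key, with one "digit" per covariate. The helper name `summarize_for_overflow` and the toy frames are hypothetical.

```python
import pandas as pd

INT64_MAX = 2**63 - 1


def summarize_for_overflow(data, covariate_cols):
    """Summarize dataset size and estimate whether a packed integer key
    over all covariates would fit in a signed 64-bit integer."""
    maxima = {col: int(data[col].max()) for col in covariate_cols}
    # Assumption: each covariate takes values 0..max, so the number of
    # distinct keys is the product of (max + 1) over all covariates.
    key_space = 1
    for col_max in maxima.values():
        key_space *= col_max + 1  # Python ints do not overflow
    return {
        "n_rows": len(data),
        "n_covariates": len(covariate_cols),
        "column_max": maxima,
        "key_space": key_space,
        "fits_int64": key_space - 1 <= INT64_MAX,
    }


# Hypothetical toy frames, not the dataset from this issue.
narrow = pd.DataFrame({"a": [0, 3, 1], "b": [1, 0, 1]})
wide = pd.DataFrame({f"x{i}": [0, 999] for i in range(13)})

small = summarize_for_overflow(narrow, ["a", "b"])
large = summarize_for_overflow(wide, list(wide.columns))
print(small["key_space"], small["fits_int64"])
print(large["fits_int64"])
```

If `fits_int64` comes back false for the real covariates, overflow is at least a plausible suspect under the stated assumption; if it comes back true, another cause seems more likely.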

If overflow is causing the error, I will adjust the error message accordingly (so it's clearer than "division by zero") and will point you to a larger-scale database implementation of FLAME, which is nearly ready for release. If overflow is not the cause, I will look for other possibilities.

Thank you for using FLAME in your work!

xlim7 commented 3 years ago

Thank you so much for the quick reply! I'll drop you an email with a sample dataset that reproduces the error if you don't mind. It has ~80K rows with ~37k treated, and 13 matching variables. Looking forward to the large-scale implementation of FLAME as well!

nehargupta commented 3 years ago

Update: This still needs a proper fix, but my most recent push, #34, adds a temporary flag for when the issue occurs, along with a clean error message. A fix will definitely need to go out with the next version, sometime before July.

And thanks for the file and for bringing this up, @xlim7. Happy to keep talking via email.

Edit: had wrong tag