almost-matching-exactly / DAME-FLAME-Python-Package

A Python Package providing two algorithms, DAME and FLAME, for fast and interpretable treatment-control matches of categorical data
https://almost-matching-exactly.github.io/DAME-FLAME-Python-Package/
MIT License

Off by 1 (or two?) error in `model.pe_each_iter`? #47

Closed nickeubank closed 1 year ago

nickeubank commented 2 years ago

Students are also running into some indexing confusion with `model.pe_each_iter`. Since iteration 1 has a PE of 0 (all exact matches, right?), it doesn't get included in `pe_each_iter`, which means that if you index into the list, you're off by 1 (or by 2, given that DAME-FLAME counting seems to start at 1, not 0). We probably need to adopt a consistent approach to these.

nickeubank commented 2 years ago

@marlhakizi @nansuwang

nehargupta commented 1 year ago

Hi @nickeubank, @haoningjiang is working on this and had an interesting point. Having a PE of 0 for the first iteration (which, by the way, we'll move to 0-indexing and call the 0th iteration) makes sense when there are matched units. But when there are no perfectly matched units in the dataset, a null PE is also an option for that specific case. Please let us know if you have a preference! If we don't hear from you, we'll go with 0 for the first iteration in all cases by default for now.

nickeubank commented 1 year ago

If no units are matched, nan / None makes sense to me.

vittorioorlandi commented 1 year ago

As of now, an iteration refers to one round of PE computation and matching (independent of whether matches are actually made on the relevant covariate set). There will be as many entries in .pe_each_iter and .bf_each_iter as there are iterations.

The PE is the error associated with using a covariate set to predict the outcome and is thus always defined, regardless of whether matches are made. As an example, setting `early_stop_iterations = 0` and `want_pe = True` means that one round (the 0th) of matching will be completed (exact matching), and the `.pe_each_iter` attribute will be a list of length 1, containing the PE associated with using all covariates to estimate the outcome.
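
To make the indexing convention above concrete, here is a minimal sketch in plain Python. It does not call dame_flame: the list below is a stand-in for `.pe_each_iter` with invented values, and `pe_at_iteration` is a hypothetical helper, not part of the package. It only assumes what is described above, namely one entry per iteration with the exact-matching round at index 0.

```python
# Stand-in for model.pe_each_iter after fitting with want_pe=True.
# The numbers are invented for illustration, not real dame_flame output.
pe_each_iter = [0.12, 0.15, 0.21]


def pe_at_iteration(pe_values, iteration):
    """Return the PE for a 0-indexed iteration (hypothetical helper).

    Iteration 0 is the exact-matching round on all covariates, so the
    list index and the iteration number coincide, with no offset.
    """
    if not 0 <= iteration < len(pe_values):
        raise IndexError(
            f"iteration {iteration} not in range 0..{len(pe_values) - 1}"
        )
    return pe_values[iteration]


# Pair each PE with its iteration number; there is one entry per iteration.
iteration_table = list(enumerate(pe_each_iter))

for iteration, pe in iteration_table:
    print(f"iteration {iteration}: PE = {pe}")
```

Under this convention, a run stopped with `early_stop_iterations = 0` would correspond to a list of length 1, and `pe_at_iteration(values, 0)` would return its only entry.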