benmiroglio / pymatch

MIT License

Assessing balance with only select columns #39

Open vispz opened 4 years ago

vispz commented 4 years ago

Context

The is_continuous check is sometimes not precise enough to distinguish continuous from categorical variables, which leads to unwieldy plots. It is also sometimes useful to plot the balance for only a few specific covariates of interest after matching.

Changes

To enable this, I have added a new argument, columns: List[str], to the Matcher.compare_continuous and Matcher.compare_categorical methods. When columns is passed in, we do not check whether each column is continuous or categorical, but we still drop any columns that are in the self.exclude set.

There is no actual change inside the plotting for loops (as indicated below). The diff in those loops comes only from removing one level of indentation, since the column filtering now happens before the loop.

# from
        for col in self.matched_data.columns:
            if uf.is_continuous(col, self.X) and col not in self.exclude:
                do_stuff()

# to
        for col in columns_to_plot:
            do_stuff()
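For reviewers, the selection logic described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the helper name select_columns_to_plot and its parameters are hypothetical, and the type_check callable stands in for whatever continuous/categorical test the method applies (for example, a wrapper around uf.is_continuous).

```python
from typing import Callable, Iterable, List, Optional, Set


def select_columns_to_plot(
    all_columns: Iterable[str],
    exclude: Set[str],
    type_check: Callable[[str], bool],
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: decide which columns a compare_* method plots.

    If `columns` is given, the continuous/categorical check is skipped and
    only the `exclude` filter is applied. Otherwise, every column must pass
    `type_check` and must not be in `exclude`.
    """
    if columns is not None:
        # Explicit selection: trust the caller about the column type.
        return [col for col in columns if col not in exclude]
    # Default behaviour: infer the column type, then apply the exclusions.
    return [
        col for col in all_columns
        if type_check(col) and col not in exclude
    ]


# Illustrative data only; these column names are made up.
all_cols = ["age", "income", "state", "record_id"]
excluded = {"record_id"}


def looks_continuous(col: str) -> bool:
    # Stand-in for the real type check.
    return col in {"age", "income"}


default_selection = select_columns_to_plot(all_cols, excluded, looks_continuous)
explicit_selection = select_columns_to_plot(
    all_cols, excluded, looks_continuous, columns=["state", "record_id"]
)
```

With this shape, the plotting loop only needs to iterate over the returned list, which is why the loop body itself is unchanged apart from indentation.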

Verification

I ran the example notebook both with and without the columns argument, and both cases work as intended. See nbviewer.