benmiroglio / pymatch

MIT License

Adding option to match without replacement. #20

Open tlooden opened 5 years ago

tlooden commented 5 years ago

Hi Ben,

Thanks for making this nice tool! If you like, I've implemented a new feature that is common in, e.g., R packages for the same purpose: the option to match without replacement. The downsides can be slightly worse matching overall as well as possible order effects. However, for some types of analyses you really want unique subjects in each group. Now the user can make that decision! :)

I've also implemented (line 189) a randomization of the order in which the matching proceeds. This lets you check for said ordering effects and, e.g., run it a couple of times until the matching reaches a desirable level.
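
To illustrate the idea, here is a minimal sketch of greedy 1:1 matching without replacement with a randomized processing order. It is not the code from this pull request or pymatch's implementation; the function name `match_without_replacement`, its parameters, and the example scores are all hypothetical.

```python
import random


def match_without_replacement(case_scores, control_scores, seed=None):
    """Greedy 1:1 nearest-score matching (hypothetical helper, not pymatch's API).

    Each control index is used at most once. Cases are processed in a
    shuffled order so that order effects can be examined across seeds.
    Returns a dict mapping case index -> matched control index.
    """
    rng = random.Random(seed)
    order = list(range(len(case_scores)))
    rng.shuffle(order)  # randomize the order in which cases pick a control

    available = set(range(len(control_scores)))
    pairs = {}
    for i in order:
        if not available:
            break  # no unused controls left; remaining cases stay unmatched
        # Closest unused control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - case_scores[i]), k))
        pairs[i] = j
        available.remove(j)  # "without replacement": the control is consumed
    return pairs


# Illustrative scores only (made up for this sketch).
cases = [0.30, 0.32, 0.80]
controls = [0.31, 0.33, 0.79, 0.10]

for seed in (0, 1, 2):
    print(seed, match_without_replacement(cases, controls, seed=seed))
```

Because each case takes the closest control still available, a case processed late may end up with a worse match than it would have had if processed first. Rerunning with different seeds is one way to see how much the result depends on that order.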

Please let me know if I can make anything clearer. It's my first GH pull request, so I hope I am following the right protocol.

All the best!

Tristan

skjerns commented 4 years ago

Same as #13: this somehow does not work correctly, or did I do something wrong?

from pymatch.Matcher import Matcher
import pandas as pd

cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
cases_group = [1 for _ in range(len(cases_ages))]
controls_group = [0 for _ in range(len(cases_ages))]

df_cases = pd.DataFrame(list(zip(cases_ages, cases_group)), columns=['age', 'group'])
df_controls = pd.DataFrame(list(zip(controls_ages, controls_group)), columns=['age', 'group'])

m = Matcher(df_cases, df_controls, yvar='group')
m.fit_scores(balance=True, nmodels=100)
m.match(method='min', nmatches=1, with_replacement=False)
print(m.matched_data)
# only 4 matches are found?
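
One detail in the snippet above may be worth checking: `controls_group` is built with `len(cases_ages)`, and `zip` stops at its shortest input, so the controls frame would be built from 10 rows rather than 20. The plain-Python sketch below shows only that truncation. The names `controls_group_short` and `controls_group_full` are introduced here for illustration, and whether this explains the four matches is not established.

```python
cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27,
                 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]

# As in the snippet: the label list is sized from the cases (10 entries).
controls_group_short = [0 for _ in range(len(cases_ages))]
# Presumably intended: one label per control (20 entries).
controls_group_full = [0 for _ in range(len(controls_ages))]

# zip() stops at the shortest input, so the short label list drops controls.
rows_short = list(zip(controls_ages, controls_group_short))
rows_full = list(zip(controls_ages, controls_group_full))

print(len(rows_short), len(rows_full))  # prints "10 20"
```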

harveyaa commented 2 years ago

@tlooden Thank you for this feature :)