Open tlooden opened 5 years ago
Same as #13: this does not seem to work correctly, or did I do something wrong?
from pymatch.Matcher import Matcher
import pandas as pd
cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
# One group label per row: 1 = case, 0 = control.
cases_group = [1] * len(cases_ages)
controls_group = [0] * len(controls_ages)
df_cases = pd.DataFrame(list(zip(cases_ages, cases_group)), columns=['age', 'group'])
df_controls = pd.DataFrame(list(zip(controls_ages, controls_group)), columns=['age', 'group'])
m = Matcher(df_cases, df_controls, yvar='group')
m.fit_scores(balance=True, nmodels=100)
m.match(method='min', nmatches=1, with_replacement=False)
print(m.matched_data)
# only 4 matches are found?
@tlooden Thank you for this feature :)
Hi Ben,
Thanks for making this nice tool! If you like, I've implemented a new feature that is common in, e.g., R packages for the same purpose: the option to match without replacement. The downsides can be slightly worse matching overall and possible order effects. However, for some types of analyses you really want unique subjects in each group. Now the user can make that decision! :)
I've also implemented (line 189) a randomization of the order in which the matching proceeds. This lets you check for such ordering effects and, e.g., rerun the matching a few times until it reaches a desirable level.
Please let me know if I can make anything clearer. It's my first GH pull request, so I hope I am following the right protocol.
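For readers unfamiliar with the idea, here is a minimal standalone sketch of greedy 1:1 matching without replacement in a randomized order. It is not the code in this pull request or pymatch's internals: the function name `match_without_replacement` and its parameters are made up for illustration, and raw ages from the issue's example stand in for propensity scores.

```python
import random


def match_without_replacement(case_scores, control_scores, shuffle=True, seed=None):
    """Greedy 1:1 nearest-score matching; each control is used at most once.

    Hypothetical helper for illustration only. Returns a dict mapping
    case index -> matched control index.
    """
    rng = random.Random(seed)
    order = list(range(len(case_scores)))
    if shuffle:
        rng.shuffle(order)  # randomize which case gets first pick

    available = set(range(len(control_scores)))
    pairs = {}
    for i in order:
        if not available:
            break  # more cases than controls: the remaining cases stay unmatched
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - case_scores[i]), k))
        pairs[i] = j
        available.remove(j)  # "without replacement": this control is now taken
    return pairs


# Ages from the issue's example, used here in place of propensity scores.
cases = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]

pairs = match_without_replacement(cases, controls, seed=0)
total_gap = sum(abs(cases[i] - controls[j]) for i, j in pairs.items())
print(len(pairs), "cases matched, total age gap:", total_gap)
```

Because the procedure is greedy, the total gap can depend on the order in which cases pick their controls. Rerunning with different `seed` values is one way to see how large that order effect is for a given data set.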
All the best!
Tristan