benmiroglio / pymatch

MIT License

Can I create age-matched control groups with this package? #37

Open skjerns opened 4 years ago

skjerns commented 4 years ago

I have a list of cases and a list of controls with their respective ages.

Now I want to match one control to each case, with a maximum age difference of 3 years. The goal is to get as many matches as possible; the age difference itself doesn't need to be minimized.
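The 3-year rule decides which case-control pairs are admissible at all. Here is a minimal sketch in plain Python (no pymatch) that lists the admissible controls for each case, using the example ages from this issue. It assumes ages in whole years, and `eligible_controls` is a hypothetical helper:

```python
def eligible_controls(case_ages, control_ages, max_diff=3):
    """Map each case index to the indices of controls within max_diff years."""
    return {
        i: [j for j, c in enumerate(control_ages) if abs(a - c) <= max_diff]
        for i, a in enumerate(case_ages)
    }


cases = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
graph = eligible_controls(cases, controls)
print(graph[5])  # []  (case aged 44 has no control between 41 and 47)
```

Cases 5 (age 44) and 8 (age 46) have no control within 3 years. With this data, at most 8 of the 10 cases can be matched.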

Is this possible with pymatch?

I've tried the following (just an example, not the real data):

from pymatch.Matcher import Matcher
import pandas as pd

cases_ages = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls_ages = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
# 'group' is the indicator pymatch models: 1 = case, 0 = control
cases_group = [1] * len(cases_ages)
controls_group = [0] * len(controls_ages)  # one label per control (not per case)

df_cases = pd.DataFrame({'age': cases_ages, 'group': cases_group})
df_controls = pd.DataFrame({'age': controls_ages, 'group': controls_group})

m = Matcher(df_cases, df_controls, yvar='group')
m.fit_scores(balance=True, nmodels=100)
# threshold is a caliper on propensity scores, not on age in years
m.match(method='min', nmatches=1, threshold=0.0005)

print(m.matched_data)

I'd like a 1:1 mapping like the one below, with as many matches as possible, but without replacement (i.e. each case is paired with at most one control, and no control is used twice).

# match case id : control id
{0: 2, 1: 12, 2: 3, 3: 4, 4: 13, ...}
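Age is a single number, and the tolerance is the same ±3 years for every case. Each case therefore accepts an age window of equal width. In that setting a sorted greedy pass finds a maximum matching without replacement: visit cases from youngest to oldest, and give each one the youngest unused control still inside its window. This is a sketch under those assumptions. `greedy_caliper_match` is a hypothetical name, not part of pymatch:

```python
def greedy_caliper_match(case_ages, control_ages, max_diff=3):
    """1:1 matching without replacement that maximizes the number of pairs
    when every case accepts controls within +/- max_diff of its own age."""
    case_order = sorted(range(len(case_ages)), key=lambda i: case_ages[i])
    control_order = sorted(range(len(control_ages)), key=lambda j: control_ages[j])
    matches = {}
    k = 0  # first control (in age order) that is neither used nor skipped
    for i in case_order:
        low, high = case_ages[i] - max_diff, case_ages[i] + max_diff
        # Controls younger than `low` are too young for this case and for
        # every older case still to come, so skip them permanently.
        while k < len(control_order) and control_ages[control_order[k]] < low:
            k += 1
        # Take the youngest remaining control if it is inside the window;
        # otherwise this case stays unmatched.
        if k < len(control_order) and control_ages[control_order[k]] <= high:
            matches[i] = control_order[k]
            k += 1
    return matches


cases = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
matches = greedy_caliper_match(cases, controls)
print(matches)  # {1: 17, 7: 12, 0: 13, 4: 15, 6: 18, 3: 2, 2: 16, 9: 3}
```

On the example ages this pairs 8 cases. That is the most possible, because cases 5 and 8 have no control within 3 years. The optimality argument depends on the one-dimensional, equal-width window. With other admissibility rules it no longer holds.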

However, pymatch matches with replacement (a fix for this is attempted in #20), and even with that fix the matching is not optimized for the number of matches.
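Maximizing the number of pairs without replacement is the standard maximum bipartite matching problem. Cases sit on one side, controls on the other, and an edge joins every admissible pair. Below is a minimal pure-Python sketch of Kuhn's augmenting-path algorithm. This is a swapped-in technique, not how pymatch matches. `max_bipartite_matching` is a hypothetical helper, and the 3-year window is the only admissibility rule assumed here:

```python
def max_bipartite_matching(eligible, n_controls):
    """Kuhn's algorithm: maximum matching of cases to controls.

    eligible[i] lists the control indices case i may be paired with.
    Returns {case_index: control_index}; every control is used at most once.
    """
    control_to_case = [None] * n_controls  # current owner of each control

    def try_assign(i, seen):
        # Search for an augmenting path that starts at case i.
        for j in eligible[i]:
            if j in seen:
                continue
            seen.add(j)
            # Use control j if it is free, or if the case holding it can be
            # re-routed to another admissible control.
            if control_to_case[j] is None or try_assign(control_to_case[j], seen):
                control_to_case[j] = i
                return True
        return False

    for i in range(len(eligible)):
        try_assign(i, set())
    return {i: j for j, i in enumerate(control_to_case) if i is not None}


cases = [23, 21, 26, 25, 23, 44, 24, 22, 46, 26]
controls = [34, 30, 24, 25, 25, 27, 30, 33, 53, 27, 26, 28, 23, 23, 28, 23, 24, 22, 23, 25]
eligible = [[j for j, c in enumerate(controls) if abs(a - c) <= 3] for a in cases]
matches = max_bipartite_matching(eligible, len(controls))
print(len(matches))  # 8
```

The runtime is O(cases × admissible pairs), which is fine at this scale. `try_assign` recurses roughly once per case along an augmenting path. Very large inputs would need an iterative version or Hopcroft-Karp.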