Bayer-Group / pybalance

A library for minimizing the effects of confounding covariates
BSD 3-Clause "New" or "Revised" License

Add Assertion on the dataset while creating MatchingData #22

Open abhishek-ch opened 4 months ago

abhishek-ch commented 4 months ago

Add a few data quality checks inside MatchingData. For example, if the population column cannot handle boolean-like values such as 0 and 1, MatchingData should catch that early.
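As a rough illustration of the kind of early check being requested, here is a minimal sketch. It is not part of pybalance: the function name `check_population_column`, the `expected_groups` parameter, and the three rules it enforces (no missing labels, a single label type, exactly two groups) are all assumptions made for this example, since the thread does not say which condition actually fails.

```python
import math
from collections import Counter


def check_population_column(values, expected_groups=2):
    """Hypothetical early check for a population column (not a pybalance API).

    `values` is any iterable of population labels, e.g. [0, 1, 1, 0].
    Returns a dict mapping each label to its row count, or raises ValueError.
    """
    values = list(values)
    if not values:
        raise ValueError("population column is empty")

    # Missing labels (None or NaN) cannot be assigned to any population.
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    if n_missing:
        raise ValueError(f"population column has {n_missing} missing value(s)")

    # Mixed label types (e.g. 0 and "0", or True and 1) are easy to confuse
    # when selecting a population by label, so reject them up front.
    label_types = sorted({type(v).__name__ for v in values})
    if len(label_types) > 1:
        raise ValueError(f"population column mixes label types: {label_types}")

    # Assumption: matching compares exactly two populations.
    counts = Counter(values)
    if len(counts) != expected_groups:
        raise ValueError(
            f"expected {expected_groups} populations, found {sorted(counts)}"
        )
    return dict(counts)
```

For example, `check_population_column([0, 1, 1, 0, 1])` returns `{0: 2, 1: 3}`, while `check_population_column([0, 1, "1"])` raises a `ValueError` about mixed label types. Whether such a check belongs inside MatchingData, and which rules it should enforce, is for the maintainers to decide.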

sprivite commented 4 months ago

Can you please show me the code that is causing trouble?

abhishek-ch commented 4 months ago

This code raised an issue for me:

match = matcher.get_best_match()
m_data = m.copy().get_population(0)

Assuming I have 0 and 1 in the population column

sprivite commented 4 months ago

I cannot reproduce:

from pybalance.utils.balance_calculators import *
from pybalance.utils import MatchingData
from pybalance.sim import load_paper_dataset

m = load_paper_dataset()
data = m.data
data.loc[data.population == 'pool', 'population'] = 0
data.loc[data.population == 'target', 'population'] = 1
m = MatchingData(data)
m.copy().get_population(0)

sprivite commented 4 months ago

Can you please give the steps to reproduce?

abhishek-ch commented 4 months ago

Here is the sample dataset: large_confounding_adjustment_dataset.csv

sprivite commented 4 months ago

What matcher are you using?

sprivite commented 4 months ago

Can you please paste the full code along with the error?