Quantipy / quantipy

Python for people data
MIT License

Dataset.score() wanted #1285

Open alextanski opened 5 years ago

alextanski commented 5 years ago

We want a new method that creates "index score"-like variables from a set of incoming scale-based questions (arrays and regular singles, potentially mixed). The signature might look like:

dataset.score(variables, factor_map, ignore=None, normalize=False, base='valid')

We can chat about the details later on.

alextanski commented 5 years ago

Basic idea:

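import numpy as np

dataset.add_meta('Q10_index', 'float', 'INDEX on Q10')

# Recode the scale codes to 0-100 factors; 99 becomes NaN so the row mean skips it
dataset['Q10_index'] = dataset[['Q10']].replace({1: 0, 2: 25, 3: 50, 4: 75, 5: 100, 99: np.nan}).mean(axis=1)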

# Check
dataset[['Q10', 'Q10_index']]
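
To make the recode-and-average idea easier to check by hand, here is a toy version on a plain pandas DataFrame. The column names `Q10_1` to `Q10_3` and the sample answers are made up for illustration, and the frame only stands in for the case data; this is not Quantipy code.

```python
import numpy as np
import pandas as pd

# Hypothetical items of a Q10 grid; 99 stands for "don't know"
data = pd.DataFrame({
    'Q10_1': [1, 5, 99, 99],
    'Q10_2': [3, 4, 99, 99],
    'Q10_3': [99, 5, 2, 99],
})

# Same factor mapping as in the snippet above
factors = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100, 99: np.nan}

# Row-wise mean; NaN entries are skipped by default (skipna=True)
index = data.replace(factors).mean(axis=1)
print(index)
```

The first respondent averages 0 and 50 to 25.0 because the 99 is dropped, and the last respondent, who answered 99 everywhere, ends up with NaN rather than 0.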
alextanski commented 5 years ago

Details:

variables: a list of the relevant questions

factor_map: the values that the regular value codes should take on when building the mean. Different sub-variables of the variables list may need different factor maps, so we might need to allow a dict that maps the factors per question.

ignore: value codes that should be excluded from the mean calculation, e.g. 99, 997, 999.

normalize: Need to check again what that was

fill_na (this should be used instead of base='valid'): insert a special (missing) value for empty entries, e.g. fill_na=888, so it can be excluded afterwards when aggregating... (don't ask why...).
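
A rough sketch of how these parameters could fit together, written in plain Python on a list of dicts so it runs without Quantipy. Everything here is an assumption for discussion: the standalone `score` function, the `rows` argument, and the rule for detecting a per-question `factor_map` are hypothetical, and `normalize` is left out because its meaning is still open.

```python
from statistics import mean


def score(rows, variables, factor_map, ignore=None, fill_na=None):
    """Hypothetical sketch: return one index score per respondent.

    rows: list of dicts mapping question name -> answer code (None if empty)
    factor_map: one {code: factor} dict shared by all variables, or
        {question: {code: factor}} for per-question factors
    ignore: codes that are left out of the mean
    fill_na: value used when a respondent has no valid answer left
    """
    ignore = set(ignore or [])
    # Assumption: the map is per-question if it is keyed by every variable name
    per_question = all(var in factor_map for var in variables)
    scores = []
    for row in rows:
        factors = []
        for var in variables:
            code = row.get(var)
            if code is None or code in ignore:
                continue
            mapping = factor_map[var] if per_question else factor_map
            # Codes without a factor are skipped silently in this sketch
            if code in mapping:
                factors.append(mapping[code])
        scores.append(mean(factors) if factors else fill_na)
    return scores


rows = [
    {'Q1': 5, 'Q2': 2},
    {'Q1': 3, 'Q2': 99},
    {'Q1': 99, 'Q2': None},
]
factor_map = {
    'Q1': {1: 0, 2: 25, 3: 50, 4: 75, 5: 100},
    'Q2': {1: 100, 2: 75, 3: 50, 4: 25, 5: 0},  # reversed scale
}
result = score(rows, ['Q1', 'Q2'], factor_map, ignore=[99], fill_na=888)
print(result)
```

The third respondent has no valid answer after code 99 is ignored, so the score falls back to the fill_na value and can be filtered out before aggregating.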