Open alextanski opened 5 years ago
Basic idea:
import numpy as np
dataset.add_meta('Q10_index', 'float', 'INDEX on Q10')
# Recode the 1-5 scale to 0-100, treat 99 as missing, then take the row mean
dataset['Q10_index'] = dataset[['Q10']].replace({1: 0, 2: 25, 3: 50, 4: 75, 5: 100, 99: np.nan}).mean(axis=1)
# Check
dataset[['Q10', 'Q10_index']]
Details:
variables: a list of the relevant questions
factor_map: the values that the regular value codes should take on when building the mean. Note that different sub-variables of the variables list might need different factor_maps, so we might need to allow a dict that maps the factors by question.
ignore: value codes that should not be included in the mean calculation, e.g. 99, 997, 999 etc.
normalize: Need to check again what that was
fill_na (this should be used instead of base='valid'): insert a special (missing) value for empty entries, e.g. fill_na=888, so that it can be excluded afterwards when aggregating... (don't ask why...).
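To make the per-question factor_map idea concrete, here is a rough sketch of how a flat map and a dict keyed by question could be brought into one shape. `resolve_factor_map` is a hypothetical helper name, not something that exists in the library, and the rule "a dict of dicts means per-question" is an assumption.

```python
def resolve_factor_map(variables, factor_map):
    """Return a {question: {code: factor}} dict for every name in variables.

    Hypothetical helper (assumption, not existing API). factor_map is either
    one flat {code: factor} dict shared by all questions, or a dict keyed by
    question name whose values are {code: factor} dicts.
    """
    per_question = bool(factor_map) and all(
        isinstance(value, dict) for value in factor_map.values()
    )
    if not per_question:
        # Flat map: every question gets its own copy of the same factors.
        return {var: dict(factor_map) for var in variables}
    missing = [var for var in variables if var not in factor_map]
    if missing:
        raise KeyError(f"no factor_map given for: {missing}")
    # Per-question map: keep only the questions that were asked for.
    return {var: dict(factor_map[var]) for var in variables}


# One shared scale for both questions
shared = resolve_factor_map(['Q10', 'Q11'], {1: 0, 2: 25, 3: 50, 4: 75, 5: 100})

# Q11 is reverse-coded, so it gets its own factors
mixed = resolve_factor_map(
    ['Q10', 'Q11'],
    {'Q10': {1: 0, 5: 100}, 'Q11': {1: 100, 5: 0}},
)
```

Resolving the map up front would let the scoring code always work with one dict per question, whichever form the caller passes.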
We want to have a new method to create "index score"-like variables from a set of incoming scale-based questions (arrays and normal singles potentially mixed). Signature might look like:
dataset.score(variables, factor_map, ignore=None, normalize=False, base='valid')
We can chat about the details later on.
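As a starting point for that discussion, here is a minimal sketch of the scoring logic on a plain pandas DataFrame rather than on the dataset object. `score_frame` is a hypothetical name. The sketch assumes a single flat factor_map, assumes that codes in ignore (and any code missing from factor_map) simply drop out of the mean, and assumes fill_na replaces the score of rows without any valid answer. normalize is left out because its meaning is still open.

```python
import pandas as pd


def score_frame(df, variables, factor_map, ignore=None, fill_na=None):
    """Row-wise mean of factor-mapped answers (sketch, not the final API)."""
    ignore = set(ignore or [])
    factors = {code: f for code, f in factor_map.items() if code not in ignore}
    # Series.map turns every code that is not a key of factors into NaN.
    mapped = df[variables].apply(lambda col: col.map(factors))
    # mean skips NaN by default, so only valid answers form the base.
    result = mapped.mean(axis=1)
    if fill_na is not None:
        # Rows without any valid answer get the special missing value.
        result = result.fillna(fill_na)
    return result


df = pd.DataFrame({'Q10': [1, 5, 99], 'Q11': [3, 99, 99]})
factors = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}
df['Q10_Q11_index'] = score_frame(
    df, ['Q10', 'Q11'], factors, ignore=[99], fill_na=888
)
# Rows score 25.0, 100.0 and 888.0 (the last row has no valid answer)
```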