Open ljchang opened 6 years ago
The forced choice test might be impacted by commit 3bb8db388abc887b35504a37ff29daa9e33db8a7 by @ljchang.
In case it helps: I noticed that `Roc` silently gives different results for the same data depending on the input type (see the example below). I think the input variables should either be explicitly coerced to a specific type or raise an error when they are not of the expected type.
I get different results for each of these examples:
from nltools.analysis import Roc
import numpy as np
import pandas as pd
inputs = np.array([1, 2, 1, 2, 2, 1, 1, 2])
outcomes = np.array([0, 1, 0, 1, 0, 1, 0, 1])
subs = np.array([1, 1, 2, 2, 3, 3, 4, 4])
# With int outcomes
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()
# With numpy boolean outcomes
outcomes = outcomes.astype(bool)
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()
# Forced choice
# With int inputs
roc = Roc(input_values=inputs,
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()
# With float inputs
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()
# With pd Series outcomes
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=pd.Series(outcomes.astype(bool)),
          forced_choice=subs)
roc.calculate()
roc.summary()
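To make the suggestion concrete, here is a minimal sketch of the kind of up-front coercion I have in mind. `coerce_roc_inputs` is a hypothetical helper, not something that exists in nltools, and the choice of float values plus boolean outcomes is my assumption about what `Roc` should use internally.

```python
import numpy as np


def coerce_roc_inputs(input_values, binary_outcome):
    """Hypothetical helper (not part of nltools): normalize ROC inputs.

    Returns a 1-D float array of values and a 1-D boolean array of outcomes,
    or raises ValueError when the inputs cannot be interpreted safely.
    """
    # np.asarray accepts lists, numpy arrays, and pandas Series alike.
    values = np.asarray(input_values, dtype=float).ravel()
    outcome = np.asarray(binary_outcome).ravel()

    if values.shape != outcome.shape:
        raise ValueError("input_values and binary_outcome must have the same length")

    if outcome.dtype != bool:
        # Only accept outcomes coded as 0/1; anything else is ambiguous.
        if not np.isin(outcome, (0, 1)).all():
            raise ValueError("binary_outcome must be boolean or coded as 0/1")
        outcome = outcome.astype(bool)

    return values, outcome
```

With something like this called once at the top of the constructor, int, float, boolean, and Series inputs would all follow the same code path, and anything unexpected would fail loudly instead of silently changing the result.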
Thanks for this. We are planning a major refactor of this module soon, since it is a mess.
The ROC plot has been having a lot of problems. Right now forced choice accuracy does not always seem to be correct.
We should refactor this and write proper tests.
We also need to address the balanced accuracy p-value at some point (try permutations).
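For the permutation idea, a rough sketch of what a label-shuffling test for balanced accuracy could look like. The function names, the default number of permutations, and the add-one correction are all assumptions for illustration; none of this is existing nltools code, and it assumes both outcome classes are present.

```python
import numpy as np


def balanced_accuracy(predicted, outcome):
    """Mean of sensitivity and specificity for boolean predictions."""
    predicted = np.asarray(predicted, dtype=bool)
    outcome = np.asarray(outcome, dtype=bool)
    sensitivity = predicted[outcome].mean()       # true positive rate
    specificity = (~predicted[~outcome]).mean()   # true negative rate
    return (sensitivity + specificity) / 2


def permutation_p_value(predicted, outcome, n_permutations=5000, seed=0):
    """Hypothetical helper: one-sided permutation p-value for balanced accuracy.

    Shuffles the outcome labels to build a null distribution, which keeps
    the class counts fixed across permutations.
    """
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=bool)
    observed = balanced_accuracy(predicted, outcome)
    null = np.array([
        balanced_accuracy(predicted, rng.permutation(outcome))
        for _ in range(n_permutations)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return observed, p_value
```

Shuffling the outcomes rather than the predictions is an arbitrary choice here; either breaks the pairing between the two, which is all the null hypothesis requires.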