cosanlab / nltools

Python toolbox for analyzing imaging data
https://nltools.org
MIT License

Refactor ROC module #183

Open ljchang opened 6 years ago

ljchang commented 6 years ago

The ROC plot has had a lot of problems. Right now, forced-choice accuracy does not always seem to be correct.

We should refactor this and write proper tests.
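As a starting point for those tests, here is a minimal reference sketch of forced-choice accuracy that could act as an oracle. It assumes each pair (for example, one subject) contributes exactly one positive and one negative observation, that a pair counts as correct when the positive observation has the strictly larger value, and that ties count as incorrect. `forced_choice_accuracy` and its argument names are hypothetical and not part of nltools.

```python
import numpy as np


def forced_choice_accuracy(input_values, binary_outcome, pair_ids):
    """Fraction of pairs whose positive observation scores higher (sketch)."""
    values = np.asarray(input_values, dtype=float)
    outcome = np.asarray(binary_outcome).astype(bool)
    pairs = np.asarray(pair_ids)
    if values.ndim != 1 or not (values.shape == outcome.shape == pairs.shape):
        raise ValueError("all inputs must be 1-D and of equal length")

    correct = []
    for pair in np.unique(pairs):
        in_pair = pairs == pair
        positive = values[in_pair & outcome]
        negative = values[in_pair & ~outcome]
        # Assumption: exactly one positive and one negative per pair.
        if positive.size != 1 or negative.size != 1:
            raise ValueError(f"pair {pair!r} needs one positive and one negative")
        # Assumption: ties are scored as incorrect.
        correct.append(positive[0] > negative[0])
    return float(np.mean(correct))


# Toy data: three of the four pairs rank the positive observation higher.
values = [1, 2, 1, 2, 2, 1, 1, 2]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
pairs = [1, 1, 2, 2, 3, 3, 4, 4]
print(forced_choice_accuracy(values, labels, pairs))  # 0.75
```

A test could then compare the module's forced-choice accuracy against this function across int, float, and boolean inputs.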

We also need to address the balanced accuracy p-value at some point (try permutations).
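One way the permutation idea could look, as a rough sketch rather than a proposed implementation: shuffle the true labels many times, recompute balanced accuracy each time, and report the share of shuffles that do at least as well as the observed value. The function names, the one-sided test, and the `(1 + count) / (1 + n)` correction are assumptions made here for illustration.

```python
import numpy as np


def balanced_accuracy(predicted, actual):
    """Mean of sensitivity and specificity for binary predictions."""
    predicted = np.asarray(predicted).astype(bool)
    actual = np.asarray(actual).astype(bool)
    if actual.all() or not actual.any():
        raise ValueError("both classes must be present in `actual`")
    sensitivity = np.mean(predicted[actual])
    specificity = np.mean(~predicted[~actual])
    return float((sensitivity + specificity) / 2)


def permutation_p_value(predicted, actual, n_permutations=5000, seed=0):
    """One-sided permutation p-value for balanced accuracy (sketch)."""
    rng = np.random.default_rng(seed)
    actual = np.asarray(actual).astype(bool)
    observed = balanced_accuracy(predicted, actual)
    # Null distribution: labels shuffled, predictions held fixed.
    null = np.array([
        balanced_accuracy(predicted, rng.permutation(actual))
        for _ in range(n_permutations)
    ])
    # Add-one correction so the p-value can never be exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Toy data, made up for illustration only.
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
actual = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p_value = permutation_p_value(predicted, actual, n_permutations=1000)
print(observed, p_value)
```

Shuffling preserves the class counts, so both classes remain present in every permutation.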

ljchang commented 6 years ago

The forced-choice test might be affected by commit 3bb8db388abc887b35504a37ff29daa9e33db8a7 by @ljchang.

mpcoll commented 3 years ago

In case this is helpful, I noticed that the type of the input data silently changes the results for the same data (see the example below). I think the input variables should be explicitly coerced to a specific type, or an error should be raised if they are not of the expected type, to avoid these issues.

I get different results for each of these examples:

```python
from nltools.analysis import Roc
import numpy as np
import pandas as pd

inputs = np.array([1, 2, 1, 2, 2, 1, 1, 2])
outcomes = np.array([0, 1, 0, 1, 0, 1, 0, 1])
subs = np.array([1, 1, 2, 2, 3, 3, 4, 4])

# With int outcomes
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()

# With numpy boolean outcomes
outcomes = outcomes.astype(bool)
roc = Roc(inputs, outcomes)
roc.calculate()
roc.summary()

# Forced choice
# With int inputs
roc = Roc(input_values=inputs,
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()

# With float inputs
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=outcomes,
          forced_choice=subs)
roc.calculate()
roc.summary()

# With pd Series outcomes
roc = Roc(input_values=inputs.astype(float),
          binary_outcome=pd.Series(outcomes.astype(bool)),
          forced_choice=subs)
roc.calculate()
roc.summary()
```
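To make the coercion suggestion concrete, here is a minimal sketch of what a validation step might look like. It assumes float input values and boolean outcomes are the desired canonical types; `coerce_roc_inputs` is a hypothetical helper, not part of nltools.

```python
import numpy as np


def coerce_roc_inputs(input_values, binary_outcome):
    """Return (float values, bool outcomes) or raise on unexpected input."""
    # np.asarray also accepts lists and pandas Series.
    values = np.asarray(input_values, dtype=float)
    outcome = np.asarray(binary_outcome)
    if values.ndim != 1 or values.shape != outcome.shape:
        raise ValueError("inputs must be 1-D and of equal length")
    if outcome.dtype != bool:
        # Accept only 0/1 codes, so that 1/2 labels fail loudly
        # instead of being silently cast to all True.
        if not np.isin(outcome, (0, 1)).all():
            raise ValueError("binary_outcome must contain only 0/1 or booleans")
        outcome = outcome.astype(bool)
    return values, outcome


# Int and boolean outcomes now map to the same canonical arrays.
v_int, o_int = coerce_roc_inputs([1, 2, 1, 2], [0, 1, 0, 1])
v_bool, o_bool = coerce_roc_inputs(np.array([1.0, 2.0, 1.0, 2.0]),
                                   np.array([False, True, False, True]))
print(np.array_equal(v_int, v_bool) and np.array_equal(o_int, o_bool))  # True
```

Running every argument through one such function at construction time would mean the downstream calculations only ever see a single dtype per argument.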
ljchang commented 3 years ago

Thanks for this. We are planning a major refactor of this module soon, as it is a mess.