dpc10ster / RJafroc

Artificial Intelligence: Evaluating AI, optimizing AI
19 stars 8 forks source link

Issues with `optim` when flipping groups #50

Closed datalorax closed 4 years ago

datalorax commented 4 years ago

Hi,

Thank you for a wonderful package. I'm having issues with the below, and I'm not sure if it's just me not fully understanding what the package is doing, but when I do the following everything works as expected:

library(RJafroc)

g1 <- c(2, 3, 5, 6)
g2 <- c(0, 4, 7, 11)

d <- Df2RJafrocDataset(g1, g2, InputIsCountsTable = TRUE)
FitBinormalRoc(d)
#> $a
#> [1] 0.4173326
#> 
#> $b
#> [1] 1.54108
#> 
#> $zetas
#>   zetaFwd1   zetaFwd2   zetaFwd3 
#> -1.2324185 -0.3781095  0.2846961 
#> 
#> $AUC
#> [1] 0.5898541
#> 
#> $StdAUC
#>           [,1]
#> [1,] 0.1112752
#> 
#> $NLLIni
#> [1] 45.46533
#> 
#> $NLLFin
#> [1] 43.89465
#> 
#> $ChisqrFitStats
#> $ChisqrFitStats$chisq
#> [1] NA
#> 
#> $ChisqrFitStats$pVal
#> [1] NA
#> 
#> $ChisqrFitStats$df
#> [1] NA
#> 
#> 
#> $covMat
#>                 a           b      zeta1      zeta2       zeta3
#> a      0.24601253 -0.03899557 0.08269791 0.11267253  0.12881384
#> b     -0.03899557  0.42737122 0.10931400 0.05024462 -0.07684782
#> zeta1  0.08269791  0.10931400 0.16832791 0.07548285  0.03444080
#> zeta2  0.11267253  0.05024462 0.07548285 0.09238670  0.06247600
#> zeta3  0.12881384 -0.07684782 0.03444080 0.06247600  0.09822176
#> 
#> $fittedPlot

But this is actually the exact opposite of what I want. When I try flipping the groups, however, I get an error with optim. I did a fair amount of digging into this and it seems to be coming from the internal ForwardValue function producing NaN, but can't fully understand why.

d2 <- Df2RJafrocDataset(g2, g1, InputIsCountsTable = TRUE)
FitBinormalRoc(d2)
#> Error in optim(par = c(aFwd = NaN, bFwd = 0.927966089318001, zetaFwd1 = -3.90053641225537, : non-finite value supplied by optim

I would have expected that I would just get the inverse AUC. For example

library(tidyverse)
cumm_prop <- data.frame(index = seq_along(g1), g1, g2) %>% 
  pivot_longer(cols = -index) %>% 
  group_by(name) %>% 
  mutate(cs = cumsum(value),
         prop = cs / sum(value)) %>% 
  select(index, name, prop) %>% 
  pivot_wider(names_from = name, values_from = prop)

ggplot(cumm_prop, aes(g1, g2)) +
  geom_line() +
  geom_abline(intercept = 0, slope = 1, color = "gray") +
  xlim(0, 1)


ggplot(cumm_prop, aes(g2, g1)) +
  geom_line() +
  geom_abline(intercept = 0, slope = 1, color = "gray") +
  ylim(0, 1)

Any help would be greatly appreciated. If you'd rather me post this issue on stack overflow or similar, I'd be happy to do that.

Created on 2020-04-14 by the reprex package (v0.3.0)

dpc10ster commented 4 years ago

Hi Daniel,

Thanks for your email. I believe the issue is the distinction between empirical plots (straight line segments connection adjacent operating points) and parametric fits (model based maximum likelihood fits to the operating points).

FitBinormalRoc() produces a parametric fit (the binormal model is used). It is constrained to have AUC greater than 0.5. It produces a smooth curve.

PlotEmpiricalOperatingCharacteristics() is the equivalent of the method you outline and will produce similar plots (see example below). It produces a piecewise linear fit ("jagged" fit).

There should not be an error when you interchange the arguments - this is due to a code bug (I suspect it has to do with the zero in the counts table) but this has nothing to do with the distinction between empirical and parametric plots.

Dev

[image: image.png]

On Tue, Apr 14, 2020 at 6:23 PM Daniel Anderson notifications@github.com wrote:

Hi,

Thank you for a wonderful package. I'm having issues with the below, and I'm not sure if it's just me not fully understanding what the package is doing, but when I do the following everything works as expected:

library(RJafroc) g1 <- c(2, 3, 5, 6)g2 <- c(0, 4, 7, 11) d <- Df2RJafrocDataset(g1, g2, InputIsCountsTable = TRUE) FitBinormalRoc(d)#> $a#> [1] 0.4173326#> #> $b#> [1] 1.54108#> #> $zetas#> zetaFwd1 zetaFwd2 zetaFwd3 #> -1.2324185 -0.3781095 0.2846961 #> #> $AUC#> [1] 0.5898541#> #> $StdAUC#> [,1]#> [1,] 0.1112752#> #> $NLLIni#> [1] 45.46533#> #> $NLLFin#> [1] 43.89465#> #> $ChisqrFitStats#> $ChisqrFitStats$chisq#> [1] NA#> #> $ChisqrFitStats$pVal#> [1] NA#> #> $ChisqrFitStats$df#> [1] NA#> #> #> $covMat#> a b zeta1 zeta2 zeta3#> a 0.24601253 -0.03899557 0.08269791 0.11267253 0.12881384#> b -0.03899557 0.42737122 0.10931400 0.05024462 -0.07684782#> zeta1 0.08269791 0.10931400 0.16832791 0.07548285 0.03444080#> zeta2 0.11267253 0.05024462 0.07548285 0.09238670 0.06247600#> zeta3 0.12881384 -0.07684782 0.03444080 0.06247600 0.09822176#> #> $fittedPlot

https://camo.githubusercontent.com/e59ff5bd9f670d6fbddaf128e5c159b65b5bc748/68747470733a2f2f692e696d6775722e636f6d2f5172467a3159422e706e67

But this is actually the exact opposite of what I want. When I try flipping the groups, however, I get an error with optim. I did a fair amount of digging into this and it seems to be coming from the internal ForwardValue function producing NaN, but can't fully understand why.

d2 <- Df2RJafrocDataset(g2, g1, InputIsCountsTable = TRUE) FitBinormalRoc(d2)#> Error in optim(par = c(aFwd = NaN, bFwd = 0.927966089318001, zetaFwd1 = -3.90053641225537, : non-finite value supplied by optim

I would have expected that I would just get the inverse AUC. For example

library(tidyverse)cumm_prop <- data.frame(index = seq_along(g1), g1, g2) %>% pivot_longer(cols = -index) %>% group_by(name) %>% mutate(cs = cumsum(value), prop = cs / sum(value)) %>% select(index, name, prop) %>% pivot_wider(names_from = name, values_from = prop)

ggplot(cumm_prop, aes(g1, g2)) + geom_line() + geom_abline(intercept = 0, slope = 1, color = "gray") + xlim(0, 1)

https://camo.githubusercontent.com/06252d582dae7ed04683dd75b32a76651d965456/68747470733a2f2f692e696d6775722e636f6d2f52436d647448782e706e67

ggplot(cumm_prop, aes(g2, g1)) + geom_line() + geom_abline(intercept = 0, slope = 1, color = "gray") + ylim(0, 1)

https://camo.githubusercontent.com/05d7ee64d643d125d90fbf765ac71e0f3dfa1846/68747470733a2f2f692e696d6775722e636f6d2f4532515069786e2e706e67

Any help would be greatly appreciated. If you'd rather me post this issue on stack overflow or similar, I'd be happy to do that.

Created on 2020-04-14 by the reprex package https://reprex.tidyverse.org (v0.3.0)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dpc10ster/RJafroc/issues/50, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH4NJRFE6YVYMLHNYJ4DEXDRMTO6NANCNFSM4MIDDMUA .

dpc10ster commented 4 years ago
datalorax commented 4 years ago

Thanks so much! In my application, I'm essentially trying to estimate the distance between two distributions when all I have is count data, so the curve can often go in both directions. For background, see here if you are interested.

Thanks for a quick response/resolution, and again for all your work on a great package!

dpc10ster commented 4 years ago

You are welcome; I think you need the parametric fits; I suggest using the d’ metric, which converts the (a,b) binormal parameters to a separation of two unit variance normals yielding the same AUC;

Btw, could you tell me how you posted your question, as this creates a nice thread? A procedure that I was not familiar with. I guess I am asking how to post an issue.

Dev

On Apr 15, 2020, at 12:43 PM, Daniel Anderson notifications@github.com wrote:

Thanks so much! In my application, I'm essentially trying to estimate the distance between two distributions when all I have is count data, so the curve can often go in both directions. For background, see here https://dash.harvard.edu/bitstream/handle/1/9276707/ho_reardon_jebs_final_revision.pdf?sequence=2 if you are interested.

Thanks for a quick response/resolution, and again for all your work on a great package!

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/dpc10ster/RJafroc/issues/50#issuecomment-614150202, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH4NJRCFFIMRNSPCDW3RDTTRMXP2HANCNFSM4MIDDMUA.

datalorax commented 4 years ago

Sure, I just clicked on Issues then New Issue. I posed the r code using three back ticks ``` then r, then close out the code with three back ticks. I also used the reprex package to create the example, which I think is really efficient and formats things nicely.

dpc10ster commented 4 years ago

Thanks you Daniel.

On Apr 15, 2020, at 2:38 PM, Daniel Anderson notifications@github.com wrote:

Sure, I just clicked on Issues then New Issue. I posed the r code using three back ticks ``` then r, then close out the code with three back ticks. I also used the reprex https://reprex.tidyverse.org/ package to create the example, which I think is really efficient and formats things nicely.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/dpc10ster/RJafroc/issues/50#issuecomment-614209533, or unsubscribe https://github.com/notifications/unsubscribe-auth/AH4NJREQCQPOO2VBOOCLY5DRMX5LTANCNFSM4MIDDMUA.