epiverse-trace / quickfit

[SUSPENDED] Toolbox of model fitting helper functions
https://epiverse-trace.github.io/quickfit/
Other
2 stars 1 forks source link

Functionality vs. existing implementations #2

Open sbfnk opened 1 year ago

sbfnk commented 1 year ago

Some of the functionality as outlined in the README of the init_setup branch already exists in some form elsewhere - it might be worth adding a note for the specific use case the package aiming to address.

library("stats4")

sim_data <- rnorm(50, 4, 2)

log_l <- function(a,b) -sum(dnorm(sim_data, a, b, log = TRUE))

x <- stats4::mle(log_l, start = list(a = 3, b = 1))
#> Warning in dnorm(sim_data, a, b, log = TRUE): NaNs produced
confint(x)
#> Profiling...
#>      2.5 %   97.5 %
#> a 3.131020 4.365399
#> b 1.817015 2.693398

library("gamlss")
#> Loading required package: splines
#> Loading required package: gamlss.data
#> 
#> Attaching package: 'gamlss.data'
#> The following object is masked from 'package:datasets':
#> 
#>     sleep
#> Loading required package: gamlss.dist
#> Loading required package: nlme
#> Loading required package: parallel
#>  **********   GAMLSS Version 5.4-20  **********
#> For more on GAMLSS look at https://www.gamlss.com/
#> Type gamlssNews() to see new features/changes/bug fixes.

y <- gamlss::fitDist(
 y = rlnorm(n = 100, meanlog = 1, sdlog = 1),
 k = 2, type = "realplus", trace = FALSE
)
#>   |                                                                              |                                                                      |   0%  |                                                                              |===                                                                   |   4%  |                                                                              |======                                                                |   9%  |                                                                              |=========                                                             |  13%  |                                                                              |============                                                          |  17%  |                                                                              |===============                                                       |  22%  |                                                                              |==================                                                    |  26%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |===========================                                           |  39%  |                                                                              |==============================                                        |  43%  |                                                                              |=================================                                     |  48%  |                                                                              |=====================================                                 |  52%  |                                                                              |========================================                              |  57%  |                                                                              |===========================================                           |  61%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  74%  |                                                                              |=======================================================               |  78%  |                                                                              |==========================================================            |  83%  |                                                                              |=============================================================         |  87%  |                                                                              |================================================================      |  91%  |                                                                              |===================================================================   |  96%  |                                                                              |======================================================================| 100%
confint(y)
#>            2.5 %    97.5 %
#> mu     0.9388532 1.3418009
#> sigma -0.1110271 0.1661535

Created on 2023-11-20 with reprex v2.0.2

sbfnk commented 9 months ago

As pointed out by @pratikunterwegs in https://github.com/epiverse-trace/cfr/issues/107#issuecomment-1818517559 the first task also be performed using base R:

sim_data <- rnorm(50, 4, 2)

log_l <- function(par) -sum(dnorm(sim_data, par[1], par[2], log = TRUE))

x <- optim(c(3, 1), log_l, hessian = TRUE)
x$par
#> [1] 3.998069 1.957719

prop_sigma <- sqrt(diag(solve(x$hessian)))
ci <- matrix(x$par, nrow = 2, ncol = 2, byrow = TRUE) +
  1.96 * c(1, -1) %*% t(prop_sigma)
ci
#>          [,1]     [,2]
#> [1,] 4.540720 2.341556
#> [2,] 3.455417 1.573881

Created on 2023-11-20 with reprex v2.0.2

adamkucharski commented 8 months ago

Thanks. Although does that assume likelihood surface is normally distributed (or can be approximated by one), i.e. the 1.96 * c(1, -1) line. Seems like would be useful to have ability to generate profile (i.e. by finding points 1.92 from MLE by refitting across the range of values of parameter of interest) rather than relying on marginals or normal approximations?

Actually this distinction might be a useful point to make in a vignette somewhere (e.g. epichains, where R and k may be correlated, and hence marginal would give overconfident estimates)