full vs aggregated data

Bug report/feedback

Module: Fitting

Describe what issue or problem you are experiencing on the application.

In which situations does biodosetools use the full data (i.e. every cell is one data point), and in which the aggregated data? My impression is that the aggregated data is used if glm(...) doesn't produce an error. Otherwise the software switches to the get_fit_maxlik_method(...) function and the full data is used. I think it should be consistent across both methods whether the aggregated or the full model is used. I noticed recently that the full model can substantially underestimate the uncertainty if there are conditional dependencies between the observations. So, for the moment, I would suggest that we stick to the aggregated data for the glm as well as for the maxLik case. This part needs some thorough thinking.
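To make the distinction concrete, here is a minimal sketch (plain Python for illustration, not the package's R code) comparing the two likelihoods for a single dose group. It assumes independent Poisson counts with a common yield `lam` per cell; the cell counts and the function names are made up for the example. Under that assumption the full and the aggregated log-likelihood differ only by a constant, so any discrepancy between the two fits would have to come from somewhere else, e.g. dependence between cells, dispersion estimates or weights.

```python
import math


def full_loglik(lam, counts):
    """Poisson log-likelihood with every cell as one observation."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


def aggregated_loglik(lam, counts):
    """Poisson log-likelihood of the total X over N cells, X ~ Poisson(N * lam)."""
    n, x = len(counts), sum(counts)
    return x * math.log(n * lam) - n * lam - math.lgamma(x + 1)


# Hypothetical per-cell aberration counts for one dose group.
cells = [0, 0, 1, 0, 2, 0, 1, 0, 0, 1]

# The difference in log-likelihood between two candidate yields is the same
# for both formulations, i.e. they only differ by a constant in lam.
diff_full = full_loglik(0.6, cells) - full_loglik(0.4, cells)
diff_agg = aggregated_loglik(0.6, cells) - aggregated_loglik(0.4, cells)
print(diff_full, diff_agg)
```

The same argument applies per dose group in the full regression, since all cells of one group share the same expected yield.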

In addition, I think that weighting by 1/disp is only performed for glm(...) but not for get_fit_maxlik_method(...). Maybe this should be consistent, too? Do we actually need the weights?
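For reference, a small sketch of what a 1/disp weight could look like for one dose group (again plain Python, not the package's code). It assumes that disp is the dispersion index, i.e. the sample variance of the per-cell counts divided by their mean; that definition, the function name and the counts are assumptions for the example and should be checked against the actual implementation.

```python
def dispersion_index(counts):
    """Sample variance divided by mean of the per-cell counts (assumed definition)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)
    return var / mean


# Hypothetical per-cell aberration counts for one dose group (overdispersed).
cells = [0, 0, 0, 0, 3, 0, 1, 0, 0, 1]

disp = dispersion_index(cells)
weight = 1.0 / disp  # a group with disp > 1 gets down-weighted
print(disp, weight)
```

For counts that are exactly Poisson-like (disp close to 1) the weight is close to 1 and has little effect, which is one way to look at the question of whether the weights are needed at all.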

A nice alternative to the rather complicated code for the constrained ML optimization in get_fit_maxlik_method(...) could be the addreg package, which is designed for Poisson regression with identity link. I also have the feeling that this package is more robust than our current implementation.

Please attach an image if it helps to visualize the problem.

biodosetools-team / biodosetools

full vs aggregated data #14
