biodosetools-team / biodosetools

A tool to perform all the different statistical tests and calculations needed by Biological Dosimetry Laboratories
https://biodosetools-team.github.io/biodosetools/
GNU General Public License v3.0

full vs aggregated data #14

Open: dendesfelder opened this issue 4 years ago

dendesfelder commented 4 years ago

Bug report/feedback

In which situations does biodosetools use the full data (i.e. every cell is one data point) and in which the aggregated data? My impression is that the aggregated data is used if glm(...) doesn't produce an error; otherwise the software switches to the get_fit_maxlik_method(...) function and the full data is used. I think both methods should be consistent in whether the aggregated or the full model is used. I recently noticed that the full model can substantially underestimate the uncertainty if there are conditional dependencies between the observations. So, for the moment, I would suggest that we stick to the aggregated data for both the glm and the maxLik case. This part needs some thorough thinking.
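
A small numerical sketch of the point estimates versus the uncertainty, written in Python purely for illustration (biodosetools itself is an R package, and none of this is its code). It assumes a linear-quadratic yield C + alpha*D + beta*D**2 per cell with an identity link and uses made-up dicentric frequency tables; the helper names (fit_agg, score_full, pearson_dispersion) are hypothetical. For Poisson counts, the per-cell log-likelihood and the aggregated one (total dicentrics Y_d out of N_d cells per dose) differ only by a constant that does not depend on the coefficients, so both give the same point estimates. What differs is the Pearson dispersion estimate and its degrees of freedom (number of cells minus 3 versus number of doses minus 3), and that is where the uncertainty of the two approaches can diverge.

```python
import math

# Hypothetical dicentric frequency tables (made-up numbers for illustration):
# FREQ[dose_in_Gy][k] = number of scored cells showing k dicentrics.
FREQ = {
    0.0: [499, 1],
    1.0: [460, 38, 2],
    2.0: [380, 100, 18, 2],
    3.0: [220, 120, 45, 12, 3],
    4.0: [100, 90, 50, 20, 6, 2],
}

def design(d):
    """Covariates of the linear-quadratic yield C + alpha*D + beta*D**2."""
    return [1.0, d, d * d]

def yield_at(b, d):
    """Expected dicentrics per cell at dose d (identity link)."""
    return sum(bj * xj for bj, xj in zip(b, design(d)))

# Full data: one (dose, count) pair per scored cell.
cells = [(d, k) for d, f in FREQ.items() for k, n in enumerate(f) for _ in range(n)]
# Aggregated data: one (dose, cells scored N, total dicentrics Y) triple per dose.
agg = [(d, sum(f), sum(k * n for k, n in enumerate(f))) for d, f in FREQ.items()]

def loglik_full(b):
    return sum(y * math.log(yield_at(b, d)) - yield_at(b, d) - math.lgamma(y + 1)
               for d, y in cells)

def loglik_agg(b):
    return sum(Y * math.log(N * yield_at(b, d)) - N * yield_at(b, d) - math.lgamma(Y + 1)
               for d, N, Y in agg)

def solve(a, rhs):
    """Solve the small dense system a @ x = rhs (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_agg(iters=100):
    """Fisher scoring (IRLS) for the aggregated model E[Y_d] = N_d * yield(D_d)."""
    rows = [[N * x for x in design(d)] for d, N, _ in agg]
    ys = [Y for _, _, Y in agg]
    mu = [y + 0.1 for y in ys]  # start values in the spirit of R's poisson()$initialize
    for _ in range(iters):
        # Identity link: the working response is y itself and the working weight is 1/mu.
        w = [1.0 / m for m in mu]
        xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(3)]
                for j in range(3)]
        xtwy = [sum(wi * r[j] * y for wi, r, y in zip(w, rows, ys)) for j in range(3)]
        b = solve(xtwx, xtwy)
        mu = [sum(rj * bj for rj, bj in zip(r, b)) for r in rows]
    return b

def score_full(b):
    """Gradient of the per-cell log-likelihood with respect to (C, alpha, beta)."""
    s = [0.0, 0.0, 0.0]
    for d, y in cells:
        lam = yield_at(b, d)
        for j, x in enumerate(design(d)):
            s[j] += x * (y / lam - 1.0)
    return s

def pearson_dispersion(b):
    """Pearson chi-square / df for the full and for the aggregated data."""
    full = sum((y - yield_at(b, d)) ** 2 / yield_at(b, d) for d, y in cells)
    aggr = sum((Y - N * yield_at(b, d)) ** 2 / (N * yield_at(b, d)) for d, N, Y in agg)
    return full / (len(cells) - 3), aggr / (len(agg) - 3)

b_hat = fit_agg()
print("C, alpha, beta:", b_hat)
print("per-cell score at the aggregated fit:", score_full(b_hat))
print("Pearson dispersion (full, aggregated):", pearson_dispersion(b_hat))
```

The per-cell score evaluated at the aggregated fit is zero up to rounding, which is the numerical counterpart of the constant log-likelihood gap. Only the dispersion (and therefore any dispersion-scaled standard errors) depends on which data layout is used.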

In addition, I think that weighting by 1/disp is performed only for glm(...), not for get_fit_maxlik_method(...). Maybe this should be consistent, too. Do we actually need the weights?
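
On the weighting question, a hedged sketch in Python (not biodosetools code). It assumes the weights are per-dose prior weights 1/disp_d, with disp_d taken as the sample variance-to-mean ratio of the cell counts at dose d; how biodosetools actually computes disp may differ. The frequency tables and the names dispersion_index and fit_weighted are hypothetical. In a Poisson IRLS fit with identity link, multiplying every prior weight by the same constant leaves the coefficients unchanged, whereas weights that vary between doses shift the estimates.

```python
# Hypothetical dicentric frequency tables (made-up numbers for illustration):
# FREQ[dose_in_Gy][k] = number of scored cells showing k dicentrics.
FREQ = {
    0.0: [499, 1],
    1.0: [460, 38, 2],
    2.0: [380, 100, 18, 2],
    3.0: [220, 120, 45, 12, 3],
    4.0: [100, 90, 50, 20, 6, 2],
}

def design(d):
    """Covariates of the linear-quadratic yield C + alpha*D + beta*D**2."""
    return [1.0, d, d * d]

def solve(a, rhs):
    """Solve the small dense system a @ x = rhs (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def dispersion_index(f):
    """Sample variance-to-mean ratio of the per-cell counts in frequency table f."""
    n = sum(f)
    mean = sum(k * c for k, c in enumerate(f)) / n
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(f)) / (n - 1)
    return var / mean

def fit_weighted(prior, iters=100):
    """IRLS for E[Y_d] = N_d * (C + alpha*D + beta*D**2) with one prior weight per dose."""
    rows = [[sum(f) * x for x in design(d)] for d, f in FREQ.items()]
    ys = [sum(k * c for k, c in enumerate(f)) for f in FREQ.values()]
    mu = [y + 0.1 for y in ys]
    for _ in range(iters):
        # Prior weight times the Poisson / identity-link working weight 1/mu.
        w = [p / m for p, m in zip(prior, mu)]
        xtwx = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(3)]
                for j in range(3)]
        xtwy = [sum(wi * r[j] * y for wi, r, y in zip(w, rows, ys)) for j in range(3)]
        b = solve(xtwx, xtwy)
        mu = [sum(rj * bj for rj, bj in zip(r, b)) for r in rows]
    return b

unit = fit_weighted([1.0] * len(FREQ))
halved = fit_weighted([0.5] * len(FREQ))
by_disp = fit_weighted([1.0 / dispersion_index(f) for f in FREQ.values()])
print("weights 1:       ", unit)
print("weights 0.5:     ", halved)
print("weights 1/disp_d:", by_disp)
```

The first two fits coincide, so a single global 1/disp cannot change the point estimates. Only dose-specific weights do, and they effectively down-weight the doses whose cell counts are more overdispersed.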

A nice alternative to the rather complicated code of the constrained ML optimization in get_fit_maxlik_method(...) could be the package addreg, which is designed for Poisson regression with an identity link. I also have the feeling that this package is more robust than our current implementation.

jorgeegm commented 3 years ago

My opinion is that, as a general approach, aggregated data is safer and more sensitive for detecting dispersion with glm.disp. Using 1/disp weights can mask the departures from Poisson sampling that glm.disp detects. However, this modification changes previously published coefficients, and more effort is needed to test the effect of scoring too few cells at higher doses (3, 4 and 5 Gy); in general, only 60 to 80 cells are scored at 5 Gy. I think these doses need more influence: scoring 300 to 500 cells to obtain the mean would give them more weight in estimating the beta parameter, reducing the sampling error and the variation between labs.
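
To put rough numbers on the cell-count argument, the large-sample variance of the beta estimate can be read from the inverse Fisher information I = sum_d N_d x_d x_d^T / lambda_d, where x_d = (1, D, D^2) and lambda_d is the expected yield per cell under an identity link. The Python sketch below assumes hypothetical coefficients and two hypothetical scoring plans (70 cells at 5 Gy versus 300 to 500 cells at 3 to 5 Gy); the function name var_beta is likewise made up. It ignores overdispersion and inter-laboratory variation, so it only illustrates the direction of the effect.

```python
import math

def design(d):
    """Covariates of the linear-quadratic yield C + alpha*D + beta*D**2."""
    return [1.0, d, d * d]

def solve(a, rhs):
    """Solve the small dense system a @ x = rhs (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def var_beta(doses, n_cells, coef):
    """Large-sample Var(beta_hat): the (beta, beta) entry of the inverse Fisher
    information I = sum_d N_d * x_d x_d^T / lambda_d (Poisson, identity link)."""
    info = [[0.0] * 3 for _ in range(3)]
    for d, n in zip(doses, n_cells):
        x = design(d)
        lam = sum(c * xi for c, xi in zip(coef, x))  # expected yield per cell
        for j in range(3):
            for k in range(3):
                info[j][k] += n * x[j] * x[k] / lam
    # Solving I v = e_beta gives the beta column of I^{-1}.
    return solve(info, [0.0, 0.0, 1.0])[2]

# Hypothetical coefficients (C, alpha, beta) and scoring plans, for illustration only.
COEF = [0.001, 0.02, 0.06]
DOSES = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
FEW_HIGH = [1000, 1000, 1000, 500, 200, 100, 70]
MORE_HIGH = [1000, 1000, 1000, 500, 500, 400, 300]

for label, plan in [("70 cells at 5 Gy", FEW_HIGH), ("300-500 cells at 3-5 Gy", MORE_HIGH)]:
    print(label, "SE(beta) =", math.sqrt(var_beta(DOSES, plan, COEF)))
```

Because adding cells only adds positive-definite terms to I, the larger high-dose plan always yields a smaller Var(beta_hat) in this idealized Poisson setting. How large the gain is in practice depends on the real yields and on overdispersion.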