COSMOGRAIL / PyCS

Python Curve Shifting
http://www.cosmograil.org
GNU General Public License v3.0

Dependency of time-delay error estimations on inadequate binning of the true time-delay distribution #17

Open vbonvin opened 8 years ago

vbonvin commented 8 years ago

I noticed while playing around with the new covariance matrix function (#16) that the systematic and random errors reported by the pycs.sim.plot.measvstrue() function can depend quite strongly on the binning chosen for the true time-delay (truetds) distribution of the simulated light curves. They also depend on the plotting range (the r parameter), since the extrema of the binning range are currently set by (median of truetds) ± r.

I wonder if we should keep that dependency on the plotting range, since it's easy to screw up by setting r too small and thus possibly underestimate the uncertainty, because not all the simulated light curves are then considered. We could, for example, force that range to correspond to the full extent of the truetds distribution.
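To make the concern concrete, here is a minimal sketch (plain NumPy, not PyCS code) of how many simulations fall outside a binning range of (median of truetds) ± r, compared with a range forced to the full extent of truetds. The helper names `binning_range` and `fraction_outside` and the uniform toy distribution are assumptions made for illustration only.

```python
import numpy as np

def binning_range(truetds, r=None):
    """Return (lo, hi) limits: median +/- r, or the full extent if r is None."""
    truetds = np.asarray(truetds, dtype=float)
    if r is None:
        return truetds.min(), truetds.max()
    med = np.median(truetds)
    return med - r, med + r

def fraction_outside(truetds, lo, hi):
    """Fraction of simulated true delays that the range [lo, hi] ignores."""
    truetds = np.asarray(truetds, dtype=float)
    outside = (truetds < lo) | (truetds > hi)
    return outside.mean()

# Toy true delays, uniform in [-10, 10] days (hypothetical values).
rng = np.random.default_rng(0)
truetds = rng.uniform(-10.0, 10.0, size=1000)

# A range that is too narrow silently drops a large share of the simulations.
lo, hi = binning_range(truetds, r=5.0)
frac_small_r = fraction_outside(truetds, lo, hi)

# Forcing the range to the extent of truetds drops none.
lo_full, hi_full = binning_range(truetds)
frac_full = fraction_outside(truetds, lo_full, hi_full)

print(f"outside with r=5: {frac_small_r:.2f}, outside with full range: {frac_full:.2f}")
```

Printing such a fraction next to the error estimate would at least make a too-small r visible.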

Another possible source of error is the number of bins (the nbins parameter). If it is large, the bins at the smallest (and largest) truetds values, which already contain fewer estimates, might get biased because they do not contain enough estimates for robust statistics. The control plot (binned tderrs vs truetds) might help us see whether this problem arises, but we could e.g. require a bin to contain a minimum number of estimates for it to be considered.
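A minimum-count rule could look roughly like the sketch below. It is standalone NumPy, not the measvstrue() implementation: it assumes that the per-bin bias is the mean of tderrs, the per-bin scatter is their standard deviation, and that the overall systematic and random errors are the largest values over the bins. The function name `binned_errors` and the `min_count` parameter are hypothetical.

```python
import numpy as np

def binned_errors(truetds, tderrs, lo, hi, nbins, min_count=1):
    """Largest per-bin |mean| and std of tderrs, skipping sparsely filled bins.

    Assumption: overall errors are the maxima over bins (not verified against PyCS).
    Returns (sys_err, ran_err, counts).
    """
    truetds = np.asarray(truetds, dtype=float)
    tderrs = np.asarray(tderrs, dtype=float)
    edges = np.linspace(lo, hi, nbins + 1)
    # Bin index 0..nbins-1 for values in [lo, hi); out-of-range values get -1 or nbins.
    idx = np.digitize(truetds, edges) - 1
    idx[truetds == hi] = nbins - 1  # put the right edge in the last bin

    biases, scatters, counts = [], [], []
    for b in range(nbins):
        sel = tderrs[idx == b]
        counts.append(sel.size)
        if sel.size < min_count:
            continue  # too few estimates for robust statistics
        biases.append(abs(sel.mean()))
        scatters.append(sel.std())
    if not biases:
        raise ValueError("no bin contains at least min_count estimates")
    return max(biases), max(scatters), counts

# Toy case: one well-filled bin and one bin holding a single outlying estimate.
truetds = np.array([0.0] * 50 + [9.5])
tderrs = np.array([1.0, -1.0] * 25 + [5.0])

sys_all, ran_all, counts = binned_errors(truetds, tderrs, 0.0, 10.0, nbins=2)
sys_cut, ran_cut, _ = binned_errors(truetds, tderrs, 0.0, 10.0, nbins=2, min_count=10)
print(counts, sys_all, sys_cut)
```

In this toy case the single estimate in the sparse bin drives the systematic error unless the bin is excluded, which is the effect described above.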

@mtewes, what do you think?

mtewes commented 8 years ago

I added "inadequate binning" to the issue title :) It's really a problem more related to the user, not any "flaw" in PyCS.

The idea behind these parameters (r, nbins) is that the user is aware of what she/he is doing. To pick values, one should first look at the checkplots over wide ranges, see how the variance and bias depend on the true time delays, and then decide what range and binning make the most sense (also depending on how much CPU cost should be invested). In that sense, if used "correctly", the values of r and nbins "do not matter". If moderately changing these parameter values within plausible ranges modifies the uncertainty estimate significantly, something is wrong!
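One way to check this stability is to scan a few plausible (r, nbins) pairs and look at the spread of the resulting estimates. The sketch below does that on toy data; it is not part of PyCS, the names `max_binned_errors` and `sensitivity_scan` are hypothetical, and it again assumes the overall errors are the largest per-bin |mean| and standard deviation of tderrs.

```python
import numpy as np

def max_binned_errors(truetds, tderrs, r, nbins):
    """Largest per-bin |mean| and std of tderrs over median(truetds) +/- r."""
    truetds = np.asarray(truetds, dtype=float)
    tderrs = np.asarray(tderrs, dtype=float)
    med = np.median(truetds)
    edges = np.linspace(med - r, med + r, nbins + 1)
    idx = np.searchsorted(edges, truetds, side="right") - 1
    idx[truetds == edges[-1]] = nbins - 1  # right edge belongs to the last bin
    sys_err, ran_err = 0.0, 0.0
    for b in range(nbins):
        sel = tderrs[idx == b]
        if sel.size == 0:
            continue  # empty bins contribute nothing
        sys_err = max(sys_err, abs(sel.mean()))
        ran_err = max(ran_err, sel.std())
    return sys_err, ran_err

def sensitivity_scan(truetds, tderrs, rs, nbins_list):
    """Error estimates for every (r, nbins) combination."""
    return {(r, n): max_binned_errors(truetds, tderrs, r, n)
            for r in rs for n in nbins_list}

# Toy simulations whose measurement error does not depend on the true delay.
rng = np.random.default_rng(1)
truetds = rng.uniform(-10.0, 10.0, size=5000)
tderrs = rng.normal(0.0, 1.0, size=5000)

scan = sensitivity_scan(truetds, tderrs, rs=[6.0, 8.0, 10.0], nbins_list=[5, 10])
ran_values = [ran for _, ran in scan.values()]
# Relative spread of the random error across the scanned settings.
spread = (max(ran_values) - min(ran_values)) / np.mean(ran_values)
print(f"relative spread of random error: {spread:.2f}")
```

A small spread suggests the chosen settings are in the regime where they "do not matter"; a large one is the warning sign mentioned above.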

This error computation is meant for cases in which the dependence of the measurement uncertainty on the range of considered true delays has been found to be small. If light curves are short and of high quality, and this dependence gets very strong (because precious inflection points are in or out of the overlapping regions), deciding on the range to consider is really deciding "is this feature real or not", something that PyCS cannot do.

Bottom line, I'm a bit in favour of leaving these options "manual" and improving the doc / educating the user, instead of deriving some automatic settings.