cms-analysis / CombineHarvester

CMSSW package for the creation, editing and analysis of combine datacards and workspaces
cms-analysis.github.io/CombineHarvester/
15 stars 180 forks source link

Added goodness-of-fit plot range auto adjustment based on gaussian fit #271

Closed vicha-w closed 2 years ago

vicha-w commented 2 years ago

Hi,

In addition to a pull request I made several hours ago https://github.com/cms-analysis/CombineHarvester/pull/270, I have also added another option on top of that pull request to automatically adjust the goodness-of-fit histogram range based on gaussian fit of the whole distribution of test statistics. The plot range covers from -3 s.d. to +3 s.d. as shown in the plot below.

gof_plot_autogaus

I am not sure if creating a pull request with changes on top of another pull request is a good practice, so please let me know if this is okay. Also please let me know if you would like me to add any other changes.

Thanks, Vichayanun

adewit commented 2 years ago

For this one I'm not so sure, I worry that people might blindly use the option and miss strange outliers that perhaps should be investigated, or worse, blindly use the option and never see their observed test statistic value Also happy to take opinions from other developers :-)

vicha-w commented 2 years ago

Maybe we can issue a warning to the user that autogaus is currently used? I'm not sure if that would catch the user's attention. We can also ask for confirmation every time the user uses this option, but this would break automatic plotting...

Please let me know if you have any ideas.

ajgilbert commented 2 years ago

I can see why it's useful in some cases to have an automatic way to zoom in on the distribution. Instead of fitting Gaussians, another way might be to specify a quantile interval to centre on. In general I agree we should be careful with the plotting. Ideally the logic would be improved to ensure: by default the full range of the toys is shown (expanded to ensure the obs arrow is visible, if needed). If a restricted range is specified, we should display the underflow/underflow very clearly in the first and last bins (probably with a different colour and label on the plot), and if the arrow is then excluded, have some way to visualise this. Now I expect this will take some time to develop - one quick option for now is to merge the PR, but actually have it write some additional text on the top of the plot - perhaps "Restricted range - some toys are not shown", or similar?

vicha-w commented 2 years ago

I think I can add some changes to address these issues. Let me work on it and I'll commit them to this PR.

vicha-w commented 2 years ago

I just made some fixes to the script. Now I have commented out autogaus option and added percentile option instead. Users can add a --percentile option with two percentile numbers within the range of 0 and 1. Warning text will also be added when there are entries in either overflow or underflow bins, and if the arrow is not in the range of the histogram.

Here is an example plot where underflow/overflow warning appears: gof_plot_overflowtest

Another example with percentile option used. Here the option --percentile 0.01 0.99 is used. gof_plot_percentile

And here is what the warning text looks like if there are underflow/overflow entries and the arrow is not in the visible range. gof_plot_overflowtest_withpercentile

vicha-w commented 2 years ago

Hi,

It has been almost two months since my last update on this PR. Could you please review this?

Thanks!

vicha-w commented 2 years ago

Thanks @adewit. I have applied all the fixes you have suggested. Would you like me to fix anything else?

adewit commented 2 years ago

Looks good to me! @ajgilbert anything else from you?

ajgilbert commented 2 years ago

Nope, happy to merge