Closed vicha-w closed 2 years ago
For this one I'm not so sure, I worry that people might blindly use the option and miss strange outliers that perhaps should be investigated, or worse, blindly use the option and never see their observed test statistic value Also happy to take opinions from other developers :-)
Maybe we can issue a warning to the user that autogaus is currently used? I'm not sure if that would catch the user's attention. We can also ask for confirmation every time the user uses this option, but this would break automatic plotting...
Please let me know if you have any ideas.
I can see why it's useful in some cases to have an automatic way to zoom in on the distribution. Instead of fitting Gaussians, another way might be to specify a quantile interval to centre on. In general I agree we should be careful with the plotting. Ideally the logic would be improved to ensure: by default the full range of the toys is shown (expanded to ensure the obs arrow is visible, if needed). If a restricted range is specified, we should display the underflow/underflow very clearly in the first and last bins (probably with a different colour and label on the plot), and if the arrow is then excluded, have some way to visualise this. Now I expect this will take some time to develop - one quick option for now is to merge the PR, but actually have it write some additional text on the top of the plot - perhaps "Restricted range - some toys are not shown", or similar?
I think I can add some changes to address these issues. Let me work on it and I'll commit them to this PR.
I just made some fixes to the script. Now I have commented out autogaus option and added percentile option instead. Users can add a --percentile
option with two percentile numbers within the range of 0 and 1. Warning text will also be added when there are entries in either overflow or underflow bins, and if the arrow is not in the range of the histogram.
Here is an example plot where underflow/overflow warning appears:
Another example with percentile option used. Here the option --percentile 0.01 0.99
is used.
And here is what the warning text looks like if there are underflow/overflow entries and the arrow is not in the visible range.
Hi,
It has been almost two months since my last update on this PR. Could you please review this?
Thanks!
Thanks @adewit. I have applied all the fixes you have suggested. Would you like me to fix anything else?
Looks good to me! @ajgilbert anything else from you?
Nope, happy to merge
Hi,
In addition to a pull request I made several hours ago https://github.com/cms-analysis/CombineHarvester/pull/270, I have also added another option on top of that pull request to automatically adjust the goodness-of-fit histogram range based on gaussian fit of the whole distribution of test statistics. The plot range covers from -3 s.d. to +3 s.d. as shown in the plot below.
I am not sure if creating a pull request with changes on top of another pull request is a good practice, so please let me know if this is okay. Also please let me know if you would like me to add any other changes.
Thanks, Vichayanun