binning.create_histogram_table side effects

cta-observatory / pyirf

Python IRF builder

https://pyirf.readthedocs.io/en/stable/

MIT License

15 stars 25 forks source link

binning.create_histogram_table side effects #259

Open kosack opened 1 year ago

kosack commented 1 year ago

binning.create_histogram_table "magically" uses a "weight" column if it finds it. This is undocumented and not at all obvious. It would be better to add another parameterweight_key="weight" and mention in the docs that the histogram will be weighted by the column specified byweight_key.

The docstring also doesn't mention that it outputs more than just the histogram, but both the normal and weighted version. And even more unexpected, if it finds a particle_type column, it makes grouped histograms by particle type that also end up in the output. It would be better (and more clear) to add this also as a parameter group_key="particle_type" to make it more obvious

kosack commented 1 year ago

Just to note this caused a large bug in my test script. I had use the column name "weights" instead of "weight", but create_histogram_table just silently created a "n_weighted" column with no weights in that case. It took me a while to find out why the weighted and un-weigted events were identical.

So additionally, if there is a weight_key option, it should fail if that column doesn't exist, or at the very least, not generate the n_weighted column

maxnoe commented 1 year ago

These are good suggestions, and I'll implement them.

This is undocumented and not at all obvious.

It is in the introduction:

https://pyirf.readthedocs.io/en/latest/introduction.html#dl2-event-lists

kosack commented 12 months ago

The intro does help (I had missed it!), but I still think hidden assumptions should be more explicit. I had the same issue with some of the other API functions. Having keywords for the column name helps a lot to understand what is happening underneath. The defaults can match the introduction, but allowing a user to use an API function without having to remap columns is also nice, especially as it allows for easier comparisons between reconstructors for example.

maxnoe commented 12 months ago

yes, agreed