LSSTDESC / sacc

Save All Correlations and Covariances

BSD 3-Clause "New" or "Revised" License

Standardise storing of binning information #72

Open tilmantroester opened 2 years ago

tilmantroester commented 2 years ago

As far as I can tell, there is no standard way to store (ell/theta) binning information. Methods to store and retrieve bin edges, and optionally weights, would be useful, I think. The idea is to store all the information needed to recreate the (bandpower) window functions. Storing just ell_min, ell_max, n_ell, and spacing isn't general enough (we might want a linear-log binning at some point) and is error-prone (every code needs to reimplement creating bin edges and weights).
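To illustrate why explicit edges are more general than (ell_min, ell_max, n_ell, spacing), here is a minimal sketch. The helper names `make_bin_edges` and `make_linear_log_edges` are hypothetical and not part of sacc. A linear-then-log scheme cannot be described by a single spacing keyword, but its edges are just another list:

```python
import math


def make_bin_edges(x_min, x_max, n_bins, spacing="linear"):
    """Return n_bins + 1 edges between x_min and x_max (hypothetical helper)."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if not x_min < x_max:
        raise ValueError("need x_min < x_max")
    if spacing == "linear":
        step = (x_max - x_min) / n_bins
        edges = [x_min + i * step for i in range(n_bins + 1)]
    elif spacing == "log":
        if x_min <= 0:
            raise ValueError("log spacing needs x_min > 0")
        ratio = (x_max / x_min) ** (1.0 / n_bins)
        edges = [x_min * ratio**i for i in range(n_bins + 1)]
    else:
        raise ValueError(f"unknown spacing: {spacing!r}")
    edges[-1] = x_max  # pin the last edge so round-off cannot move it
    return edges


def make_linear_log_edges(x_min, x_break, x_max, n_lin, n_log):
    """Linear bins up to x_break, log bins above it (hypothetical helper)."""
    lin = make_bin_edges(x_min, x_break, n_lin, "linear")
    log = make_bin_edges(x_break, x_max, n_log, "log")
    return lin + log[1:]  # drop the duplicated edge at x_break


# Linear bins from ell=2 to 10, then log bins from 10 to 1000.
edges = make_linear_log_edges(2, 10, 1000, n_lin=4, n_log=2)
```

Once the edges themselves are stored, a reader of the file does not need to know which of these schemes produced them.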

The main application I have in mind here is a case where the actual window function is not available (e.g., for real-space so far) or is not the quantity of interest (e.g., plotting the bins).

damonge commented 2 years ago

Sorry, maybe I don't understand what the concern is. There are currently 4 different window functions implemented: top-hat (linear and log) and various tabulated options. See https://github.com/LSSTDESC/sacc/blob/master/sacc/windows.py. What else would you need?

tilmantroester commented 2 years ago

If I understand the docs correctly, the window functions are attached to each data point and there can only be one window function per data point. So if there's already a window function (e.g., from namaster), then there can't be another to store the original bin specification. There's also no easy way to get the binning information per data_type, rather than per data point (since the binning information would be the same for all tracers).

Maybe adding this functionality as methods of the Sacc class is overkill, but having some standardised way to infer the original binning specification, without having to go through the code that created a specific sacc file, would be useful, I think.

damonge commented 2 years ago

Accessing the global binning scheme for a set of power spectra (rather than each individually) is something I needed at the start. Can you check cell 12 of https://github.com/LSSTDESC/sacc/blob/master/examples/CMB_LSS_read.ipynb? This is how one can use bandpower window functions for power spectra. Are you thinking about something like this?

damonge commented 2 years ago

Having more than one window function per bin sounds kind of dangerous though. I'd rather make sure that whatever window function you associate with it has all of the information you want in one go.

tilmantroester commented 2 years ago

I'm thinking of something like recovering the information from cell 6 in https://github.com/LSSTDESC/sacc/blob/master/examples/CMB_LSS_write.ipynb. When using namaster, the window function that's stored isn't going to be a clean tophat one but one that's been convolved with the mode-mixing term. My question is how the original binning specification can be stored efficiently (i.e., without a massive array for each data point, which is already bloating the sacc files quite a bit) and kept easily accessible.

damonge commented 2 years ago

The way I see it, we could either:
a) Add the possibility of storing the original bin edges in the BandpowerWindowFunction objects. This would only solve this for this particular form of window.
b) Do the above at the base class level, so all windows always store some notion of edge.
c) For your particular case, if you don't care about the details of the window function, use the top-hat ones.
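A rough sketch of what option b) could look like. These classes are hypothetical stand-ins written for illustration and are not the ones in sacc/windows.py. The assumptions are that a tabulated bandpower window holds one weight column per bandpower and that the original edges are an optional list with one more entry than there are bandpowers:

```python
class WindowSketch:
    """Hypothetical base class: every window may carry its original bin edges."""

    def __init__(self, edges=None):
        if edges is not None:
            edges = [float(e) for e in edges]
            increasing = all(b > a for a, b in zip(edges, edges[1:]))
            if len(edges) < 2 or not increasing:
                raise ValueError("edges must be strictly increasing, length >= 2")
        self.edges = edges


class TopHatSketch(WindowSketch):
    """A top-hat window: its range is its pair of edges."""

    def __init__(self, range_min, range_max):
        super().__init__(edges=[range_min, range_max])
        self.min = range_min
        self.max = range_max


class BandpowerSketch(WindowSketch):
    """Tabulated window: weight[i][j] is the weight of values[i] in bandpower j."""

    def __init__(self, values, weight, edges=None):
        if len(weight) != len(values):
            raise ValueError("weight needs one row per value")
        n_bandpowers = len(weight[0])
        if edges is not None and len(edges) != n_bandpowers + 1:
            raise ValueError("need n_bandpowers + 1 edges")
        super().__init__(edges=edges)
        self.values = list(values)
        self.weight = [list(row) for row in weight]


# Mode-coupled weights no longer look like top-hats, but the edges that
# defined the two original bins are still recoverable from the object.
ells = [2, 3, 4, 5]
weight = [[0.5, 0.0], [0.5, 0.1], [0.0, 0.5], [0.0, 0.4]]
window = BandpowerSketch(ells, weight, edges=[2, 4, 6])
```

Because the edges live on the window object rather than on each data point, they are stored once per window and do not replace the full weights.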

tilmantroester commented 2 years ago

Options a) or b) make sense to me. But this should never replace the full bandpower window functions. Maybe a function in utils that consistently stores and retrieves bin edges from metadata would be enough though. What prompted this issue is that TXPipe puts some binning information in metadata but TJPCov reads some binning information from somewhere else (that TJPCov shouldn't need this information is a different issue). Having a somewhat standardised interface that people can use would address this to some degree.
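A minimal sketch of such a pair of utility functions, keyed by data type so the edges are stored once rather than per data point. The function names and the `binning/` key prefix are hypothetical. The sketch assumes the metadata container behaves like a plain dict and that values should be simple strings, so the edges are JSON-encoded; whether long strings survive a round trip through the file format is not checked here:

```python
import json

KEY_PREFIX = "binning/"  # hypothetical key convention, not a sacc standard


def store_bin_edges(metadata, data_type, edges):
    """Store bin edges for one data type as a JSON string in a metadata dict."""
    edges = [float(e) for e in edges]
    increasing = all(b > a for a, b in zip(edges, edges[1:]))
    if len(edges) < 2 or not increasing:
        raise ValueError("edges must be strictly increasing, length >= 2")
    metadata[KEY_PREFIX + data_type] = json.dumps(edges)


def retrieve_bin_edges(metadata, data_type):
    """Return the list of bin edges stored for one data type."""
    key = KEY_PREFIX + data_type
    if key not in metadata:
        raise KeyError(f"no binning stored for {data_type!r}")
    return json.loads(metadata[key])


# A plain dict stands in for the metadata of a sacc file.
metadata = {}
store_bin_edges(metadata, "galaxy_shear_cl_ee", [2, 30, 100, 300])
recovered = retrieve_bin_edges(metadata, "galaxy_shear_cl_ee")
```

With a shared convention like this, a writer such as TXPipe and a reader such as TJPCov would go through the same two functions instead of each choosing its own metadata keys.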