HERA-Team / hera_pspec

HERA power spectrum estimation code and data formats
http://hera-pspec.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

What array dimensions should `PSpec` support? #28

Closed philbull closed 6 years ago

philbull commented 6 years ago

PSpec is a new container object for power spectrum data. It will contain certain methods to ensure consistency of the data (e.g. keeping track of units), and support for basic operations like converting units, reducing the dimensionality of the power spectra (e.g. from 2D -> 1D), and possibly performing actions on other dimensions (like averaging over bootstrap samples or summing LSTs). The purpose of this issue is to suggest a possible basic structure, and to ask for examples of possible use-cases. How do you want to be able to use a power spectrum container class? What dimensions should/shouldn't it support? What are logical groups of different types of power spectra (e.g. different bootstrap samples, different polarisations)?

philbull commented 6 years ago

Suggested internal structure for PSpec data

Each PSpec will store the power spectrum data in a numpy array with 4+d dimensions (d k-space dimensions plus the four listed below):

  1. First 1-3 dimensions: k-bins in d=1,2,3 dimensions.
  2. Next dimension: LSTs / LST bins
  3. Next dimension: Polarisations
  4. Next dimension: Redshift bins / frequency bands (added in edit)
  5. Last dimension: Bootstrap samples
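As a rough illustration of the axis ordering listed above, here is a minimal sketch for a 2D power spectrum. The axis sizes and variable names are invented for the example, and none of this is existing hera_pspec code; it only assumes the data live in a plain numpy array with the k-bin axes first and the bootstrap axis last.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
n_kperp, n_kpara = 4, 8                  # d = 2 k-space dimensions
n_lst, n_pol, n_z, n_boot = 3, 2, 2, 10

# Axis order: (k-bins..., LST, polarisation, redshift bin, bootstrap sample)
ps = np.zeros((n_kperp, n_kpara, n_lst, n_pol, n_z, n_boot))

# Number of k-space dimensions: total axes minus the four trailing ones
d = ps.ndim - 4

# Averaging over bootstrap samples reduces the last axis
ps_boot_avg = ps.mean(axis=-1)

# Summing LSTs reduces the first axis after the k-bin axes
ps_lst_sum = ps.sum(axis=d)
```

Because the trailing axes are in a fixed order, the same two reductions would work unchanged for d=1 or d=3.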

Examples of extra dimensions that some users might need:

Our current thinking is that the extra dimensions should not be supported by PSpec. Instead, one would use Python lists or a new 'container of PSpec' object to group multiple PSpec objects together. This leaves an open question of whether we should allow multiple PSpec objects to be serialised into a single file, or whether we should use directories, HDF5 or some other grouping to store multiple PSpec objects to disk.
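To make the "multiple objects in a single file" option concrete, here is a small sketch that uses numpy's `.npz` archive as a stand-in for whatever format is eventually chosen (HDF5 groups would play the same role). The member names and array shapes are made up, and the arrays stand in for full PSpec objects; metadata such as units is not handled here.

```python
import io
import numpy as np

# Stand-ins for the data arrays of several PSpec objects (hypothetical names)
spectra = {
    "data": np.ones((4, 3)),
    "noise": np.zeros((4, 3)),
}

# Write all arrays into one archive; an in-memory buffer replaces a file on disk
buf = io.BytesIO()
np.savez(buf, **spectra)

# Read the archive back and recover each member by name
buf.seek(0)
archive = np.load(buf)
loaded = {name: archive[name] for name in archive.files}
```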

dannyjacobs commented 6 years ago

One key feature the completed pipeline needs to have is the concept of implicit multichannel processing. By channels I mean anything that might result in a different output. Examples include different covariance calculations, identity, different versions of input data (noise, data, simulation etc.), different levels of injection, etc. All of these kinds of consistency checks must be output as a matter of routine, i.e. they must not require effort on the part of the user and must happen automatically.

My suggested implementation is to define a simple pspec container (as you suggest) which is the input to all top level functions. Never the individual pspecs.

philbull commented 6 years ago

(Oops, I forgot to mention redshift bins. Original comment edited.)

philbull commented 6 years ago

@dannyjacobs Great, thanks for this. So in the structure suggested above, these groupings would be external to PSpec, but might require their own container classes to keep everything consistent? Do you envisage a single generic PSpecContainer class to do all of this, or different container classes for different use cases? (They could be subclassed off a generic container class I guess.)

miguelfmorales commented 6 years ago

I think the k dimensions need to be clearly defined. A few questions/options that immediately come to mind:

For 3D k (assuming this is for delay PS only):

While the date issues don't make a big difference, defining which is which really helps if we ever want to cross-correlate with something else, and for hunting effects later (e.g. "the galaxy is almost due east at this LST, so do I see the pitchfork in that k_perp?" is much more easily answered if we don't have to hunt down whether east corresponds to d=1 or d=2).

miguelfmorales commented 6 years ago

Is this only for 3D k, or is it meant to be used to hold 2D and 1D PS too?

dannyjacobs commented 6 years ago

@philbull The most important idea is that the individual pspec class not be commonly used by most pipeline users. Instead you would use a PspecCollection and all members would be processed the same way. If you are experimenting with a variation on a method or comparing jackknives, you might add a new pspec to the collection. But the collection should be baked in from the beginning.
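A minimal sketch of what "all members would be processed the same way" could look like. The class name `PSpecCollection` and the methods `add`, `apply` and `names` are hypothetical, and plain lists of numbers stand in for real power spectrum objects.

```python
class PSpecCollection:
    """Hypothetical container: named power spectra that are processed identically."""

    def __init__(self, members=None):
        self._members = dict(members or {})

    def add(self, name, pspec):
        # Add a new variation (e.g. a jackknife) alongside the existing members
        self._members[name] = pspec

    def apply(self, func, *args, **kwargs):
        # Run the same operation on every member and return a new collection
        return PSpecCollection(
            {name: func(ps, *args, **kwargs) for name, ps in self._members.items()}
        )

    def names(self):
        return list(self._members)

    def __getitem__(self, name):
        return self._members[name]

    def __len__(self):
        return len(self._members)


# Stand-in "power spectra": plain lists of numbers
coll = PSpecCollection({"data": [1.0, 2.0], "noise": [0.1, 0.2]})
coll.add("jackknife_even", [0.9, 2.1])

# One call processes every member the same way
scaled = coll.apply(lambda ps: [2.0 * x for x in ps])
```

Top-level functions would then take the collection as input, so a user never has to remember to repeat a step for the noise or simulation channel.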

philbull commented 6 years ago

@dannyjacobs Yep, that's reasonable. People can always create trivial 1-element collections if they need to.

dannyjacobs commented 6 years ago

@miguelfmorales He mentions 3 k dimensions but does not specify which. @philbull is the idea that the dimension units could be either delay + baseline, or eta plus k_perp in u and v?

philbull commented 6 years ago

@dannyjacobs Yes, the idea was to use PSpec as a container for multiple different types/conventions/dimensionalities of power spectrum, and to include a few convenience functions to convert between "popular" choices for each of those. So we'd support delay+baseline and eta, k_u,v for example.

philbull commented 6 years ago

This issue is now obsolete with the arrival of UVPSpec and PSpecContainer (see PR #61, which has been merged, and PR #57, which hasn't yet).