treat k=0 modes carefully in window calculation

pmcdonal commented 2 years ago

I was wondering why the unit window normalization did not agree better with the sum over cells normalization - it seemed like they should agree to pretty high precision for the Y5 mock test I was doing. This for the traditional calculation with only the same size box as the power calculation. So this led me to notice that, when I set up kedges to isolate it, the k=0 mode appears to be set to zero minus shot noise, even though for window it should not at all be zero (the mean field is not zero). I couldn't figure out where/why this actually happens, but when I figured out how to hack in a reasonable value in the window power output object before plugging it into "to_real", and commented out this line which ignores k=0 anyway: https://github.com/cosmodesi/pypower/blob/a6d91712229b61b00710ec1a533f89ffbfa9a121/pypower/smooth_window.py#L382 it looks like this will fix the problem, i.e., the sum over cells normalization will agree with full window normalization to high precision.

So I think this should be fixed. The window is just wrong if k=0 is not treated properly, and this will propagate into a real error in fits. Here's where, Florian's trick of adding larger boxes can fix the problem (if done carefully - I will file another issue about that), by taking the k you use in the calculation closer to zero, but this may not be so necessary if the zero mode in the smaller boxes is actually used.

adematti commented 2 years ago

Good catch, thanks! Zero power at k=0 comes from MeshFFTPower zeroing k=0 modes for pk estimation, e.g. https://github.com/cosmodesi/pypower/blob/a6d91712229b61b00710ec1a533f89ffbfa9a121/pypower/fft_power.py#L1322. I think I'll just add the power back in CatalogSmoothWindow. Hope I find some time to do this soon.

pmcdonal commented 2 years ago

Shouldn't this just not be here? I.e., if the mean of the field is zero, it will be zero anyway, and if it is not, it is wrong...

adematti commented 2 years ago

Current convention is, if field is non zero, the selection function is assumed uniform, equal to the mean field (e.g., case where we are computing the power spectrum of cubic mock, providing only "data_positions"). (Happy to take any suggestion of convention that does what we want most of the time).

pmcdonal commented 2 years ago

Ok, but that will produce 0 Fourier mode value for k=0 mode, so no problem not setting it to zero by hand. I.e., I can't imagine a scenario where you want to set it to zero by hand... Ok, unless you don't actually subtract the mean before calculating the power in that case... but to me this is a special trick which, if you are going to use, you should be responsible for dealing with the (correctly non-zero) zero mode in the output... The current thing is a trap for general users...

adematti commented 2 years ago

Well, the general users expect that giving data positions only to CatalogFFTPower will compute Pk assuming uniform selection function, no? I can probably think about adding a flag to avoid zeroing k=0 mode, but it shouldn't be used that often.

pmcdonal commented 2 years ago

I'd just say, it is not a generally correct thing to do, so one way or another I would not hard-code it in... (i.e., I'd put it in only at a level where the context is clear enough to justify). This is the kind of thing that gets us in trouble, with things that should work appearing not to work...

pmcdonal commented 2 years ago

To put it another way: no one should ever want to actually use this 0 you are writing in for k=0, because it is trivial (e.g., for my power summary statistic I had set up kedges to avoid confusingly mixing this with other bins), so the only people for whom it should matter should want the non-trivial, non-zero, value... (e.g., for your uniform cube, people who would be mixing in this mode... just shouldn't do that (unless they have thought it through carefully enough to have a good reason)... so it should be fine if you just report the non-zero value)

adematti commented 2 years ago

Well, the issue is that other modes may fall together with k=0 in the kbin [0, whatever k>0]. Ok, so what about separately keeping track of the k=0 value in the end result (PowerSpectrumMultipoles and PowerSpectrumWedges) - just as shotnoise, and dub that "power_zeromode_nonorm", and propose options:

to not remove k=0 power
to not remove shotnoise (unrelated, but might be the opportunity to propose that) in e.g. https://github.com/cosmodesi/pypower/blob/a6d91712229b61b00710ec1a533f89ffbfa9a121/pypower/fft_power.py#L791 (same for PowerSpectrumWedges)

pmcdonal commented 2 years ago

I'm not sure I understand what you're saying, although at this point I guess you understand the issue well enough to do something that works out ok in practice. I guess I'd just re-iterate that, after digesting everything, I really think it should be fine to just delete those lines singling out k=0. Yes, sometimes people may set up kedges that mixes this with other modes, but those people probably don't want the 0 mixed in either, and should just fix this - it might be better for them if it was more obvious because a really crazy result came out. Now that we have run into the confusion, if you write this into the documentation (e.g., of kedges), problems should be mostly avoided.

Incidentally though, could k=0 not be excluded from a naive kmin=0 bin by defining all bins to include modes with k>kmin, k<=kmax... you must need to make a choice on this in any case, so a mode doesn't fall in 2 bins, right? I.e., no one setting a bin 0-x should assume the k=0 mode is in the bin - they should look it up/experiment (as I did), if they care - so if this kind of kedges does not report k=0 at all, most people will be happy, and people who want it, like for the window, can explicitly set a negative kmin to make sure to include it...

Bottom line: your goal of making almost everyone happy even if they haven't thought about any of this should be best achieved by excluding the k=0 mode from an input 0-x bin by definition, not by including it but setting it to zero...

adematti commented 2 years ago

Yes, currently we take left edges to be inclusive, right edges to be exclusive, but that can be reversed very easily. Even if P(k=0) = 0, changing the convention will change the value of the first bin (due to changing the number of modes), but maybe it is the actual best definition. I will think about it a bit more but I'm already fairly convinced.

pmcdonal commented 2 years ago

It seems clearly the best thing to do, since excluding k=0 is generally what people will want, with no argument for going the other way that I see. And delete this setting zero mode to zero so when one writes in negative kedges you pick up the correct thing. (anyone who writes in negative kedges and really wants k=0 set to zero... has clearly thought things through enough that they can figure out how to do it themselves)

adematti commented 2 years ago

In this case I should be careful about including Nyquist frequency, related to https://github.com/cosmodesi/pypower/issues/4. Any link to cpp code doing this correctly is welcome. In the short time scale maybe I can just remove k=0 from the first k-bin, otherwise keeping left edges to be inclusive, right edges to be exclusive.

adematti commented 2 years ago

I ended up treating separately P(k=0), as passing in negative edges would not be super-intuitive for an average user. At least with current implementation we have all information stored (while keeping backward-compatibility, hopefully). See 948546adc9fed1dc3ac510240c720657d7dcda0b. Not much time to test, though, so there may be some bugs left.

pmcdonal commented 2 years ago

I was happy to see that you had commented out the part that zeroed the zero mode before, so I could just ignore whatever else you did and propagate things naturally... but then it seems like the zero mode is still zeroed... Is this intentional? Where does it happen? E.g., a very natural binning seems to be something like: L=7000 N=256 dk=2*np.pi/L print(dk,np.pi/(L/N))

kedges = np.arange(-dk/2, 0.4+dk,dk)

(i.e., following the 1D grid, although of course not unambiguously best in 3D) If someone wants to use 0-x bins, I don't understand why you weren't happy with doing kmin<k<=kmax so it doesn't matter to them what you do with zero mode? (and then not zeroing it) Mixing in the zero mode seems like a trap for naive users whether you zero it or not...

pmcdonal commented 2 years ago

Ok... I found "get_power"... It seems like the window version of "remove_zero" should default to False. Also, "remove" is ambiguous, i.e., it might (arguably should) mean just treat this mode as not existing, rather than setting it to zero (the "zero" only telling you that you mean k=0). (this solution seems like a bunch of unnecessary bookkeeping... but if you're the one doing it, I guess I can't complain) As a general coding thing, it would be nice to have some way of indicating "you are not supposed to access this from outside" (like private in C++). If only by naming convention. Isn't there some convention like this with underscores?

pmcdonal commented 2 years ago

I guess I spoke too soon. When I do window1r=window1.to_real(sep=np.geomspace(1e-2, 1e4, 2**11)) now I just get a bunch of nans (was wondering if it was going to include zero mode).

With my simple "don't zero zero mode" hack, I was getting single-box window functions beautifully consistent with combined-box thing.

pmcdonal commented 2 years ago

"I just renamed remove_zero into null_zero_mode, setting to False by default for window function, and extrapolation values to zero (which would remove the "bunch of nan's" you are getting)." (not sure why this comment doesn't show up here for me)

I'm still getting nans. E.g., for your window2 in window_examples notebook. (if you want to say I shouldn't feed it a power measurement that has nans in it, I wouldn't hate that, it's just that before you could...)

adematti commented 2 years ago

I deleted my message because it figured out that nans were not removed. Now this should be fine, see https://github.com/cosmodesi/pypower/blob/main/nb/window_examples.ipynb, cell 6. Does this match what you obtained? Below my original comment.

I have just (c6e2d91) renamed remove_zero into null_zero_mode, setting to False by default for window function, and extrapolation values to zero (which would remove the "bunch of nan's" you are getting). I was not totally happy with kmin<k<=kmax, as the user would have needed to input negative k-edges for PowerSpectrumSmoothWindow, which would not be very intuitive. (I could have manually added a negative bin internally, but that is not something the user would expect). Anyway, my current suggestion does not prevent us from switching to kmin < k <= kmax at a later stage, as we have all the information recorded. Yes, in Python the convention is to start names of "private attributes" with an underscore. Where do you see an application of that? (I do not think we should tell the user not to access any of the current power spectrum / window function attributes?).

pmcdonal commented 2 years ago

The example of needing private I was thinking of was that, because zero mode is zeroed in underlying power storage, if I'm using that directly I will get wrong answers - if you force the user to access it through get_power they will see these options and understand how to get what they want. Anyway, to_real is working now.

cosmodesi / pypower

treat k=0 modes carefully in window calculation #7