Further development / testing of lensing engine

rmandelb commented 12 years ago

As in the pull request for the initial version of the "lensing engine" for generating non-constant shear fields, there are some additional improvements and testing that are necessary in the next few months. Below is a list of those that are collected from the pull request, where those near the top of the list are more important than those at the bottom.

More thorough testing of the power spectrum functionality overall, using a variety of completely independent codes that are available to us. Right now we have done enough testing to know that the output shear fields have roughly the right scaling with k, for example, and obey a consistent normalization convention. But there's more to do, and hints of some possible weirdness in the pull request in certain cases (which do have a plausible explanation, so more testing is necessary to understand this).
Right now, the lensing engine includes a few simple power law power spectra, or the user can specify their own that takes a single argument, k. In future, I can think of a few things we will want: (a) to have a cosmological shear power spectrum for some cosmology and dN/dz; (b) to allow the power spectrum function to take additional kwargs; (c) to write a function that reads in a power spectrum tabulated at some k values from a file, and automatically interpolates it to get P(k) for arbitrary k.
Shears at non-gridded positions. Per Mike's request, I will make sure it's possible to do this one position at a time from a saved shear field (in addition to doing it by supplying all positions at once).
I'd like to eliminate the scipy dependence of NFWHalo if it's not too painful to do so.
Turn the utilities.py method eval_sbinterpolatedimage (which returns the value of an image at a list of positions) into C++, for speed.
We should have further discussion on just how much cosmology code we want to include with GalSim. There is also this issue of the mass specification in M_solar/h vs. including h in the Cosmology class and then requiring specification of mass in M_solar.
We have to decide if the normalization convention I've chosen for the power spectra, as in the dox, makes sense. It's one of only several possible conventions, and we could eventually decide to support several conventions if there's demand for it (i.e., choosing via keywords).
I want the shear fields from a power spectrum to also include the possibility of getting the convergence.
Unit handling: we require positions in arcsec, but we could imagine making some class that can cleanly deal with different ways of specifying positions (just like we already do for angles).

Please let me know if I've missed something or if you disagree about the relative importance of these.

rmjarvis commented 12 years ago

In #243, I partially addressed the "Shears at non-gridded positions" item.

In the PowerSpectrum getShear function, I implemented the ability to pass in arbitrary positions and get back the corresponding (g1,g2), so long as the grid parameters are either also provided, or have previously been provided on a prior call to getShear. This (more or less) matches the documentation that Rachel had written up.

However, I think this function should be split into two. One function (maybe buildGrid) that builds the grid and returns numpy arrays of g1,g2. And another (probably keeping the name getShear) that can use this grid to get a shear at arbitrary positions. I left the job of doing this split for this issue.

I also made one baby step toward dealing with the units issue. Rather than a string units = 'arcsec', I changed units be an AngleUnit. So the default parameter is units = galsim.arcsec. I'm envisioning that we would declare the AngleUnit for (k^-1) and for the positions. So invk_units and pos_units maybe. Or maybe just units on the constructor (with E_power_function and B_power_function) would refer to the units of k^-1, and then a units parameter passed to buildGrid and getShear would refer to the units of the positions. Then the conversion would be very straightforward. But I didn't add any of that functionality to the code. Just the kernel of an idea. :)

rmandelb commented 11 years ago

Three months later, I'm finally going to work on this issue. Some of the items that are listed here have become obsolete, so here's a summary of the plans:

1) We still need the more thorough testing of the power spectrum functionality overall, including the funny variance behavior from the pull request on the initial version of the lensing engine. My plan is to first address that variance issue by using Chris Hirata's shear simulation code to see if it gives the same behavior. Then, I will move to more general testing, again by comparison with that code. He gave us permission to use his code for testing GalSim, so that seems like a good way to go.

2) Other shear power spectra: mostly this will be taken care of in #305. However, I would like to do what is listed as (b) above (allow the power spectrum function to take some additional keyword arguments). Unless I'm completely off base this should be pretty easy.

3) Shears at non-gridded positions: I will take Mike's suggestion to split up the operations, to allow the interpolation to single positions, and also do the step of taking a list of non-gridded positions to start with, then making an intelligent choice of grid scale (rather than requiring calculation on a grid before interpolating). Also, I will do a careful test of the outputs, by comparison with the gridded case and also with Chris's code, which also handles non-gridded positions.

4) The eval_sbinterpolatedimage translation to C++: not sure about this. Might be tricky? We need to interpret the numpy arrays as vectors, I think, which relates to the functionality that Mike is going to put in #305 for interpolation of Tables. So I might wait to see what that looks like...

5) Shear normalization convention: I'm going to see how the comparison with Chris's code goes, and also compare with the GREAT10 power spectrum code. Once I see what the differences are then I can see about allowing for different conventions.

6) Units: I like Mike's idea in the comments of this issue, so I would like to implement this.

The items from the original list that I'm omitting are: removing scipy dependence (Mike already did this); strategic decisions about the amount of cosmology calculations to include in GalSim (because I think we definitely don't need more right now!).

Note that (1), (2), and (5) are necessary for the challenge; (3) and (6) are not, but are nice functionality that we agreed we'd like anyway; (4) might be needed for the challenge, depending on whether the slowness of eval_sbinterpolatedimage is a limiting step in the production of simulations. These factors will establish the priority for the various bits of work in this issue.

Comments are welcome; @barnabytprowe , @rmjarvis , others?

rmandelb commented 11 years ago

@rmjarvis - I was going over some things in lensing.py, and I was wondering about the 2nd usage example you gave in the documentation for getShear. Is there a line missing or something? You initialize an image but then don't use it...

rmjarvis commented 11 years ago

I was wondering about the 2nd usage example you gave in the documentation for getShear. Is there a line missing or something? You initialize an image but then don't use it...

Yeah, I'm not sure what I was thinking for that line. I think just the g1,g2 line is all that should be in that example.

rmandelb commented 11 years ago

That doesn't actually seem to work:

/Users/rmandelb/great3/GalSim/<ipython-input-3-b7b3e4cf02f9> in <module>()
----> 1 g1, g2 = my_ps.getShear(grid_spacing = 8., grid_nx = 65., rng = galsim.BaseDeviate(1413231), center = (256.5, 256.5) )

/Users/rmandelb/great3/GalSim/galsim/lensing.py in getShear(self, pos, grid_spacing, grid_nx, rng, interpolant, center)
    329         # Build the grid if requested.

    330         if grid_spacing is not None:
--> 331             psr = PowerSpectrumRealizer(grid_nx, grid_nx, grid_spacing, self.p_E, self.p_B)
    332             self.grid_g1, self.grid_g2 = psr(gd)
    333 

/Users/rmandelb/great3/GalSim/galsim/lensing.py in __init__(self, nx, ny, pixel_size, e_power_function, b_power_function)
    404     """
    405     def __init__(self, nx, ny, pixel_size, e_power_function, b_power_function):
--> 406         self.set_size(nx, ny, pixel_size, False)
    407         self.set_power(e_power_function, b_power_function)
    408 

/Users/rmandelb/great3/GalSim/galsim/lensing.py in set_size(self, nx, ny, pixel_size, remake_power)
    430 
    431         #Compute the spin weightings

--> 432         self._cos, self._sin = self._generate_spin_weightings()
    433 
    434         #Optionally (because this may be the first time we run this, or we may be about to change

/Users/rmandelb/great3/GalSim/galsim/lensing.py in _generate_spin_weightings(self)
    525         ky = self.ky
    526         TwoPsi=2*np.arctan2(1.0*self.ky, 1.0*self.kx)
--> 527         C[kx,ky]=np.cos(TwoPsi)
    528         S[kx,ky]=np.sin(TwoPsi)
    529         C[-kx,ky]=-np.cos(TwoPsi)

IndexError: arrays used as indices must be of integer (or boolean) type

I'll try to look into this later / tomorrow.

Happy new year. :)

rmandelb commented 11 years ago

... never mind, I figured it out: the only issue is that the ngrid is a float (65.) not an int (65). I fixed the dox on master to remove the random Image initialization and to make the ngrid in that example into an int, so that it can be properly run. In my work on this branch I might change the handling of ngrid so that it automatically catches floats, converts to int, and raises a warning.

By the way, it turns out that I was right, it's easy to change the PowerSpectrum initialization to use *args, **kwargs and catch any additional kwargs to be passed to the power spectrum function, so that it can be a function of other things besides k. However, the tricky part that I realized was that we actually have two power spectrum functions, one for E and one for B, but this args/kwargs trick only works if only one power function is specified (or if the same power function is used for both). So I think that instead I might need to go back to the current treatment, but with two optional keywords like e_kwargs and b_kwargs which are strings that get processed using eval. Still thinking this over.

rmandelb commented 11 years ago

And here's a slightly disappointing update about the variance issue:

To copy a description of the problem from the pull request for the original version of the lensing engine, (PR #242 for issue #216): if I make a flat power spectrum P_E = 0.01 for our ell range, and P_B=0, then I get var(g1)=0.0043 and var(g2)=0.0057. If I make P_E=0 and P_B=0.01 then these numbers reverse. So it is the case that var(g1)+var(g2)=0.01, which is what we expect, but I don't understand the relationship between var(g1) and var(g2) in these cases. Mike guessed it might have something to do with a square grid looking different in the g1 vs. g2 directions, square vs. diamond, but we never resolved this.

My first task on here was to check whether Chris Hirata's code for generating random shears from a power spectrum behaves the same way, and the answer is no. For his code (same grid of positions, same PS as here), the shear variances are equal for both components.

Also I noticed that on #304, Joe modified the part of the lensing engine that generates shears (so, some of our unit tests now fail on that branch - will have to update them). I checked and found that the lensing engine has the same behavior as I described here for the lensing engine on master, so it's not something that was fixed by Joe's changes.

I think something fishy is going on.

joezuntz commented 11 years ago

The difference in the variances of g1 and g2 comes directly from the variances of the geometric components that do the spin weighting, called _sin and _cos in the code - because we are in a square box there is more area with psi_k ~ 45 degrees than zero or 90 degrees.

I'm not sure if this means there is a better way to generate the fields that doesn't have this problem, or if the answer is just to generate at higher resolution and then downsample. I will have a think and ask around.

rmandelb commented 11 years ago

Is this a statement that the variances of g1 and g2 should be different if you make P_E = nonzero and P_B = zero (or vice versa) within a square box? Hmmm, now I wonder if it's worth checking what if I select only the grid points within a circle and compute the variance from them, after generating the shears within the square grid as usual. I have to think about this more, will try to do so tonight once I'm back at the hotel...

On Jan 8, 2013, at 7:33 AM, joezuntz notifications@github.com wrote:

The difference in the variances of g1 and g2 comes directly from the variances of the geometric components that do the spin weighting, called _sin and _cos in the code - because we are in a square box there is more area with psi_k ~ 45 degrees than zero or 90 degrees.

I'm not sure if this means there is a better way to generate the fields that doesn't have this problem, or if the answer is just to generate at higher resolution and then downsample. I will have a think and ask around.

— Reply to this email directly or view it on GitHub.

Rachel Mandelbaum rmandelb@andrew.cmu.edu http://www.andrew.cmu.edu/~rmandelb/

rmandelb commented 11 years ago

No difference if I use only those within a circle, and now that I think of it that test was probably silly.

joezuntz commented 11 years ago

I think a better test may be to generate a larger image, but with no power on the larger scales, and then cut out a region in the center to look at. Or possibly the equivalent in Fourier space.

I've asked a couple of smart people about this with no luck! The problem is that most people I know have only ever done these transforms on the sphere!

Joe

On 11 Jan 2013, at 06:32, rmandelb wrote:

No difference if I use only those within a circle, and now that I think of it that test was probably silly.

— Reply to this email directly or view it on GitHub.

rmandelb commented 11 years ago

Hmmm. In the meantime, I did some more tests to try to understand something that had been confusing me and that I think I now understand.

When I use Chris Hirata's SHT code to generate shears from P(k)=0.01, for our standard 100x100 grid that is 10x10 deg, I was not getting total shear variance=0.02 (if P_E=P_B=0.01) or 0.01 (if one component had no power), which was the normalization that we had adopted here. Instead I was getting total shear variance around 0.12 (if P_E=P_B=0.01) or 0.06 (if P_E=0.01 and P_B=0 or vice versa).

I asked him about this earlier this week, and he said he thought there should be a factor of ln(kmax/kmin)=ln(100) in there, possibly with other numerical factors. To test this out, I made the grid 50x50 but with the same grid spacing, so that kmax was preserved but kmin became a factor of 2 smaller. In this case, if Chris is right, we expect the shear variance to go down by ln(50)/ln(100). I did this test with a bunch of realizations to beat down noise, and this is exactly what happens.

So the picture I'm getting is that the normalization in his code is such that shear variance propto \int P(k) dlnk (integrated over the range of k actually used, and with some number in front).

I believe this is also one that Gary uses, because I recall him saying to me that if P(k)=const then you should get infnite variance in general, and the only reason we don't is the finite k range.

In contrast our shear variance does not depend on the grid / k range used. Does that actually make sense?

Joe, to respond to something you said:

I think a better test may be to generate a larger image, but with no power on the larger scales,

Is this possibly a statement that we're seeing weird artifacts because we're using an unphysical P(k) that doesn't go to zero on large scales? If so, then I should stop obsessing about this test, no? Instead we should generate shears using a P(k) with something more cosmological, generate shears with the same P(k) with Chris's code, and estimate P(k) from both of the sets of shears to amke sure that output P(k) = input.

Rachel

and then cut out a region in the center to look at. Or possibly the equivalent in Fourier space.

I've asked a couple of smart people about this with no luck! The problem is that most people I know have only ever done these transforms on the sphere!

Joe

On 11 Jan 2013, at 06:32, rmandelb wrote:

No difference if I use only those within a circle, and now that I think of it that test was probably silly.

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

Rachel Mandelbaum rmandelb@andrew.cmu.edu http://www.andrew.cmu.edu/~rmandelb/

joezuntz commented 11 years ago

I think a better test may be to generate a larger image, but with no power on the larger scales,

Is this possibly a statement that we're seeing weird artifacts because we're using an unphysical P(k) that doesn't go to zero on large scales?

I don't think it is, I'm afraid - no. It seems to happen because of power on small scales having different variances (since that's where the variance in the spin-weights is different). Though I think the test you suggest is also useful.

rmandelb commented 11 years ago

Joe, I have some really basic questions about things about the code that I should have noticed before. So I feel a bit dumb asking this now, but I really don't understand certain things:

PowerSpectrumRealizer defines a grid in k such that, for example, if I tell it that the grid spacing is 1 and there are 100 grid points, then the min, max values of k_x and k_y are 0 and 0.5. I don't understand this. Naively I would think that if grid spacing is 1, this defines the maximum k_x as being 2pi/1=2pi. Likewise, I would think the minimum k_x should be 2pi/(1*100). So I don't see why:

(a) the minimum k is always zero (b) the k spacing is defined the way it is.

What am I missing?

rmandelb commented 11 years ago

Also pinging @barnabytprowe since you might be thinking about the concerns above.

rmandelb commented 11 years ago

And sorry for being annoying, but I thought of another thing: why is the x direction treated differently from the y direction?

rmandelb commented 11 years ago

Oy. I talked to Mike and he reminded me: the x direction being treated differently is because we're just not making part of the array due to symmetry. Also, I see that the delta(k) corresponds to the box scale, it's 1/(box scale) rather than 2pi/(box scale). I need to be careful here as I have cosmological inputs in terms of ell and those 2pi's are tricky...

As for including the k=0 mode, which is cosmologically zero, is this just done for convenience in array conventions? Or is the idea that our little patch might be sitting on top of some large-wavelength density fluctuation?

joezuntz commented 11 years ago

I've spoken with some CMB polarization people on this issue (Jo Dunkley and other ACT people), and they agree that the different variances in g1 and g2 is probably a feature rather than a bug! In most contexts it goes away because there is a beam that cuts off in Fourier space the outer area that is causing the difference. I guess this equivalent here is smoothing the field by some scale at below which we do not want power, i.e. gently cutting off the P(k).

The k=0 is just for convenience - if it's easier we can set it to zero all the time.

barnabytprowe commented 11 years ago

Joe, are you coming into London on any of the next few days? I'd be grateful to go through some of this stuff with you briefly!

joezuntz commented 11 years ago

Barney - yep, I'll be in tomorrow. We can chat then.

rmandelb commented 11 years ago

Should be possible to test this by smoothing P(k) and seeing if it goes away. I'll check it out.

As for k=0, I think I will change it to be set to zero all the time. That should be the case for any physical shear power spectrum, and it will make things easier for the interpolation of a tabulated P(k) if we don't have to have it defined at k=0. I can do this on #305 since I'm trying to wrap up the work on that branch, anyway.

I think the only remaining confusion is the k=1/x vs. 2pi/x, which I think I understood before but have now forgotten. Can you remind me? (This is also relevant for making sure we can take input C_ell vs. ell from cosmological power spectrum calculators and properly use it, taking care of all units issues and so on.)

rmjarvis commented 11 years ago

FYI, #305 has most of the solutions for items 3,4,6. For (3), we still require the user to specify the grid size and worry about the effect of the interpolation on the effective power spectrum. So that could be an item still. But otherwise, I don't think there are any issues with using the PowerSpectrum on non-gridded positions. For (4), I didn't use eval_sbinterpolatedimage, so I think it's just obsolete and can be removed. (That calculation is nowhere close to a tall pole in terms of optimization.) Finally, I think I completely implemented the unit conversions (6).

rmandelb commented 11 years ago

I have discovered an incredibly useful way to get stuff done on this issue! If I wait long enough, Mike will do it elsewhere. ;)

And now for the real status update, listing the remaining action items based on what Mike posted above (plus discussion elsewhere) and clearly marking what we need for GREAT3:

[GREAT3] We still need the more thorough testing of the power spectrum functionality overall, including the funny variance behavior from the pull request on the initial version of the lensing engine. Since we found that Chris Hirata's SHT-based code doesn't have this variance behavior, I am going to check whether it goes away if we smooth P(k) as Joe suggested above. Otherwise, it may just be a feature rather than a bug. Other things to check include proper scaling with k for non-flat power spectra, and the effect of interpolants when using non-gridded shears.
[GREAT3] We need a better set of unit tests for the power spectrum functionality, as discussed on #305.
I had discussed having the code automatically choose a grid based on the random galaxy positions that are provided by the user, instead of forcing the user to construct shears on a grid themselves. while I still think this would be nice functionality, it isn't needed for GREAT3, so I think that it should be lower priority. After the above work on the effects of interpolants (first point above), we will at least be able to make more quantitative statements for the user about the effect of interpolation, so that will be helpful. Once that is clearly documented, I may want to make a new (lower priority) issue for this work, and complete the more urgent things on this branch.
I'll remove the eval_sbinterpolatedimage function, since it's no longer used.
[GREAT3] I want to compare normalization conventions with those in Chris's code and the GREAT10 power spectrum code. This is necessary for GREAT3, in the sense that I want to make sure the convention we're using is producing reasonable shears for inputs from iCosmo. Depending on the findings here, we may change the normalization convention, and/or make a way of converting between them.

I remain confused about the seemingly missing factor of 2pi in the k definition.

rmandelb commented 11 years ago

I made some headway on understanding Chris's shear variance normalization, and in doing so I think I decided that our normalization doesn't make sense. But let me explain, with some equations, in case I've gotten something wrong. (I managed to confuse myself while doing this, but I think it makes sense in the end...)

I was computing the variance of all the shear values, which I believe is essentially just the shear correlation function evaluated at zero separation. So, we can write

shear variance = xi_+ (theta=0) = (1/2pi)^2 \int 2 pi k dk Delta^2(k) J_0(0)

My notation, by the way, is that Delta^2 are essentially the C_ell from a cosmology calculator, reserving the symbol P for the quantity that people more typically refer to as the shear power spectra, i.e., k^2 Delta^2(k)/2pi.

Anyway, since J_0(0)=1, this becomes

shear variance = xi_+(theta=0) = (1/2pi) \int k dk Delta^2(k)

I've been deliberately vague about limits of integration here, but it should be from our kmin to our kmax. In general, for a square grid we define kmin according to the largest possible separation between grid points, i.e.,2pi/(sqrt(2)L) (L is size of grid) and kmax should be defined by the minimum possible separation, 2pi/d (d is spacing between grid points).

In our P(k)=constant scenario, let's call it P0, we have Delta^2 = 2 pi P0 / k^2.

So,

shear variance = P0 \int dk/k = P0 ln(kmax/kmin) = P0 ln(sqrt(2) N)

where N is the number of grid points per dimension, i.e. for our 100x100 grid we have N=100.

In this case, we predict shear variance on our 100x100 grid that is around 5 P0. And indeed, that's almost exactly what I find with Chris's code...

It seems to me that this math is quite reasonable and is the normalization convention that we should adopt (rather than our current one which gives shear variance = constant regardless of grid size setup), but comments/questions/suggestions are welcome. It is also, I believe, closer to the one suggested by Gary (who didn't say it explicitly, but commented that for constant shear power we should formally have infinite shear variance, which is the case for the equation above).

rmandelb commented 11 years ago

I've done some more tests, with some really useful results and some really confusing results. Due to the difficulty in showing equations on GitHub, I just pushed a set of latex'ed notes that show all the equations above, plus the tests that I have done / planned.

rmandelb commented 11 years ago

I didn't say this explicitly in my comment above, so let me just add that at this point I think I could benefit from someone looking this over - maybe @joezuntz , @barnabytprowe , or @rmjarvis ? Don't worry about the math on the comment in this issue page, it's kind of hard to read; I think the latex file is clearer and more explanatory. The initial commit of that latex file had some math errors which I fixed in later versions, so you should look at the most up to date version on this branch:

https://github.com/GalSim-developers/GalSim/blob/%23248/devel/modules/lensing_engine.tex

barnabytprowe commented 11 years ago

Hi Rachel, I'm off out for some food but will take a look later tonight!

barnabytprowe commented 11 years ago

We're chatting on the phone later today so I'll converse in more detail then, but it seems to me that one thing straight away is the definitions of P, Del^2 and xi_+/-.

I'm used to Del^2 being the dimenionless quantity, and P being dimension-err-ful. This makes sense to me: if xi_+/- is dimensionless, which we can probably all agree on, then to calculate it's Fourier transform you need to integrate over two spatial dimensions. This is the way round that is commonly used for galaxy correlation functions too, e.g. http://galaxycorrelations.pbworks.com/w/page/12447428/Galaxy%20Auto-Correlations
I think it's also the way the lensing engine code is working. We are identifying P with the Fourier conjugate of xi_+/-: in the code we do the 2D Fourier transform in x and y since we're gridded, but when writing this analytically we use the circular symmetry of our problem to reduce the FT to a Fourier-Bessel or Hankel transform as per equation 3. Or rather this would be equation 3 in the notes, but in our PS code Del^2 is instead replaced by P. And in fact I do think that is how commonly this equation is thought of, e.g. equation 2 here: http://arxiv.org/abs/astro-ph/0206182
So given the results of the first test it looks like CH's code is taking Del^2 as the input, whereas ours is taking P, and moreover in the constant power test interpreting it as dimensionful and taking the average dimensionful P(k) power across the (inverse area) k range in question, giving just P0. This would actually be my expected behaviour for P(k) if I'm honest, which is why I didn't flag this up before, but I've also not thought properly about the extra subtlety in our discrete representation of these continuous integrals. There's a nice gif (top right of the page) here showing how periodicity means that our gridded representation of P(k)=P0 actually extends to all k by repetition: http://en.wikipedia.org/wiki/Discrete_Fourier_transform.

Clearly there are some notational inconsistencies in the literature that only add to the confusion... We need a good textbook :)

And I need to think more about the second test!

rmandelb commented 11 years ago

Yes, to be honest I got seriously confused by different uses of P and so on (exacerbated by the fact that I had to reinterpret a few papers' equations from full-sky to flat-sky notation, plus the fact that Delta^2 means something for 3d power spectra).

Let's talk more on the phone, but I think I may have been misinterpreting something in our code...

On Jan 25, 2013, at 5:33 AM, Barnaby Rowe notifications@github.com wrote:

We're chatting on the phone later today so I'll converse in more detail then, but it seems to me that one thing straight away is the definitions of P, Del^2 and xi_+/-.

I'm used to Del^2 being the dimenionless quantity, and P being dimension-err-ful. This makes sense to me: if xi_+/- is dimensionless, which we can probably all agree on, then to calculate it's Fourier transform you need to integrate over two spatial dimensions. This is the way round that is commonly used for galaxy correlation functions too, e.g. http://galaxycorrelations.pbworks.com/w/page/12447428/Galaxy%20Auto-Correlations

I think it's also the way the lensing engine code is working. We are identifying P with the Fourier conjugate of xi_+/-: in the code we do the 2D Fourier transform in x and y since we're gridded, but when writing this analytically we use the circular symmetry of our problem to reduce the FT to a Fourier-Bessel or Hankel transform as per equation 3. Or rather this would be equation 3 in the notes, but in our PS code Del^2 is instead replaced by P. And in fact I do think that is how commonly this equation is thought of, e.g. equation 2 here: http://arxiv.org/abs/astro-ph/0206182

So given the results of the first test it looks like CH's code is taking Del^2 as the input, whereas ours is taking P, and moreover in the constant power test interpreting it as dimensionful and taking the average dimensionful P(k) power across the (inverse area) k range in question, giving just P0. This would actually be my expected behaviour for P(k) if I'm honest, which is why I didn't flag this up before, but I've also not thought properly about the extra subtlety in our discrete representation of these continuous integrals. There's a nice gif (top right of the page) here showing how periodicity means that our gridded representation of P(k)=P0 actually extends to all k by repetition: http://en.wikipedia.org/wiki/Discrete_Fourier_transform.

Clearly there are some notational inconsistencies in the literature that only add to the confusion... We need a good textbook :)

And I need to think more about the second test!

— Reply to this email directly or view it on GitHub.

Rachel Mandelbaum rmandelb@andrew.cmu.edu http://www.andrew.cmu.edu/~rmandelb/

rmandelb commented 11 years ago

Okay, I have revamped that "theory" section and indeed the entire document to use the more standard notation. (Will have to update the documentation in lensing engine itself later…)

I still don't understand the GalSim behavior, but with the updated documentation, it might be more clear exactly what the question is. Would you be willing to take a look at the version I just pushed?

rmandelb commented 11 years ago

(was asking @barnabytprowe in case that wasn't obvious, though anyone is welcome to comment)

barnabytprowe commented 11 years ago

Hi Rachel, I've pushed some thoughts, hope these are helpful!

rmandelb commented 11 years ago

Thanks, these are helpful, but I've just pushed a version that clarifies a few things… (as italicized comments interspersed throughout your explanation)

On Jan 27, 2013, at 10:33 AM, Barnaby Rowe notifications@github.com wrote:

Hi Rachel, I've pushed some thoughts, hope these are helpful!

— Reply to this email directly or view it on GitHub.

Rachel Mandelbaum rmandelb@andrew.cmu.edu http://www.andrew.cmu.edu/~rmandelb/

barnabytprowe commented 11 years ago

Definitely change the stepsize definition to be consistent with the rest of GalSim, and please feel free in future to make such clearly beneficial changes in future -- on a development branch that will in future be fully reviewed via PR -- with the impunity that such a sensible change merits.
For the P(k) = 2pi P0/k^2 test, the current PS code does not have P(k < k_min) = 0, since the power spectrum is represented on a rectilinear grid it's more like P(|kx|, |ky| < k_min) = 0: this makes the relevant reference test integral a bit different.
The fact that the P(k) = 2pi P0 /k^2 test is so steep at small k means the fact that it is being approximated only by gridded, delta function samples very important. I don't think we expect the continuous integral to be well-approximated by delta functions separated by k_min at low k.
I'm a little surprised that the P(kx=ky=0) setting hasn't changed anything in the unit tests.

However, a more general meta-question is do we actually want the lensing engine to be behaving this way? If someone gives a k and P(k) that are actually the and C from some cosmology calculator, don’t they usually expect that the k is zero outside of that range?

I don't know how you would do it another way: the spaces are very different. In cosmology you have integer-ell spherical harmonics on the bounded surface of a sphere. On a square image you have an infinite periodic series across continuous k space. To formally set all k to zero outside the range of provided inputs would need an infinitely large grid... I think it makes more sense to mention this in the documentation and expect the user to appreciate the differences in the discrete vs flat sky approximation.

rmandelb commented 11 years ago

Definitely change the stepsize definition to be consistent with the rest of GalSim, and please feel free in future to make such clearly beneficial changes in future -- on a development branch that will in future be fully reviewed via PR -- with the impunity that such a sensible change merits.

Since 305 is already an open PR (for which nobody has commented on the code directly, though Reiko ran the unit tests) on the lensing engine, I think I am going to make these changes directly on there, in addition to some changes related to the units. And then I will need to update the unit tests, too, but I think this can be done within a day or two, and would be sensible to accomplish ASAP rather than having another separate PR for this point. For the P(k) = 2pi P0/k^2 test, the current PS code does not have P(k < k_min) = 0, since the power spectrum is represented on a rectilinear grid it's more like P(|kx|, |ky| < k_min) = 0: this makes the relevant reference test integral a bit different.

The fact that the P(k) = 2pi P0 /k^2 test is so steep at small k means the fact that it is being approximated only by gridded, delta function samples very important. I don't think we expect the continuous integral to be well-approximated by delta functions separated by k_min at low k.

I'm a little surprised that the P(kx=ky=0) setting hasn't changed anything in the unit tests.

However, a more general meta-question is do we actually want the lensing engine to be behaving this way? If someone gives a k and P(k) that are actually the and C from some cosmology calculator, don’t they usually expect that the k is zero outside of that range?

I don't know how you would do it another way: the spaces are very different. In cosmology you have integer-ell spherical harmonics on the bounded surface of a sphere. On a square image you have an infinite periodic series across continuous k space. To formally set all k to zero outside the range of provided inputs would need an infinitely large grid... I think it makes more sense to mention this in the documentation and expect the user to appreciate the differences in the discrete, flat sky approximation. I think possibly another way (?) to say this is to say that I should not be using the variance as a metric, because it depends on the definition of the PS at all values of k, whereas for the lensing engine what we really care about is "do we get the PS right within the range of k values represented by our grid?". Of course it would still be good to understand (at least roughly) the behavior of the variances, but as you suggest, we should clearly flag that in the dox as something that depends on what you assume about the PS outside of our relevant k range, and therefore the variances cannot be used to compare against other codes or the idealized case without thinking carefully about the impact of the DFT approach.

barnabytprowe commented 11 years ago

I think possibly another way (?) to say this is to say that I should not be using the variance as a metric, because it depends on the definition of the PS at all values of k, whereas for the lensing engine what we really care about is "do we get the PS right within the range of k values represented by our grid?". Of course it would still be good to understand (at least roughly) the behavior of the variances, but as you suggest, we should clearly flag that in the dox as something that depends on what you assume about the PS outside of our relevant k range, and therefore the variances cannot be used to compare against other codes or the idealized case without thinking carefully about the impact of the DFT approach.

I think you might be right there, in fact!

barnabytprowe commented 11 years ago

Some more annoying thoughts, but I don't think relevant for the tests...

After all this, I'm now having doubts about the fixing of P(k=0)=0: do we always want this? It forces the condition that the total integrated correlation is zero in the output: i.e. that positive correlations are always balanced by negative anticorrelations within the image.

While this is true in the Universe, and makes sense on a bounded sphere, it doesn't necessarily make sense if we're only interested in generating realizations of power in sub-region images that cover only certain scales. The example I'm thinking of is a cosmological correlation function: these are typically positive on small scales, which must be compensated I guess by being (unmeasurably) negative on (much) larger ones. But if we're not simulating images out to scales where these competing effects balance out, then we expect to have variance even in the lowest mode, i.e. across the whole image: cosmic variance if you will.

I know that this is starting to make my head ache...

So I think one the problem might be that the modes the PS code calls k=0 when deciding what to apply to it's image is actually not necessarily what the user calls k=0, which is k = really zero, i.e. the zero mode in the larger space onto which the image is merely a finite window.

This also makes sense in the case of the flat power spectrum: it's not really white noise if you are fixing the total variance to zero, so that total flux is zero. In true white noise the mean in a finite window is also a random number.

I'm wondering what the best thing to do about this is...

One option would be that what the PS code assigns to being the k=0 mode (overall variance in image mean value) integrated effect (would need to think about the correct expression for this integral, ugh) of all scales from k=0 up to what is being called kmin in these notes, which may or may not be non-zero in the user function. This is essentially the "cosmic variance" term for our small image which is in fact a window of samples onto the flat, infinite 2D plane. In the case of the flat power spectrum this integrated effect would need to give a value of P0 to the P(k=0) term. Ugh.
If this seems too painful, and I think it might, then we simply have to instruct the user that any power in their user supplied function below 2 pi / L is simply ignored: this is equivalent to us saying that we set k=0 in our DFT representation of the power spectrum, but is more explicit.

barnabytprowe commented 11 years ago

Sorry, more thoughts!

There are two things that really matter in all this:

In the first conceptual step, we're taking a finite sphere and mapping it onto the real 2D plane: this turns integer valued C_ells into the function over continuous space P(k).
In the second conceptual step, we create a mapping from the entire real 2D plane into finite numbers of discrete samples by making it periodic (of period equal to our image extent) and by only sampling it at our grid spacing.

This second step allows us to generate approximate realizations of the sorts of fields we expect from the first step only if the characteristic correlation length is small compared to half the image extent: I believe this is equivalent to saying that we need to be well sampled in k space. This could well be the issue with the first, constant Del(k), test.

For the second test it may be that Chris' code is not set up to be able to generate white noise on all scales, since he doesn't have the ability to cover infinite space using the trick of periodicity: he actually has to sum contributions up to a finite kmax.

Fingers crossed for the Gaussian test cases!

barnabytprowe commented 11 years ago

For the second test it may be that Chris' code is not set up to be able to generate white noise on all scales, since he doesn't have the ability to cover infinite space using the trick of periodicity: he actually has to sum contributions up to a finite kmax.

I still don't understand how the variance didn't increase as kmax was increased though, as per your test. Maybe it's just that you are wholly right that the variance is not a good measure, and maybe again it is due to the rapid oscillations of the Bessel function (or spherical harmonic transform equivalent) at large k for as theta -> 0.

rmjarvis commented 11 years ago

Regarding the k=0 stuff, one option would be to access P(k=0) inside a try block, and if the function supplied by the user is able to produce a valid answer, use that. Otherwise (e.g. if it is interpolated with logP, logk and raises an exception), set P(k=0) = 0 manually.

The only problem is that we have occasionally had problems properly propagating exceptions up from the C++ layer on certain machines, so this might not be completely robust. It might be worth having the python layer LookupTable check if the input k value is valid given the interpolation (if it doesn't already -- I forget whether we added that check).

rmandelb commented 11 years ago

Some thoughts on this k=0 mode: In previous versions of the lensing engine, or indeed in the version on master, this mode is not forced to be zero. And I do think your physical understanding of this is right: if we have the full sky, any reasonable shear power spectrum must have P(k=0)=0, but this is not at all true when dealing with small patches, due to sampling variance. I think that to mimic the effects of sampling variance we would indeed need to do something like you say (assigning to our P(k=0) some value that comes from averaging in some way over all k<kmin - would have to think about this more quantitatively - but in any case, since physically realistic P(k) go to zero at k=0, this average over all k<kmin will also go to zero in the limit of large area since the absolute k range being covered will get smaller... so I think you have the right physical idea but I didn't work out the math).

As Mike suggests, we could have the PS code check explicitly whether the power function is defined at k=0, and only force it to zero if it's not defined there. (And, Mike, to answer your question: I made the code check that the LookupTable-based power function is defined at all k that are being used and, if not, raise a clear exception at the python level, to avoid that issue with non-propagated C++-level exceptions.) However, as I mentioned, at least in the P=const case it didn't matter whether I set the PS to zero at k=0 or not. Indeed, the unit tests did not start failing, so apparently this is not leading to differences in behavior anyway, even for non-const PS? I'm not quite sure I understand that, actually... I mean, the unit tests are for shear values, not just variance, for two different possible power spectra, and they are completely unchanged!

Moreover, if our metric is going to be checking the accuracy of PS reconstruction for kmin<k<kmax, then power at smaller k / larger scales is essentially irrelevant, isn't it? in that case, setting it to zero is fine for our own testing purposes, and we can simply document this more clearly for other users who want to know what is our convention for handling large-scale power. Though, I guess if we use a correlation function metric this could matter, since power in some large-scale mode could coherently shift the correlation func up/down...?

barnabytprowe commented 11 years ago

However, as I mentioned, at least in the P=const case it didn't matter whether I set the PS to zero at k=0 or not. Indeed, the unit tests did not start failing, so apparently this is not leading to differences in behavior anyway, even for non-const PS? I'm not quite sure I understand that, actually... I mean, the unit tests are for shear values, not just variance, for two different possible power spectra, and they are completely unchanged!

I think this is telling us that for a large enough grid in k space, the difference between P=const for all k and P=const for all k !=0 only makes a very slight difference in the random field generated. It's still very nearly a delta function, it's just had a single mode filtered out of it, and a mode which was of equal but no greater importance than all the N*N-1 others. If N=10 only, the resulting field is still 99% the same, plausibly enough to pass tests.

If it helps, and I hope it does, I'm about to start (today I hope) using this power spectrum generation code to simulate power spectra submissions and compare them to the truth...

rmandelb commented 11 years ago

Hi guys - I just committed a version of lensing_engine.tex that includes tests of the PS, with plots. In short, I think we have normalization issues but I'm not sure where they come from.

rmandelb commented 11 years ago

Per e-mail discussion, I think I do understand this now: we were accidentally generating the power with something dimensionful (units of angles^2), and required a division by the grid spacing^2. In arcsec, that's 720^2 for the test grid I was using, so can account for many orders of magnitude That takes care of nearly all of the problem for at least one test case, need to study this further... will try to push some plots over the weekend.

rmandelb commented 11 years ago

Pushed a version of notes showing how, with this fix in place, I can understand /explain nearly all of the test cases (including the cosmological power spectrum test).

reikonakajima commented 11 years ago

I read your notes, and left some comments (scattered here and there--now I see why PRs are easier to comment on). Thanks for explaining the details in the doc.

barnabytprowe commented 11 years ago

Hey Rachel, thanks so much for all of this. I think I finally have an end-to-end mathematical description of what we're doing, I just need to LaTeX it. I'm guessing you don't mind if I put this in the notes, or would you prefer a separate document?

rmandelb commented 11 years ago

Please go ahead and put it into the notes!

On Feb 6, 2013, at 10:11 AM, Barnaby Rowe notifications@github.com wrote:

Hey Rachel, thanks so much for all of this. I think I finally have an end-to-end mathematical description of what we're doing, I just need to LaTeX it. I'm guessing you don't mind if I put this in the notes, or would you prefer a separate document?

— Reply to this email directly or view it on GitHub..

Rachel Mandelbaum rmandelb@andrew.cmu.edu http://www.andrew.cmu.edu/~rmandelb/

barnabytprowe commented 11 years ago

Hi Rachel. Sorry for taking a while on this, I've added some extra pages onto the document (page 8 onwards). I've got well over half way in the writeup of my pen+paper derivation of what we're doing, but I think it already firmly motivates the division by grid spacing you are doing.

I haven't gotten the extension to N-periodic samples latexed-down yet, will try to do that tomorrow. However, my working suggests that there might be a couple of factors of N and 2 pi being missed in the guts of the code too, so it will be worth taking a look to try and understand that.

Finally, the flat power P(k) = P0 problem. I have to confess, after going through all the working above in depth I have to retreat from my earlier position: it is not a good thing that this is currently working without grid spacing normalization. It doesn't check out dimensionally (P0 having units angle^2). I'm trying to formulate the best way to express this, but I think that we need to say something in the dox along the lines of...

If you are trying to to simulate a flat power spectrum you may find different behaviour
to what you are expecting.  These power spectra are assumed to have dimensions
angle^2.  In order to represent continuous power spectra with discrete samples, and
to use these to generate fields with a dimensionless correlation function by Discrete
Fourier Transform, we must sum dimensionless impulse terms: these consist of the
power at a given frequency multiplied by an area in Fourier space (in inverse angles^2)
determined by the chosen grid spacing for the final gridded shear output.

Therefore, if you wish to generate a field with a white spectrum and variance given by
the sum of the E and B mode power spectra, you should set these functions to each
return half the total variance required for all elements of any input `k`, and set the grid
spacing to `1.` (in your chosen units) when calling `buildGriddedShear`. To understand
more, see (BLAH.PDF).

Does that make sense?? (It's late so I worry it may not...!)

rmandelb commented 11 years ago

Hi Rachel. Sorry for taking a while on this, I've added some extra pages onto the document (page 8 onwards).

Goodness! You've been hard at work. I feel bad for slacking on this one. And I will probably not be able to absorb this tonight since I'm tired - might be a few days given all I have going on here. I need to sit down and calculate, I can never fully internalize these things until I've done them myself, though I will use your steps that you describe as a guide for where to start. In particular the different FT conventions was not something I had realized. I've got well over half way in the writeup of my pen+paper derivation of what we're doing, but I think it already firmly motivates the division by grid spacing you are doing.

That's great. A derivation is better than "the units were wrong so I need to divide by something with units of angle, and if I divide by grid spacing then all the variances and P(k) come out right, as if by magic". (Which was my approach before. Kind of pathetic, really, though the first part of the chain of logic -- i.e., "units are wrong" -- was okay.) I haven't gotten the extension to N-periodic samples down yet, will try to do that tomorrow. However, my working suggests that there might be a couple of factors of N and 2 pi being missed in the guts of the code too, so it will be worth taking a look to try and understand that.

We would need them to be wrong in a way that doesn't drastically change the results for P(k) in section 5… is that possible? I have a hard time seeing how that would work considering that neither N nor 2pi is near 1. Finally, the flat power P(k) = P0 problem. I have to confess, after going through all the working above in depth I have to retreat from my earlier position: it is not a good thing that this is currently working without grid spacing normalization.

Um... I don't understand what you mean. The P(k)=P0 case only works in the current version, with grid spacing normalization. In the old version it didn't work. See section 4.2 for details. Maybe I didn't explain this clearly enough?

GalSim-developers / GalSim

Further development / testing of lensing engine #248