Interpolation causing recovered gaussian field to have non-zero mean, reduced variance

jfarr03 commented 6 years ago

As discussed in #26, @andreufont and I previously noticed that the mean of the delta skewers was non-zero. This seemed to be fixed in the revamp branch, but there was still a (much smaller) positive mean. In discussion, we worked out that this was due to selection bias from the cells close to the quasars. Indeed, the mean density tends to zero with distance from the QSOs (mean density in blue, variance in orange, plotted in restframe wavelength):

However, in investigating this, we also noticed that the mean of the gaussian field does not seem to behave in the same way, tending to a value of ~0.04. It also seems to have a roughly constant variance of ~1.3, whereas the value of sigma_G quoted is 1.19 and so a variance of 1.41 would be expected:

This gaussian field was recovered from the density skewers using the value of sigma_G that is included as a header in the CoLoRe output files and equation 9 in the documentation.

After a bit of fiddling around, we tried hacking the interpolation method for the skewers, changing it to NPG. This seemed to fix the problem (both the mean and variance are now as would be expected):

This seems to suggest that reconstructing the Gaussian field from interpolation of the density field is not a feasible option, as using NPG is not well-motivated physically, and the default scheme does not reproduce the correct distribution.

As such, we were wondering whether obtaining the gaussian field via the following method would be possible?

Generate Gaussian field in the grid
Convert to a lognormal field
Determine the positions of the QSOs
Convert back to the Gaussian field
Interpolate to obtain the skewers
Save the gaussian skewers to disk
Convert the skewers to lognormal
Save the lognormal skewers to disk

Obviously this will increase the time required - at the moment, CoLoRe runs so quickly that this won't be a problem - but hopefully would not increase the memory requirements. @damonge, @slosar: does this seem like a good idea, or can you think of any other options?

andreufont commented 6 years ago

@fjaviersanchez - you might also be interested on this.

A minor note on @jfarr03 comments: we probably do not need CoLoRe to compute and write the lognormal skewers, writing the Gaussian skewers should be enough. If we ever wanted the lognormal skewers, we could compute them ourselves in post-processing, but we usually would like to add extra power to the Gaussian field before doing that.

fjaviersanchez commented 6 years ago

@andreufont @jfarr03 Just a quick question: do you need both the Gaussian and lognormal skewers or are the lognormal skewers enough? You mention that you can go with the Gaussian and do the lognormal transform yourself but, do you use the Gaussian skewers later for anything?

jfarr03 commented 6 years ago

The reason we need the Gaussian skewers is because, with the current system, there isn’t nearly enough small scale variation in the skewers due to the large cell size. Our intentions were to add in extra variations at this scale, but to do this we’d need to add them to the Gaussian field rather than the lognormal one.

andreufont commented 6 years ago

Exactly. @fjaviersanchez , remember that we asked for sigma_G to be stored in the files? It was because we thought that with that we could transform the lognormal field back to the Gaussian field. But we forgot that the lognormal field is interpolated (using CIC by default) to the skewers, and that interpolation breaks this inverse transformation.

It is not something critical, we can continue working with the current skewers for now, but we thought it was important to write this down somewhere for the record.

damonge commented 6 years ago

@andreufont @jfarr03 : OK, I see. Yes, this was one of my worries. Question: let's assume we can do whatever we want with the Gaussian field with no extra memory penalty. Would you like it to be CiC-interpolated into the skewers? If so, that will induce an effective smoothing, which you'll have to take into account when adding small-scale fluctuations.

Now, regarding how to access the Gaussian field, I think the most straightforward way would be to recompute it on the fly when interpolating it. I.e. at every point in the skewer we find the values of d_LN we need for the interpolation, translate them into d_G and then use those. This will incur in extra run time for sure (you need to compute quite a few logarithms), but as long as that's not a problem...

andreufont commented 6 years ago

@damonge - Ideally yes, we would like the Gaussian skewers to use some sort of interpolation, CIC better than NGP. We used NGP here just to test that the interpolation was indeed the reason why was not zero. Yes, the interpolation will add a smoothing, and it will change the variance of the Gaussian field, but not its mean.

I think your suggestion for interpolating the Gaussian field is great, since it means that we only need to compute the Gaussian field for the cells that we care about. We definitely do not care about a bit of extra run time... even if it was x10 slower it would still be peanuts, CoLoRe is just too fast :-)

slosar commented 6 years ago

@damonge The penalty could be much less than you think. Mathematical operations are very cheap if you operate on numbers you already have in the processor (i.e. log takes a few cycles, but fetching a memory location that is not in cache can go into hundreds...) But perhaps you could do it the other way and transform when you generate quasars?

andreufont commented 6 years ago

I really wouldn't worry about it, we are not limited by run time at all. I would do the simplest thing.

jfarr03 commented 6 years ago

@damonge would it be possible to add a flag to CoLoRe so that you can decide whether or not you want to calculate the Gaussian skewers? If the flag was "yes" then you'd end up with Gaussian skewer files, and otherwise you'd get the physical density files, as we get currently. That should be fairly straightforward to implement hopefully?

andreufont commented 6 years ago

We could implement this ourselves and submit a pull request!

damonge commented 6 years ago

As you wish. If you want me to do it (happy to oblige), it'll probably have to wait until the week after next (bit swamped with some other stuff). If you do it yourselves there's two options: a) store also de Gaussian field and interpolate that one. b) Transform the lognormal back to Gaussian on the fly when interpolating.

andreufont commented 6 years ago

@damonge - We'll do it ourselves, and implement option b) above (transform locally to Gaussian on the fly and interpolate).

andreufont commented 6 years ago

@jfarr03 - I said I was going to look at this, but after a failed half an hour I'll walk away from it. Take a look at it if you can. The action happens in beaming.c, we probably need to modify only get_element(), but we need to make sure that none else is using the same function and hoping a different behavior.

jfarr03 commented 6 years ago

@andreufont - Sure, having a look at it now and will make sure to check for any potential problems from other uses of get_element()

damonge commented 4 years ago

@jfarr03 @andreufont : should this still be open?

andreufont commented 4 years ago

Hi @damonge - this is fixed in the pull request that James started, i.e., #36

Once James adds the error message if someone wants Gaussian skewers from a 2LPT run, then we can merge that branch and close this issue.

damonge commented 4 years ago

OK, great!

andreufont commented 4 years ago

Fixed in #36

damonge / CoLoRe

Interpolation causing recovered gaussian field to have non-zero mean, reduced variance #34