damonge / CoLoRe

CoLoRe - Cosmological Lofty Realizations
GNU General Public License v3.0
18 stars 13 forks source link

Some quasars have Dec=NaN #25

Closed andreufont closed 6 years ago

andreufont commented 6 years ago

In some Lya runs at NERSC a tiny fraction of the outputs ends up with Dec=NaN.

For the precise example that I'm looking at, this only happens for quasars with a very high redshift, z>3.55 (the maximum is 3.8), and it happens regardless of value of RA. Actually, this happens for roughly 50% of quasars with z>3.55.

You can find an example file that crashes, and code to reproduce the code, in this igmhub repository that we are working on with James Farr: https://github.com/igmhub/LyaCoLoRe

In particular, you can take a look at: https://github.com/igmhub/LyaCoLoRe/blob/master/example_scripts/issue_dec_colore.ipynb

andreufont commented 6 years ago

I just ran a large N=8192^3 simulation, and the fraction of quasars with Dec=NaN is much worse. See attached figure with fraction as a fraction of redshift, for all 256 nodes. issue_bad_dec

jfarr03 commented 6 years ago

Looking into this further, could the function 'get_random_angles', as defined in src/common.c line 285 be causing problems?

It is not entirely clear how this function works to me, so help in understanding what it does and how it works would be very helpful! In particular, what is the origin of the factor of 0.57 that is hard-coded into lines 290 and 291?

damonge commented 6 years ago

This is a quick method to generate a random point within a given healpix pixel. The factor 0.57 was found experimentally to work well for a wide range of healpix resolutions. It could indeed be that this procedure is the one giving rise to NaNs, I'll investigate.

@andreufont do you get the NaNs only when you include skewers or also when you just ask for quasar positions?

andreufont commented 6 years ago

@damonge - good question. I'll try right now and come back to you.

About the extra randomness, it is quite dodgy. You basically add some extra perturbation in RA-Dec, but only within a square in RA-Dec around the center of the HEALPix sub-pixel, so part of the pixel will never be covered.

@jfarr03 - you could try to come up with better ways to choose a random position whithin a HEALPix pixel, I don't really know at this point what is best... I can see two things:

I would probably go for the second one... or a combination of both. Open to hear other ideas!

damonge commented 6 years ago

Yes, this choice was made to minimize computational time. Sampling inside a healpix pixel is just a pain. I thought about implementing the second one, but gave up because sampling became a bottleneck. Honestly, I wouldn't put too much work on this, since this will be deprecated after the revamp I'm planning (basically getting rid of healpixels altogether for the sampling of sources).

So I'm gonna check if this is the source of the NaNs (seems like a good candidate) and patch it, but I wouldn't recommend wasting more time on it.

andreufont commented 6 years ago

Very good point. If this was the source of conflict, we could just decrease the area that we subsample to make sure that we are still within the pixel, or try to find bad values of Dec at this point.

damonge commented 6 years ago

Yes, this is my plan. Looking into it now.

damonge commented 6 years ago

OK, so the problem is that there's a maximum Nside=16384 hard-coded in healpix, so if you query angles beyond that, you get nonsense. This will happen as soon as your cartesian resolution is high enough or as you go to higher redshifts (where you need to make the spherical voxels smaller in angle to keep up with the cartesian resolution). Well spotted @jfarr03 !

@andreufont : I'll patch this up now in a dirty way for the time being. Let me know if you still encounter problems afterwards. I'll let you know when it's done.

andreufont commented 6 years ago

That makes sense. Thanks for looking into this!

andreufont commented 6 years ago

@damonge - I just checked that quasars have normal values of Dec when we do not ask for skewers, and only write the catalog. What I find a bit strange... why wouldn't the quasar positions be the same, regardless of whether we want skewers or not?

damonge commented 6 years ago

Mmh, that's weird. So you re-ran the same sim without skewers and got no dec=Nan?

damonge commented 6 years ago

@andreufont @jfarr03 : OK, this has now been patched. I've checked that it compiles, but nothing else. Could you please check if the NaNs persists, and whether the issues with the distribution of healpix pixels over nodes gets solved too?

I'll start working on the revamp, and get back to this if problems persist.

andreufont commented 6 years ago

@damonge - I confirm that I just ran exactly the same run twice, one with skewers, one without, and one of them has Dec=NaN while to other one does not. So it looks like the code either uses different functions to get quasar positions, or to write the files...

I will "git pull" the skewers branch and start again.

damonge commented 6 years ago

@andreufont : oh, yes, of course. If no skewers are requested, then the code uses cartesian voxels the whole time, so healpix is just not used at all. Then I'm hopeful that this fix will solve these issues (at least temporarily)

andreufont commented 6 years ago

Great, running test on new version now.

andreufont commented 6 years ago

Eureka! No Dec=NaN in the new run :-)

I'll test the other issues now.

andreufont commented 6 years ago

And the issue with the different HEALPix in each node is also gone!

So I believe we can close both this ticket and #27. Oh, and #28, since that one was not an issue to start with.

damonge commented 6 years ago

awesome, closed!