Closed andreufont closed 6 years ago
I just ran a large N=8192^3 simulation, and the fraction of quasars with Dec=NaN is much worse. See attached figure with fraction as a fraction of redshift, for all 256 nodes.
Looking into this further, could the function 'get_random_angles', as defined in src/common.c line 285 be causing problems?
It is not entirely clear how this function works to me, so help in understanding what it does and how it works would be very helpful! In particular, what is the origin of the factor of 0.57 that is hard-coded into lines 290 and 291?
This is a quick method to generate a random point within a given healpix pixel. The factor 0.57 was found experimentally to work well for a wide range of healpix resolutions. It could indeed be that this procedure is the one giving rise to NaNs, I'll investigate.
@andreufont do you get the NaNs only when you include skewers or also when you just ask for quasar positions?
@damonge - good question. I'll try right now and come back to you.
About the extra randomness, it is quite dodgy. You basically add some extra perturbation in RA-Dec, but only within a square in RA-Dec around the center of the HEALPix sub-pixel, so part of the pixel will never be covered.
@jfarr03 - you could try to come up with better ways to choose a random position whithin a HEALPix pixel, I don't really know at this point what is best... I can see two things:
I would probably go for the second one... or a combination of both. Open to hear other ideas!
Yes, this choice was made to minimize computational time. Sampling inside a healpix pixel is just a pain. I thought about implementing the second one, but gave up because sampling became a bottleneck. Honestly, I wouldn't put too much work on this, since this will be deprecated after the revamp I'm planning (basically getting rid of healpixels altogether for the sampling of sources).
So I'm gonna check if this is the source of the NaNs (seems like a good candidate) and patch it, but I wouldn't recommend wasting more time on it.
Very good point. If this was the source of conflict, we could just decrease the area that we subsample to make sure that we are still within the pixel, or try to find bad values of Dec at this point.
Yes, this is my plan. Looking into it now.
OK, so the problem is that there's a maximum Nside=16384 hard-coded in healpix, so if you query angles beyond that, you get nonsense. This will happen as soon as your cartesian resolution is high enough or as you go to higher redshifts (where you need to make the spherical voxels smaller in angle to keep up with the cartesian resolution). Well spotted @jfarr03 !
@andreufont : I'll patch this up now in a dirty way for the time being. Let me know if you still encounter problems afterwards. I'll let you know when it's done.
That makes sense. Thanks for looking into this!
@damonge - I just checked that quasars have normal values of Dec when we do not ask for skewers, and only write the catalog. What I find a bit strange... why wouldn't the quasar positions be the same, regardless of whether we want skewers or not?
Mmh, that's weird. So you re-ran the same sim without skewers and got no dec=Nan?
@andreufont @jfarr03 : OK, this has now been patched. I've checked that it compiles, but nothing else. Could you please check if the NaNs persists, and whether the issues with the distribution of healpix pixels over nodes gets solved too?
I'll start working on the revamp, and get back to this if problems persist.
@damonge - I confirm that I just ran exactly the same run twice, one with skewers, one without, and one of them has Dec=NaN while to other one does not. So it looks like the code either uses different functions to get quasar positions, or to write the files...
I will "git pull" the skewers branch and start again.
@andreufont : oh, yes, of course. If no skewers are requested, then the code uses cartesian voxels the whole time, so healpix is just not used at all. Then I'm hopeful that this fix will solve these issues (at least temporarily)
Great, running test on new version now.
Eureka! No Dec=NaN in the new run :-)
I'll test the other issues now.
And the issue with the different HEALPix in each node is also gone!
So I believe we can close both this ticket and #27. Oh, and #28, since that one was not an issue to start with.
awesome, closed!
In some Lya runs at NERSC a tiny fraction of the outputs ends up with Dec=NaN.
For the precise example that I'm looking at, this only happens for quasars with a very high redshift, z>3.55 (the maximum is 3.8), and it happens regardless of value of RA. Actually, this happens for roughly 50% of quasars with z>3.55.
You can find an example file that crashes, and code to reproduce the code, in this igmhub repository that we are working on with James Farr: https://github.com/igmhub/LyaCoLoRe
In particular, you can take a look at: https://github.com/igmhub/LyaCoLoRe/blob/master/example_scripts/issue_dec_colore.ipynb