Use NEST ordering for pointing compression?

Cosmoglobe / Commander

Commander is an Optimal Monte-carlo Markov chAiN Driven EstimatoR which implements fast and efficient end-to-end CMB posterior exploration through Gibbs sampling.

GNU General Public License v3.0

20 stars 14 forks source link

Use NEST ordering for pointing compression? #4

Open mreineck opened 4 years ago

mreineck commented 4 years ago

I'm not absolutely sure, but browsing the code I got the feeling that you are exclusively using Healpix pixel indices in RING scheme. For differencing and Huffman compression it might be quite advantageous to use NESTED scheme instead, since this might lead to a much more concentrated distribution of difference values. Since the (uncompressed) maps themselves will most likely be stored in RING scheme (otherwise there would be problems with the SHTs), this incurs the overhead of doing quite a few nest2ring conversions whenever uncompressing a scan, but that might be an acceptable price to pay. I'm sorry if you have already tried this and it didn't produce any appreciable gain. If so, please feel free to close this!

hke commented 4 years ago

Hi Martin,

That's true, but the number of symbols isn't really a limitation. This could lead to a slightly more compact Huffman tree -- but the decompression is anyway a small cost compared to the full time per sample. And this step is also already memory bus limited, not CPU limited, so I don't think we would gain anything.

Then, even more problematic is the fact that we actually will need the maps in ring format for the SHTs. And convert_ring2nest is a really expensive operation in MPI. So, doing that over and over is not an option. (We discussed MPI-parallelized ud_grade at some point, which is also expensive, for the same reason.) Of course, we could store the actual maps in ring, and only the point centroids in nest, and do just a nest2ring operation while performing P and P^t, but that also sounds more expensive to me than whatever we could gain from faster decompression.

So, overall, this doesn't seem to be worth the effort, as far as I can tell. But I'm happy to be proven wrong :-)

Thanks!

Hans Kristian

Den 19.11.2020 17:34, skrev mreineck: I'm not absolutely sure, but browsing the code I got the feeling that you are exclusively using Healpix pixel indices in RING scheme. For differencing and Huffman compression it might be quite advantageous to use NESTED scheme instead, since this might lead to a much more concentrated distribution of difference values. Since the (uncompressed) maps themselves will most likely be stored in RING scheme (otherwise there would be problems with the SHTs), this incurs the overhead of doing quite a few nest2ring conversions whenever uncompressing a scan, but that might be an acceptable price to pay. I'm sorry if you have already tried this and it didn't produce any appreciable gain. If so, please feel free to close this!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Cosmoglobe/Commander/issues/4, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUFTO5RZHEEFF3HHQYUW4LSQVCIRANCNFSM4T3UFZOA.

mreineck commented 4 years ago

And convert_ring2nest is a really expensive operation in MPI.

Oh, I was not suggesting that you actually store any maps in NEST scheme at any time! I just suggested to try

converting (theta, phi) pointing information to NEST pixel indices when compressing the detector pointings, and
each time these NEST pixels are decompressed, immediately converting them back to RING scheme (using nest2ring, which is a pretty fast operation)

It should save you memory on compressed pointings, without costing much CPU time during decompression, but I can't really estimate what the impact would be in reality. But trying this is probably very easy for someone familiar with the code. It's just an additional ring2nest and nest2ring at the right locations.

hke commented 4 years ago

Ah, right -- saving memory rather than CPU! That is a good idea, indeed! As I said, I'm happy to be proved wrong. Definitely worth trying out :-)

Thanks!

Hans Kristian

Den 19.11.2020 17:49, skrev mreineck: And convert_ring2nest is a really expensive operation in MPI.

Oh, I was not suggesting that you actually store any maps in NEST scheme at any time! I just suggested to try

converting (theta, phi) pointing information to NEST pixel indices when compressing the detector pointings, and
each time these NEST pixels are decompressed, immediately converting them back to RING scheme (using |nest2ring|, which is a pretty fast operation)

It should save you memory on compressed pointings, without costing much CPU time during decompression, but I can't really estimate what the impact would be in reality. But trying this is probably very easy for someone familiar with the code. It's just an additional |ring2nest| and |nest2ring| at the right locations.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Cosmoglobe/Commander/issues/4#issuecomment-730500849, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUFTOYXO6FYCBAAQUTTSZ3SQVEADANCNFSM4T3UFZOA.

mreineck commented 4 years ago

Healpix C++ actually has a "Peano-Hilbert" scheme, in addition to RING and NESTED, which might save a few more bits ... but this is probably over the top :-)

mreineck commented 3 years ago

I have been thinking about this again after the LiteBIRD discussion yesterday...

In principle there shouldn't be any need to store detector pointings at all, compressed or uncompressed. All that needs to be known is

the satellite orientation at some (pretty low) sampling rate
the relative orientation of each detector with respect to the satellite reference frame (exactly one quaternion per detector)
the times at which you want to know the detector pointings

The rest can be computed on the fly whenever necessary with very good performance (something like 20-50 million pointings per second per core). I have the necessary code available as part of the ducc library; it's also used by litebird-sim.

hke commented 3 years ago

Not sure I understand.. Are you talking about just for computing the beam convolved response, or also for getting the pixel number of a given pointing during mapmaking?

feb. 2021, 15:30 kl. 15:30 skrev mreineck notifications@github.com:
I have been thinking about this again after the LiteBIRD discussion yesterday...

In principle there shouldn't be any need to store detector pointings at all, compressed or uncompressed. All that needs to be known is
- the satellite orientation at some (pretty low) sampling rate
- the relative orientation of each detector with respect to the satellite reference frame (exactly one quaternion per detector)
- the times at which you want to know the detector pointings The rest can be computed on the fly whenever necessary with very good performance (something like 20-50 million pointings per second per core). I have the necessary code available as part of the ducc library; it's also used by litebird-sim.
-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/Cosmoglobe/Commander/issues/4#issuecomment-782110122

mreineck commented 3 years ago

I'm talking generally about how to answer the question: what is the pointing of detector X at time Y? Since the detectors are connected rigidly to the satellite, and the satellite is rotating slowly without any sudden jerks, this question can be answered with just the knowledge of the satellite orientation at, say, 1-second intervals. Everything can be deduced very accuractely and very quickly from this information, using rotation matrices and quaternion interpolation. The usage scenario would be something like "compute all pointings for LFI-28a for pointing period 2376" instead of "uncompress all pointings for LFI-28a for pointing period 2376 from this big blob of compressed data". It could reduce the memory consumption of the code massively. And from the computed pointings, you also get the pixel number for mapmaking very quickly.

hke commented 3 years ago

OK, perhaps I don't really quite understand the specific algorithm, even though I do understand the general idea. I just find it hard to believe that the interpolation+rotation+ang2pix operations can be that fast, compared to a basic loop like this:

do i = 1, ntod if (precommputed) then p = pix(i) else p = interpolate+rotate+ang2pix end rhs(p) = rhs(p) + tod(i) ... end

The memory spent on compressed pointing is also generally a relatively small fraction compared to the actual TOD. But I'm more than happy to be proven wrong, and if you can provide pixel numbers on a comparable time scale as a precomputed look-up table, then nothing is better than that! :-)

Den 19.02.2021 15:44, skrev mreineck: I'm talking generally about how to answer the question: what is the pointing of detector X at time Y? Since the detectors are connected rigidly to the satellite, and the satellite is rotating slowly without any sudden jerks, this question can be answered with just the knowledge of the satellite orientation at, say, 1-second intervals. Everything can be deduced very accuractely and very quickly from this information, using rotation matrices and quaternion interpolation. The usage scenario would be something like "compute all pointings for LFI-28a for pointing period 2376" instead of "uncompress all pointings for LFI-28a for pointing period 2376 from this big blob of compressed data". It could reduce the memory consumption of the code massively. And from the computed pointings, you also get the pixel number for mapmaking very quickly.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Cosmoglobe/Commander/issues/4#issuecomment-782118222, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUFTO64WMO6Z4HFAYEDWXDS7Z2LHANCNFSM4T3UFZOA.

mreineck commented 3 years ago

In your example the line p = pix(i) will certainly be faster than p = interpolate+rotate+ang2pix, but that's for uncompressed pointing. Will p = huffman_decompress(compressed_pix(i)) be comparably fast? Also, apart from absolute speed, the more important question is perhaps how much more time is spent in actually working with the data, as compared to decompression. If decompression currently accounts for, say, 1% of your total run time, it may still be worthwhile to increase that to maybe 3% for big memory savings.

However, if you say that compressed pointings (and we need to include beam orientation here as well, probably) needs much less memory than compressed TOD anyway, then there's probably not much to be gained.

[Edit: Once you work with discretized TOD the proportions will likely change again.]