dingo-gw / dingo

Dingo: Deep inference for gravitational-wave observations
MIT License

Profile synthetic phase #107

Open max-dax opened 2 years ago

max-dax commented 2 years ago

Why is sampling the synthetic phase so much slower than a single likelihood evaluation? Profiling the code with the following print statement

        print(
            f"Modes: {t_modes:.3f} ({t_modes/t_total * 100:.1f}%)\t\t"
            f"Inner products: {t_inner:.3f} ({t_inner/t_total * 100:.1f}%)\t\t"
            f"Phase loop: {t_phase:.3f} ({t_phase/t_total * 100:.1f}%)"
        )

we get:

Modes: 0.189 (50.2%)        Inner products: 0.002 (0.7%)        Phase loop: 0.184 (49.1%)
Modes: 0.189 (47.1%)        Inner products: 0.003 (0.6%)        Phase loop: 0.209 (52.3%)
Modes: 0.195 (48.0%)        Inner products: 0.002 (0.6%)        Phase loop: 0.209 (51.4%)
Modes: 0.186 (48.1%)        Inner products: 0.002 (0.6%)        Phase loop: 0.198 (51.3%)
Modes: 0.191 (46.9%)        Inner products: 0.002 (0.6%)        Phase loop: 0.213 (52.5%)
Modes: 0.190 (45.1%)        Inner products: 0.003 (0.6%)        Phase loop: 0.229 (54.3%)
Modes: 0.189 (44.6%)        Inner products: 0.007 (1.6%)        Phase loop: 0.229 (53.8%)
Modes: 0.212 (49.5%)        Inner products: 0.002 (0.6%)        Phase loop: 0.214 (49.9%)
Modes: 0.207 (48.4%)        Inner products: 0.002 (0.6%)        Phase loop: 0.218 (51.0%)
Modes: 0.219 (51.3%)        Inner products: 0.002 (0.6%)        Phase loop: 0.206 (48.1%)
Modes: 0.191 (51.5%)        Inner products: 0.002 (0.7%)        Phase loop: 0.177 (47.9%)
Modes: 0.194 (51.9%)        Inner products: 0.002 (0.6%)        Phase loop: 0.177 (47.5%)
Modes: 0.357 (64.9%)        Inner products: 0.003 (0.5%)        Phase loop: 0.190 (34.6%)
Modes: 0.422 (68.9%)        Inner products: 0.002 (0.4%)        Phase loop: 0.188 (30.7%)
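For reference, per-stage timings like these can be collected with `time.perf_counter`. The following is a minimal sketch, not the code used in this issue: `profile_stages` and `format_report` are hypothetical helpers, and the three lambdas are placeholder workloads standing in for mode generation, inner products, and the phase loop.

```python
import time


def profile_stages(stages):
    """Run each (name, callable) pair once and return {name: elapsed seconds}."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


def format_report(timings):
    """Format timings as 'name: seconds (percent of total)', tab-separated."""
    total = sum(timings.values())
    return "\t\t".join(
        f"{name}: {t:.3f} ({t / total * 100:.1f}%)" for name, t in timings.items()
    )


# Placeholder workloads (assumptions, not the real Dingo stages).
stages = [
    ("Modes", lambda: sum(i * i for i in range(50_000))),
    ("Inner products", lambda: sum(range(1_000))),
    ("Phase loop", lambda: [i ** 0.5 for i in range(50_000)]),
]
print(format_report(profile_stages(stages)))
```

Here the percentages are relative to the sum of the measured stages, so untimed work between stages does not appear in the report.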

The phase loop can likely be vectorized, but is there a way to speed up the mode computation? It seems to take much longer than a simple waveform evaluation, although in the lalsimulation backend both should compute the same quantities.
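To illustrate what vectorizing the phase loop could look like, here is a minimal NumPy sketch. It is not Dingo's implementation and rests on stated assumptions: the waveform is taken to depend on phase as h(phi) = sum_m exp(-i m phi) h_m, the inner product is conjugate-linear in its first argument, and the precomputed inputs `kappa[m] = (d | h_m)` and `rho[m, n] = (h_m | h_n)` are hypothetical names. Under those assumptions, the log-likelihood on the whole phase grid (up to a constant) follows from one matrix product and one `einsum`.

```python
import numpy as np


def log_l_loop(kappa, rho, m, phases):
    """Reference version: evaluate the log-likelihood one phase at a time."""
    out = np.empty(len(phases))
    for i, phi in enumerate(phases):
        z = np.exp(-1j * m * phi)                 # phase factor per mode
        d_h = np.sum(z * kappa)                   # (d | h(phi))
        h_h = np.sum(np.conj(z)[:, None] * z[None, :] * rho)  # (h(phi) | h(phi))
        out[i] = d_h.real - 0.5 * h_h.real
    return out


def log_l_vectorized(kappa, rho, m, phases):
    """Same quantity for all phases at once, without a Python loop."""
    z = np.exp(-1j * np.outer(phases, m))         # shape (n_grid, n_modes)
    d_h = z @ kappa                               # shape (n_grid,)
    h_h = np.einsum("pm,mn,pn->p", np.conj(z), rho, z)
    return d_h.real - 0.5 * h_h.real


# Random stand-in data (assumption: 5 values of m, 500 grid points).
rng = np.random.default_rng(0)
m = np.array([-2, -1, 0, 1, 2])
kappa = rng.normal(size=5) + 1j * rng.normal(size=5)
a = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
rho = a @ a.conj().T                              # Hermitian, like a Gram matrix
phases = np.linspace(0.0, 2.0 * np.pi, 500, endpoint=False)
assert np.allclose(log_l_loop(kappa, rho, m, phases),
                   log_l_vectorized(kappa, rho, m, phases))
```

The cost of the vectorized version is dominated by the `einsum`, which scales with the grid size times the square of the number of modes, so it only helps if the inner products per mode are already available.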

mpuerrer commented 2 years ago

Are these times in seconds? What is the call path to the LAL function?

stephengreen commented 2 years ago

I'm not sure I understand this table. So we have roughly 50% of synthetic phase time spent on generating modes, < 1% on inner products, but what is the third column, "phase loop"?

How does the mode generation time of ~ 0.2 s compare to the time to generate the polarizations? Is this XPHM?

stephengreen commented 2 years ago

Okay, I see that it takes about 10 times longer to generate XPHM modes with ChooseFDModes vs generating polarizations with SimInspiralFD. Are you sure that the implementation within lalsimulation is really the same?

max-dax commented 2 years ago

  1. Yes, the times are in seconds; see the code snippet that generated them.
  2. "Phase loop" refers to the loop over the phase grid, which could possibly be vectorized (see the comment in PR #100 in which I linked this issue). Also, setting n_grid=10 (i.e., largely removing the effect of the phase loop) only reduces the time by ~30%, not by 50%. Maybe this is because some waveforms take a bit longer to generate; I saw some take up to 0.7 s.
  3. I have not looked into the lalsimulation backend, and it is possible there is a reason it takes longer to compute individual modes than to compute the entire waveform. But I would guess that computing individual modes is an intermediate step in computing the waveform, which is why I am surprised it takes so much longer. Note that the timing also includes the frame transformations that you apply, although I would expect these to be rather lightweight.

stephengreen commented 2 years ago

Yeah, I tested it without even the frame transformations, and I found that the factor of 10 difference persists. It is possible that the lalsimulation implementation is rather different for the two cases. I could ask on Mattermost.

stephengreen commented 2 years ago

Okay, this is the expected behavior. Cecilio said:

The implementations are independent, in fact ChooseFDModes came after the implementation of IMRPhenomXPHM in ChooseFDWaveform. The implementation could in fact be improved to be more efficient, but for now what it does is that for each mode in ChooseFDModes, it has to compute the corresponding modes in the co-precessing frame and then do the rotation to the J-frame. In ChooseFDWaveform the co-precessing modes are only computed once.
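The difference described in this quote can be illustrated with a toy cost model. This is only a sketch and not lalsimulation code: `coprecessing_modes` and `rotate_to_j_frame` are hypothetical stand-ins with made-up arithmetic, and the mode list is illustrative. The point is just the call count, which is one co-precessing evaluation per requested mode in the first strategy and a single evaluation in the second.

```python
MODES = [(2, 2), (2, 1), (3, 3), (3, 2), (4, 4)]  # illustrative mode list


def coprecessing_modes(counter):
    """Stand-in for the expensive co-precessing-frame computation."""
    counter["coprec"] += 1
    return {lm: complex(lm[0], lm[1]) for lm in MODES}


def rotate_to_j_frame(coprec, lm):
    """Stand-in for the rotation to the J-frame (toy arithmetic only)."""
    return sum(coprec.values()) * complex(lm[0], lm[1])


def modes_recomputed(counter):
    """Recompute the co-precessing modes for every requested mode."""
    return {lm: rotate_to_j_frame(coprecessing_modes(counter), lm) for lm in MODES}


def modes_shared(counter):
    """Compute the co-precessing modes once and reuse them for all modes."""
    coprec = coprecessing_modes(counter)
    return {lm: rotate_to_j_frame(coprec, lm) for lm in MODES}


slow, fast = {"coprec": 0}, {"coprec": 0}
assert modes_recomputed(slow) == modes_shared(fast)
print(slow["coprec"], fast["coprec"])  # number of expensive evaluations in each strategy
```

In this toy model the ratio of expensive calls equals the number of modes; the real runtime ratio also depends on the rotation cost and other overheads.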

I think it's fine to leave it as-is for now, and maybe the waveform people will make it faster at some point.

max-dax commented 2 years ago

OK, thanks for checking this! I guess in the paper we can list the importance-sampling (IS) computation time for IMRPhenomXPHM, under real and under optimal (i.e., if lalsimulation were optimized) conditions.

stephengreen commented 2 years ago

Yeah, or just have a footnote about this.

mpuerrer commented 2 years ago

Could it be that for the polarization API they're only computing the waveform on a sparse grid and then interpolating the data with a multi-banding approach a la https://inspirehep.net/literature/1516394 ?

stephengreen commented 2 years ago

It turned out that we had to turn multi-banding off to generate accurate modes. When I tested turning it back on, it improved the runtime by maybe 20%, so this is definitely not the whole story.

My understanding is that ChooseFDModes must be computing each mode separately in a way that is not efficient, whereas ChooseFDWaveform must be reusing calculations that are the same for each mode. Hence ChooseFDModes takes ten times as long.

stephengreen commented 2 months ago

Is this still important? We could go back to the Phenom developers to see if they can accelerate ChooseFDModes.