Large `nact` number of autocorrelation lengths causes division-by-zero error with dynesty `act-walk` sampling method #726

Closed bilby-bot closed 2 weeks ago

In GitLab by @git.ligo:noah.wolfe on Feb 10, 2023, 20:07

When running PE on a BBH injection with dynesty using the new `act-walk` sampling method, a division-by-zero error is eventually thrown if the number of autocorrelation lengths `nact` is large, probably > 10 (in the example below, I used `nact = 30`). This occurs because we thin the MCMC chain used to propose the next dynesty step by `nact`, but with the improved computation of the autocorrelation length in the `act-walk` method, these chains are now only about a few autocorrelation lengths long. (Thanks for the help in understanding this issue @git.ligo:colm.talbot, please feel free to edit my remarks here.)

Code version: 15cb27e0 on dynesty-differential branch

Log snippet with dynesty settings:

22:48 bilby INFO    : Using sampler Dynesty with kwargs {'nlive': 100, 'bound': 'bilby', 'sample': 'act-walk', 'periodic': None, 'reflective': None, 'update_interval': 600, 'first_update': None, 'npdim': None, 'rstate': None, 'queue_size': 1, 'pool': None, 'use_pool': None, 'live_points': None, 'logl_args': None, 'logl_kwargs': None, 'ptform_args': None, 'ptform_kwargs': None, 'gradient': None, 'grad_args': None, 'grad_kwargs': None, 'compute_jac': False, 'enlarge': None, 'bootstrap': None, 'walks': 100, 'facc': 0.2, 'slices': None, 'fmove': 0.9, 'max_move': 100, 'update_func': None, 'ncdim': None, 'blob': False, 'save_history': False, 'history_filename': None, 'maxiter': None, 'maxcall': None, 'dlogz': 0.1, 'logl_max': inf, 'n_effective': None, 'add_live': True, 'print_progress': True, 'print_func': <bound method Dynesty._print_func of <bilby.core.sampler.dynesty.Dynesty object at 0x14b229430>>, 'save_bounds': False, 'checkpoint_file': None, 'checkpoint_every': 60, 'resume': False}
22:48 bilby INFO    : Checkpoint every check_point_delta_t = 600s
22:48 bilby INFO    : Using dynesty version 2.1.0
22:48 bilby INFO    : Using the bilby-implemented rwalk sampling tracking the autocorrelation function and thinning by 30 with maximum length 150000

Error trace:

Traceback (most recent call last):
  File "/Users/newolfe/Documents/projects/astro/ligo/ssm/sub-solar-mass-measurability/scripts/ssm-relbin-diffdynesty.py", line 87, in <module>
    result = bilby.run_sampler(
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/__init__.py", line 234, in run_sampler
    result = sampler.run_sampler()
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/base_sampler.py", line 96, in wrapped
    output = method(self, *args, **kwargs)
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 483, in run_sampler
    out = self._run_external_sampler_with_checkpointing()
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/dynesty.py", line 578, in _run_external_sampler_with_checkpointing
    self.sampler.run_nested(**sampler_kwargs)
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/dynesty/sampler.py", line 1016, in run_nested
    for it, results in enumerate(
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/dynesty/sampler.py", line 838, in sample
    u, v, logl, nc = self._new_point(loglstar_new)
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/dynesty/sampler.py", line 395, in _new_point
    u, v, logl, nc, blob = self._get_point_value(loglstar)
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/dynesty/sampler.py", line 379, in _get_point_value
    self._fill_queue(loglstar)
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/dynesty/sampler.py", line 372, in _fill_queue
    self.queue = list(mapper(evolve_point, args))
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/dynesty_utils.py", line 178, in __call__
    self.build_cache()
  File "/Users/newolfe/opt/anaconda3/envs/my_igwn-testing-local/lib/python3.9/site-packages/bilby/core/sampler/dynesty_utils.py", line 310, in build_cache
    accept = max(accept // n_found, 1)
ZeroDivisionError: integer division or modulo by zero

In GitLab by @git.ligo:alexandresebastien.goettel on Feb 23, 2024, 11:20

Hi @git.ligo:colm.talbot, @git.ligo:michael.williams and I have been looking at this issue. The problem is that when the chain of points (u_list) is too short compared to the autocorrelation length, the number of points "n_found" left after thinning is zero, leading to ZeroDivisionErrors down the line. This is in principle independent of 'nact', though of course it happens more quickly when 'nact' is large.
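To illustrate the failure mode, here is a toy sketch (not the actual bilby code). It assumes the chain is thinned by taking every `act * nact`-th sample after discarding the first `act * nact` samples; the helper `count_thinned` and the numbers used are hypothetical, chosen only to show how the thinned chain can end up empty.

```python
def count_thinned(chain_length, act, nact):
    """Count the samples left after thinning a chain by act * nact.

    Hypothetical helper: assumes the first `thin` samples are discarded
    and every `thin`-th sample after that is kept.
    """
    thin = max(int(act * nact), 1)
    return len(range(thin, chain_length, thin))


chain_length = 200  # assumed chain length, for illustration only
act = 10            # assumed autocorrelation length estimate
accept = 50         # assumed number of accepted steps

for nact in (2, 30):
    n_found = count_thinned(chain_length, act, nact)
    try:
        # Same expression as the last frame of the traceback above.
        per_point = max(accept // n_found, 1)
        print(f"nact={nact}: n_found={n_found}, accept per point={per_point}")
    except ZeroDivisionError:
        print(f"nact={nact}: n_found={n_found}, ZeroDivisionError")
```

With these numbers, `nact = 2` leaves a handful of points, while `nact = 30` gives a thinning interval longer than the chain itself, so nothing survives and the division fails.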

It seems like an easy fix: add a clause that just returns the current points in that case (the same thing that happens when the estimated act is infinite). But given that there are several ways to go about this, which can depend on the details of the implementation, we thought that talking about it with you would be the best way forward. What do you think?
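A minimal sketch of what such a clause might look like, assuming the same toy thinning scheme as in the description above. The function `thin_or_fallback` and its arguments are hypothetical and do not mirror the real `build_cache` signature in `dynesty_utils.py`; the actual fix may well look different.

```python
import math


def thin_or_fallback(chain, current_point, act, nact, accept):
    """Thin a chain by act * nact, falling back to the current point.

    Hypothetical stand-in for the guard discussed above; returns the
    points to cache and the accepted-step count attributed to each.
    """
    # Existing behaviour described in the thread: infinite ACT estimate
    # means we just return the current point.
    if not math.isfinite(act):
        return [current_point], accept

    thin = max(int(act * nact), 1)
    thinned = chain[thin::thin]
    n_found = len(thinned)

    # Proposed guard: the chain is too short for the thinning interval,
    # so treat it like the infinite-ACT case instead of dividing by zero.
    if n_found == 0:
        return [current_point], accept

    return thinned, max(accept // n_found, 1)


chain = list(range(200))  # placeholder samples, for illustration only
print(thin_or_fallback(chain, -1, act=10, nact=30, accept=50))
```

One open design question this sketch glosses over is whether falling back to the current point is preferable to extending the chain until at least one thinned sample exists.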

Thanks!

In GitLab by @git.ligo:michael.williams on Oct 3, 2024, 17:55

unassigned @git.ligo:alexandresebastien.goettel
