Closed segasai closed 3 years ago
These were added because interrupts can also occur while fortran code is executed.
I've just tested a case with 10000 live points and 1000 dimensions, to force spending a lot of time inside the Fortran code, and tried to interrupt it; I'm still getting the same result as before:
Exception ignored on calling ctypes callback function: <function run.<locals>.loglike at 0x7f295fe29d30>
Traceback (most recent call last):
  File "/home/skoposov/soft/PyMultiNest/pymultinest/run.py", line 257, in loglike
    def loglike(cube, ndim, nparams, lnew, nullcontext):
  File "/home/skoposov/soft/PyMultiNest/pymultinest/run.py", line 97, in interrupt_handler
    sys.exit(1)
SystemExit: 1
And again, nothing is actually interrupted...
try MPI parallelisation instead of dask
I guess dask runs pymultinest.run in threads/processes and your main program remains the master program. Dask would then send wrapped exceptions back from the threads to the main program. But Python exceptions raised inside the callback functions cannot propagate up through Fortran.
try MPI parallelisation instead of dask
Thank you, I am aware of MPI but it is simply the wrong tool when trying to do a very large number of independent inference problems.
I typically launch a new process for each data set, with make or xargs. On clusters you can use the batch system.
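The make/xargs pattern above can be sketched as follows; the `data/*.dat` file layout and the `fit_one.py` driver script are made up for illustration. Each input line becomes the argument of one fresh Python process, so every fit owns its own main thread and signal handling works normally:

```shell
# Hypothetical: one input file per inference problem; fit_one.py (made up)
# would run pymultinest on a single file.
#   ls data/*.dat | xargs -P 8 -n 1 python fit_one.py
# -P caps parallelism at 8 concurrent processes; -n 1 gives one file per run.
# The same fan-out pattern, demonstrated with a stand-in command:
printf 'a\nb\nc\n' | xargs -P 2 -n 1 echo
```

On a batch cluster the same thing is usually done by submitting one job per data set instead of using xargs.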
Perhaps a workaround is to exec a new process in each dask thread?
If you need a python solution, ultranest should be able to handle this better.
I guess dask runs pymultinest.run in threads/processes and your main program remains the master program. Dask would then send wrapped exceptions back from the threads to the main program. But Python exceptions raised inside the callback functions cannot propagate up through Fortran.
Dask is simply a launcher for a large number of heterogeneous jobs, and the "exception wrapping" is not relevant here.
Right now PyMultiNest will simply refuse to run inside a thread in any environment (dask or not), because setting a signal handler is not allowed outside the main thread.
Since it seems to me that the handler is a complete no-op (at least on the tests/systems I checked), I think it can be removed. But it is possible that I am missing something.
If you are not convinced or don't want to change anything, that is fine by me; I will just use a locally modified version.
Ah, I see what you mean now. There must have been a change in ctypes's behaviour.
PRs are always welcome.
It would be good if we can stop the process somehow.
Here is a potential solution by changing the interrupt_handler:
def interrupt_handler(recvsignal, frame):
    sys.stderr.write('ERROR: Interrupt received: Terminating\n')
    os.kill(os.getpid(), signal.SIGTERM)
What does the error look like for setting signal handlers in dask threads?
Is there any reason you want threads instead of processes with dask (it supports several backends afaik)?
Thanks for taking a look at the root of the issue.
from __future__ import absolute_import, unicode_literals, print_function
import numpy
from numpy import pi, cos
from pymultinest.solve import solve
import os
import time
from dask.distributed import Client

try:
    os.mkdir('chains')
except OSError:
    pass

def myprior(cube):
    return cube * 10 * pi

def myloglike(cube):
    chi = (cos(cube * 2.) + 1).sum()
    return (2. + chi)

def func(a):
    parameters = ["x", "y"]
    n_params = 10000  # len(parameters)
    # name of the output files
    prefix = "chains/3-"
    # run MultiNest
    result = solve(LogLikelihood=myloglike,
                   Prior=myprior,
                   n_live_points=100000,
                   n_dims=n_params,
                   outputfiles_basename=prefix,
                   verbose=True)

if __name__ == '__main__':
    C = Client()
    for i in range(10):
        C.submit(func, i)
and this simple configuration
dask-scheduler --scheduler-file=sched.json &
dask-worker --scheduler-file=sched.json &
It seems like Client is implicitly creating a LocalCluster with, by default, processes=False. Try processes=True? https://docs.dask.org/en/latest/setup/single-distributed.html
Not in my case: I'm providing a scheduler file. In fact, in practice I'm starting dask-workers on tens of nodes of a cluster. And even if I start them with
$ dask-worker --scheduler-file=sched.json --nthreads=1
they are still inside a thread. And the reason is that dask-worker uses futures.ThreadPoolExecutor https://github.com/dask/distributed/blob/a0d60e7b2ad16304128c7989a111539a4d80ed38/distributed/worker.py#L631
One option to fix my issue that seems to work is to put the signal.signal() call inside
if threading.active_count() == 1:
(that still keeps signal.signal as a no-op, though)
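A sketch of that guard, with an interrupt_handler modeled on the one in pymultinest/run.py: signal.signal() raises ValueError when called outside the main thread, so the handler is only installed when no other threads are running.

```python
import signal
import sys
import threading

def interrupt_handler(recvsignal, frame):
    # modeled on pymultinest's handler: report and exit
    sys.stderr.write('ERROR: Interrupt received: Terminating\n')
    sys.exit(1)

# signal.signal() raises ValueError outside the main thread, so only
# install the handler when this is the sole (i.e. main) thread
if threading.active_count() == 1:
    signal.signal(signal.SIGINT, interrupt_handler)
```

A more direct check for the same condition would be `threading.current_thread() is threading.main_thread()`, which also covers the case where the main thread has spawned workers but is itself doing the installing.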
Would putting C.shutdown() into the signal handler work?
I don't think it solves anything. IMO there are multiple issues/questions here that potentially require solving.
I pushed what I mentioned in https://github.com/JohannesBuchner/PyMultiNest/issues/178#issuecomment-773948089 as a commit, which addresses the second point.
I've created a PR #179 to not use signal handlers in threaded environments
thanks for merging the PR!
Hi,
While trying to run multiple instances of pymultinest with dask, I noticed that because dask runs worker tasks in threads, it is unfortunately incompatible with the signal handling done by pymultinest (leading to the "ValueError: signal only works in main thread" error). I then looked into whether the signal.signal() signal handlers in pymultinest can be safely commented out.
And I think at this point they don't actually do anything. I.e., if I add time.sleep(5) to the pymultinest_demo.py likelihood to emulate a long-running likelihood, then a Ctrl-C interruption produces the following exception,
but it does NOT interrupt the processing. If I comment out signal.signal() completely, I get basically the same behaviour.
So it seems to me that the signal handler is not useful. I don't quite know if it is possible to still make the whole thing interruptible; https://stackoverflow.com/questions/14707049/allowing-ctrl-c-to-interrupt-a-python-c-extension did not enlighten me.
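The swallowing behaviour described above can be reproduced outside pymultinest with any ctypes callback, for example libc's qsort. This sketch assumes a POSIX system, where ctypes.CDLL(None) exposes the C library; raising inside the callback only makes ctypes print "Exception ignored on calling ctypes callback function" to stderr and substitute a return value of 0, exactly as in the traceback quoted earlier:

```python
import ctypes

libc = ctypes.CDLL(None)  # POSIX: the process's own symbols, including libc

# comparison callback type for qsort: int (*)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

def raising_cmp(a, b):
    # The exception cannot propagate through the C (or Fortran) frames:
    # ctypes prints "Exception ignored ..." and qsort keeps running.
    raise SystemExit(1)

arr = (ctypes.c_int * 3)(3, 1, 2)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), CMPFUNC(raising_cmp))
print("still running, elements intact:", sorted(arr))  # SystemExit never propagated
```

This is the same situation as pymultinest's loglike callback: sys.exit(1) in the handler raises SystemExit inside a ctypes callback, which is discarded before it can unwind the Fortran call stack.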