iharsuvorau closed 1 year ago
ValueError: probabilities are not non-negative
reported by Prosimos (this log failed during the previous benchmarking run because the time ran out; this time I'm using another machine for benchmarking, not the HPC). Still running most of the other logs.
Also, this time I'm facing a new issue with most of the runs. Previously, I ran the experiments via SLURM on the HPC. Now it's a single powerful Linux machine in the UT network, and I run the experiments with a Python script that launches Simod's Docker container on a Simod configuration file via Python's subprocess.run.
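For the record, the benchmarking script does roughly the following. This is only a sketch of the setup, not the actual script: the image name, the /workdir mount point, the container arguments, and the convention of passing the config path as the last argument are all placeholders I made up for illustration.

```python
import subprocess
from pathlib import Path


def build_docker_command(image, config_path, container_args):
    """Build a `docker run` argument list that mounts the config's directory."""
    config_path = Path(config_path).resolve()
    mount_point = "/workdir"  # hypothetical mount point inside the container
    return [
        "docker", "run", "--rm",
        "-v", f"{config_path.parent}:{mount_point}",
        image,
        *container_args,  # hypothetical entry command, supplied by the caller
        f"{mount_point}/{config_path.name}",  # assumes the config path goes last
    ]


def run_experiment(image, config_path, container_args, timeout=None):
    """Run one experiment and return the CompletedProcess."""
    command = build_docker_command(image, config_path, container_args)
    # check=False so that one failing log does not abort the whole benchmark loop.
    return subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=False
    )
```

Each log then gets one run_experiment call, and the captured stderr is where tracebacks like the one below end up.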
I'm having the following issue now:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/src/Simod/src/simod/simulation/prosimos.py", line 252, in _evaluate_logs_using_metrics
    value = compute_metric(metric, validation_log, validation_log_ids, simulated_log, simulated_log_ids)
  File "/usr/src/Simod/src/simod/metrics/metrics.py", line 31, in compute_metric
    result = get_dl(event_log_1, event_log_1_ids, event_log_2, event_log_2_ids)
  File "/usr/src/Simod/src/simod/metrics/metrics.py", line 106, in get_dl
    return evaluator.measure_distance()
  File "/usr/src/Simod/src/simod/metrics/tsd_evaluator.py", line 64, in measure_distance
    distance = self._evaluate_seq_distance(self.log_data, self.simulation_data)
  File "/usr/src/Simod/src/simod/metrics/tsd_evaluator.py", line 86, in _evaluate_seq_distance
    pool = Pool(processes=cpu_count)
  File "/usr/local/lib/python3.10/multiprocessing/context.py", line 119, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild,
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
"""
According to this explanation, https://stackoverflow.com/questions/51485212/multiprocessing-gives-assertionerror-daemonic-processes-are-not-allowed-to-have, the cause seems to be nested multiprocessing: the evaluation already runs in a multiprocessing pool, and each pool worker then tries to start its own pool for the TSD (DL) metric computation. That fails because pool workers are daemonic processes, and a daemonic process is not allowed to have children.
There's a workaround for this, https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic, which makes the pool workers non-daemonic. But it still doesn't seem right for a pool worker to start another pool of its own. It can actually slow down the metric computation, because the nested pools compete with each other for the same CPU cores.
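One alternative to nested pools would be to skip the inner pool whenever the code already runs inside a pool worker. Below is a minimal sketch of that idea, not what Simod does: map_maybe_parallel and square are hypothetical names, and square only stands in for the real per-item distance computation.

```python
import multiprocessing
from multiprocessing import Pool


def square(x):
    """Stand-in for the per-item distance computation (hypothetical)."""
    return x * x


def map_maybe_parallel(func, items, processes=None):
    """Map func over items, creating a pool only when that is allowed."""
    # Pool workers are daemonic, so starting a Pool inside one raises
    # "daemonic processes are not allowed to have children".
    in_daemon = multiprocessing.current_process().daemon
    if in_daemon or processes == 1:
        # Fall back to a plain serial loop in the current process.
        return [func(item) for item in items]
    with Pool(processes=processes) as pool:
        return pool.map(func, items)
```

Called from the top-level process, this parallelizes as before. Called from inside the evaluation pool, it runs serially, so the outer pool stays the only source of parallelism and there is no competition for cores between nested pools.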
Hi @david-chapela. I'm still running the benchmarks, but I've just taken BPIC 2012 to peek at the results.
With timers, the graphs look way off. However, the average cycle time seems to be closer to the original times.
These are the results after the bug was fixed (we've already reviewed them, so this is just for the history):
We decided:
Event logs: