AutomatedProcessImprovement / Simod

Simod is a tool for automated BPS model discovery
Apache License 2.0
41 stars 7 forks source link

False positive in discovery with model #27

Closed Mcamargo85 closed 2 years ago

Mcamargo85 commented 2 years ago

When I try to discover simulations parameters in the "discover" mode with a .bpmn model that contains one activity less than the ones in the log, I am getting an Index error from the replayer. Note that the error does not occur with the "optimize" mode. In the link I am attaching the modified model and log.

iharsuvorau commented 2 years ago

@Mcamargo85 as I see, Optimizer also fails when in config we have:

gate_management:
    - discovery

I'm looking into this.

iharsuvorau commented 2 years ago

@Mcamargo85, while Orlenys is looking into it, I'm checking logs from the most recent benchmarking. It's good that we print out to console what gate management is used at each step, so I see "discovery" and "equiprobable" are used successfully. So, as I understand, this bug affects only cases where the model is provided. So, most likely, all the previous benchmarks are okay, because we haven't used models. And in case we use models, then Simod just fails. We should be safe here.

Here's one log, you can Ctrl+F for "Mining gateway probabilities with" https://owncloud.ut.ee/owncloud/index.php/s/L7e3HqNWTF7Bang

iharsuvorau commented 2 years ago

@orlenyslp I found one more exception for the replayer related almost to the same place in code but one line below. Here's the Simod's log and I'm speaking about the first exception in this log:

parsing log, completed traces ::   0%|          | 0/1182 [00:00<?, ?it/s]
parsing log, completed traces ::   1%|          | 10/1182 [00:00<00:12, 94.78it/s]
parsing log, completed traces ::   2%|▏         | 21/1182 [00:00<00:11, 100.22it/s]
parsing log, completed traces ::   3%|▎         | 33/1182 [00:00<00:10, 105.99it/s]
parsing log, completed traces ::   4%|▍         | 48/1182 [00:00<00:09, 122.37it/s]
parsing log, completed traces ::   5%|▌         | 65/1182 [00:00<00:08, 136.22it/s]
parsing log, completed traces ::   7%|▋         | 79/1182 [00:00<00:08, 136.03it/s]
parsing log, completed traces ::   8%|▊         | 99/1182 [00:00<00:06, 155.43it/s]
parsing log, completed traces ::  10%|▉         | 115/1182 [00:00<00:09, 117.97it/s]
parsing log, completed traces ::  11%|█         | 131/1182 [00:01<00:08, 127.33it/s]
parsing log, completed traces ::  12%|█▏        | 147/1182 [00:01<00:07, 134.99it/s]
parsing log, completed traces ::  14%|█▍        | 167/1182 [00:01<00:06, 151.81it/s]
parsing log, completed traces ::  16%|█▌        | 187/1182 [00:01<00:06, 161.17it/s]
parsing log, completed traces ::  18%|█▊        | 207/1182 [00:01<00:05, 171.63it/s]
parsing log, completed traces ::  20%|█▉        | 236/1182 [00:01<00:04, 202.96it/s]
parsing log, completed traces ::  22%|██▏       | 263/1182 [00:01<00:04, 220.17it/s]
parsing log, completed traces ::  24%|██▍       | 286/1182 [00:01<00:04, 217.85it/s]
parsing log, completed traces ::  27%|██▋       | 317/1182 [00:01<00:03, 242.32it/s]
parsing log, completed traces ::  30%|██▉       | 349/1182 [00:01<00:03, 264.72it/s]
parsing log, completed traces ::  32%|███▏      | 380/1182 [00:02<00:02, 271.39it/s]
parsing log, completed traces ::  35%|███▍      | 410/1182 [00:02<00:02, 274.24it/s]
parsing log, completed traces ::  37%|███▋      | 443/1182 [00:02<00:02, 288.44it/s]
parsing log, completed traces ::  40%|████      | 477/1182 [00:02<00:02, 302.43it/s]
parsing log, completed traces ::  43%|████▎     | 508/1182 [00:02<00:02, 300.60it/s]
parsing log, completed traces ::  46%|████▌     | 539/1182 [00:02<00:02, 239.24it/s]
parsing log, completed traces ::  48%|████▊     | 572/1182 [00:02<00:02, 258.88it/s]
parsing log, completed traces ::  51%|█████▏    | 606/1182 [00:02<00:02, 278.01it/s]
parsing log, completed traces ::  54%|█████▍    | 637/1182 [00:02<00:01, 286.20it/s]
parsing log, completed traces ::  57%|█████▋    | 674/1182 [00:03<00:01, 308.53it/s]
parsing log, completed traces ::  60%|█████▉    | 706/1182 [00:03<00:01, 304.60it/s]
parsing log, completed traces ::  63%|██████▎   | 747/1182 [00:03<00:01, 333.13it/s]
parsing log, completed traces ::  66%|██████▌   | 781/1182 [00:03<00:01, 312.49it/s]
parsing log, completed traces ::  69%|██████▉   | 813/1182 [00:03<00:01, 308.58it/s]
parsing log, completed traces ::  72%|███████▏  | 851/1182 [00:03<00:01, 326.63it/s]
parsing log, completed traces ::  75%|███████▌  | 888/1182 [00:03<00:00, 331.57it/s]
parsing log, completed traces ::  78%|███████▊  | 923/1182 [00:03<00:00, 336.51it/s]
parsing log, completed traces ::  81%|████████  | 959/1182 [00:03<00:00, 341.55it/s]
parsing log, completed traces ::  85%|████████▌ | 1006/1182 [00:04<00:00, 378.87it/s]
parsing log, completed traces ::  89%|████████▉ | 1051/1182 [00:04<00:00, 397.74it/s]
parsing log, completed traces ::  93%|█████████▎| 1094/1182 [00:04<00:00, 405.92it/s]
parsing log, completed traces ::  97%|█████████▋| 1148/1182 [00:04<00:00, 443.37it/s]
parsing log, completed traces :: 100%|██████████| 1182/1182 [00:04<00:00, 267.54it/s][DONE]

● Log path: /gpfs/space/home/suvorau/simod/simod/inputs/insurance.xes

➜ Removing outliers from the training partition

Model Discovery and Parameters Extraction
=========================================

mining calendars::   0%|          | 0/2 [00:00<?, ?it/s]
mining calendars:: 100%|██████████| 2/2 [00:00<00:00, 2793.41it/s]
Trial
-----
Train split: 605, validation split: 151

exporting log, completed traces ::   0%|          | 0/605 [00:00<?, ?it/s]
exporting log, completed traces ::   2%|▏         | 10/605 [00:00<00:06, 95.62it/s]
exporting log, completed traces ::   3%|▎         | 21/605 [00:00<00:05, 102.63it/s]
exporting log, completed traces ::   5%|▌         | 32/605 [00:00<00:05, 100.45it/s]
exporting log, completed traces ::   7%|▋         | 43/605 [00:00<00:05, 98.53it/s] 
exporting log, completed traces ::   9%|▉         | 53/605 [00:00<00:05, 95.15it/s]
exporting log, completed traces ::  11%|█         | 64/605 [00:00<00:05, 98.48it/s]
exporting log, completed traces ::  13%|█▎        | 76/605 [00:00<00:05, 104.24it/s]
exporting log, completed traces ::  14%|█▍        | 87/605 [00:00<00:04, 105.37it/s]
exporting log, completed traces ::  16%|█▌        | 98/605 [00:00<00:04, 103.35it/s]
exporting log, completed traces ::  18%|█▊        | 110/605 [00:01<00:04, 107.32it/s]
exporting log, completed traces ::  21%|██        | 125/605 [00:01<00:04, 119.06it/s]
exporting log, completed traces ::  23%|██▎       | 137/605 [00:01<00:04, 114.77it/s]
exporting log, completed traces ::  25%|██▍       | 150/605 [00:01<00:03, 116.51it/s]
exporting log, completed traces ::  27%|██▋       | 162/605 [00:01<00:04, 109.81it/s]
exporting log, completed traces ::  29%|██▉       | 174/605 [00:01<00:04, 106.45it/s]
exporting log, completed traces ::  31%|███       | 185/605 [00:01<00:04, 98.59it/s] 
exporting log, completed traces ::  33%|███▎      | 199/605 [00:01<00:03, 108.12it/s]
exporting log, completed traces ::  35%|███▍      | 211/605 [00:01<00:03, 111.13it/s]
exporting log, completed traces ::  37%|███▋      | 224/605 [00:02<00:03, 116.32it/s]
exporting log, completed traces ::  39%|███▉      | 237/605 [00:02<00:03, 113.06it/s]
exporting log, completed traces ::  42%|████▏     | 252/605 [00:02<00:02, 121.67it/s]
exporting log, completed traces ::  44%|████▍     | 265/605 [00:02<00:02, 122.26it/s]
exporting log, completed traces ::  46%|████▌     | 278/605 [00:02<00:02, 113.16it/s]
exporting log, completed traces ::  48%|████▊     | 290/605 [00:02<00:02, 107.42it/s]
exporting log, completed traces ::  50%|████▉     | 302/605 [00:02<00:02, 105.81it/s]
exporting log, completed traces ::  52%|█████▏    | 314/605 [00:02<00:02, 105.87it/s]
exporting log, completed traces ::  54%|█████▍    | 327/605 [00:03<00:02, 110.27it/s]
exporting log, completed traces ::  56%|█████▌    | 339/605 [00:03<00:02, 112.83it/s]
exporting log, completed traces ::  58%|█████▊    | 352/605 [00:03<00:02, 114.04it/s]
exporting log, completed traces ::  61%|██████    | 367/605 [00:03<00:01, 123.23it/s]
exporting log, completed traces ::  64%|██████▍   | 388/605 [00:03<00:01, 146.06it/s]
exporting log, completed traces ::  67%|██████▋   | 405/605 [00:03<00:01, 150.83it/s]
exporting log, completed traces ::  70%|██████▉   | 421/605 [00:03<00:01, 152.10it/s]
exporting log, completed traces ::  72%|███████▏  | 438/605 [00:03<00:01, 155.44it/s]
exporting log, completed traces ::  75%|███████▌  | 454/605 [00:03<00:01, 129.30it/s]
exporting log, completed traces ::  77%|███████▋  | 468/605 [00:04<00:01, 130.64it/s]
exporting log, completed traces ::  80%|███████▉  | 482/605 [00:04<00:00, 123.96it/s]
exporting log, completed traces ::  82%|████████▏ | 497/605 [00:04<00:00, 128.98it/s]
exporting log, completed traces ::  84%|████████▍ | 511/605 [00:04<00:00, 130.13it/s]
exporting log, completed traces ::  87%|████████▋ | 525/605 [00:04<00:00, 124.41it/s]
exporting log, completed traces ::  89%|████████▉ | 538/605 [00:04<00:00, 121.94it/s]
exporting log, completed traces ::  91%|█████████ | 552/605 [00:04<00:00, 125.35it/s]
exporting log, completed traces ::  94%|█████████▎| 566/605 [00:04<00:00, 128.65it/s]
exporting log, completed traces ::  96%|█████████▌| 582/605 [00:04<00:00, 136.76it/s]
exporting log, completed traces ::  99%|█████████▊| 596/605 [00:05<00:00, 133.63it/s]
exporting log, completed traces :: 100%|██████████| 605/605 [00:05<00:00, 119.51it/s]
Mining the model structure
--------------------------

▶︎ Log file /gpfs/space/home/suvorau/simod/simod/outputs/20211125_966B6132_B04B_44CB_B0D2_00A8C03DC0D3/20211125_8702B343_6889_409C_A97E_8C2B4F7725FD/insurance.xes
TESTCODE - SM2
LOGP - total events parsed: 27948
LOGP - start events parsed: 13974
LOGP - complete events parsed: 13974
LOGP - total distinct events: 11
LOGP - total distinct traces: 582
DEBUG - generating complex log
DFGP - settings (eta, epsilon, filter-type) > 1.0 : 0.2975190805171588 : FWG
DEBUG - potential parallelisms: 2
DFGP - loops length TWO found: 0
DEBUG - removed parallelism edges: 2
DEBUG - removing empty parallel flow.
DEBUG - removing empty parallel flow.
WARNING - impossible fix soundness.
Fontconfig warning: ignoring UTF-8: not a valid region tag

Loading of bpmn structure from file 0.0%...      
Loading of bpmn structure from file 6.666%...      
Loading of bpmn structure from file 13.33%...      
Loading of bpmn structure from file 20.0%...      
Loading of bpmn structure from file 26.66%...      
Loading of bpmn structure from file 33.33%...      
Loading of bpmn structure from file 40.0%...      
Loading of bpmn structure from file 46.66%...      
Loading of bpmn structure from file 53.33%...      
Loading of bpmn structure from file 60.0%...      
Loading of bpmn structure from file 66.66%...      
Loading of bpmn structure from file 73.33%...      
Loading of bpmn structure from file 80.0%...      
Loading of bpmn structure from file 86.66%...      
Loading of bpmn structure from file 93.33%...      
Loading of bpmn structure from file 100.0%...      [DONE]

➜ Resource Miner

mining calendars::   0%|          | 0/2 [00:00<?, ?it/s]
mining calendars:: 100%|██████████| 2/2 [00:00<00:00, 4156.89it/s]
➜ Inter-arrival Miner

extracting interarrivals::   0%|          | 0/126 [00:00<?, ?it/s]
extracting interarrivals:: 100%|██████████| 126/126 [00:00<00:00, 2997.78it/s]
➜ Mining gateway probabilities with discovery
'node_b2600776-5be8-4947-80c6-31f6ba319a9f'

Traceback (most recent call last):
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/decorators.py", line 78, in safety_check
    response['values'] = method(*args, **kw)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/structure_optimizer.py", line 172, in _extract_parameters
    settings=settings, process_graph=process_graph, log=self.log_train, model_path=model_path)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/common_routines.py", line 60, in extract_structure_parameters
    traces, bpmn_graph, settings.gate_management)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/common_routines.py", line 156, in mine_gateway_probabilities_alternative_with_gateway_management
    arcs_frequencies = compute_sequence_flow_frequencies(log_traces, bpmn_graph)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/common_routines.py", line 112, in compute_sequence_flow_frequencies
    bpmn_graph.replay_trace(task_sequence, flow_arcs_frequency)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/replayer_datatypes.py", line 401, in replay_trace
    self.postprocess_unfired_tasks(task_sequence, fired_tasks, f_arcs_frequency)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/replayer_datatypes.py", line 411, in postprocess_unfired_tasks
    fix_from = [self.starting_event, self.closest_distance[e_info.id][self.starting_event]]
KeyError: 'node_b2600776-5be8-4947-80c6-31f6ba319a9f'

simulating::   0%|          | 0/5 [00:00<?, ?it/s]
● Custom simulator has been chosen

● Custom simulator has been chosen

● Number of simulation cases: 151

● Number of simulation cases: 151

● Custom simulator has been chosen

● Number of simulation cases: 151

● Custom simulator has been chosen

● Number of simulation cases: 151

● Custom simulator has been chosen

● Number of simulation cases: 151

simulating:: 100%|██████████| 5/5 [00:01<00:00,  4.57it/s]
simulating:: 100%|██████████| 5/5 [00:01<00:00,  4.57it/s]

reading simulated logs::   0%|          | 0/5 [00:00<?, ?it/s]
reading simulated logs:: 100%|██████████| 5/5 [00:00<00:00, 45.81it/s]
reading simulated logs:: 100%|██████████| 5/5 [00:01<00:00,  4.50it/s]
job exception: [Errno 2] No such file or directory: '/gpfs/space/home/suvorau/simod/simod/outputs/20211125_966B6132_B04B_44CB_B0D2_00A8C03DC0D3/20211125_8702B343_6889_409C_A97E_8C2B4F7725FD/sim_data/insurance_2.csv'
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/storage/software/python/3.7.1/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/storage/software/python/3.7.1/miniconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/common_routines.py", line 241, in read_stats_alt
    return read(*args)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/common_routines.py", line 232, in read
    verbose=False)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/readers/log_reader.py", line 322, in __init__
    self.load_data_from_file()
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/readers/log_reader.py", line 344, in load_data_from_file
    self.get_csv_events_data()
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/readers/log_reader.py", line 404, in get_csv_events_data
    log = pd.read_csv(self.input)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 462, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 819, in __init__
    self._engine = self._make_engine(self.engine)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1050, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1867, in __init__
    self._open_handles(src, kwds)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/parsers.py", line 1368, in _open_handles
    storage_options=kwds.get("storage_options", None),
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/pandas/io/common.py", line 652, in get_handle
    newline="",
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/space/home/suvorau/simod/simod/outputs/20211125_966B6132_B04B_44CB_B0D2_00A8C03DC0D3/20211125_8702B343_6889_409C_A97E_8C2B4F7725FD/sim_data/insurance_2.csv'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gpfs/space/home/suvorau/simod/venv/bin/simod", line 11, in <module>
    load_entry_point('simod', 'console_scripts', 'simod')()
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/cli.py", line 59, in optimize
    optimizer.execute_pipeline(discover_model=global_config.model_path is None)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/optimizer.py", line 54, in execute_pipeline
    strctr_optimizer.execute_trials()
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/structure_optimizer.py", line 123, in execute_trials
    show_progressbar=False)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 522, in fmin
    trials_save_file=trials_save_file,
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/base.py", line 699, in fmin
    trials_save_file=trials_save_file,
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 553, in fmin
    rval.exhaust()
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 356, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 292, in run
    self.serial_evaluate()
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/gpfs/space/home/suvorau/simod/venv/lib/python3.7/site-packages/hyperopt/base.py", line 907, in evaluate
    rval = self.fn(pyll_rval)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/structure_optimizer.py", line 101, in exec_pipeline
    rsp = simulate(trial_stg, self.log_valdn, self.log_valdn, evaluate_fn=evaluate_logs)
  File "/gpfs/space/home/suvorau/simod/simod/src/simod/simulator.py", line 105, in simulate
    args = [(settings, process_stats, log) for log in p.get()]
  File "/storage/software/python/3.7.1/miniconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/space/home/suvorau/simod/simod/outputs/20211125_966B6132_B04B_44CB_B0D2_00A8C03DC0D3/20211125_8702B343_6889_409C_A97E_8C2B4F7725FD/sim_data/insurance_2.csv'

Here's the related log "insurance.xes" https://owncloud.ut.ee/owncloud/index.php/s/w7Df7fPXSFsMSGw