AutomatedProcessImprovement / Simod

Simod is a tool for automated BPS model discovery
Apache License 2.0

Update SplitMiner version to avoid "IllegalArgumentException: Comparison method violates its general contract" exception #121

Closed LeonBein closed 1 year ago

LeonBein commented 1 year ago

When running Simod on the BPI Challenge 2015 Municipality 1 log, the run fails with java.lang.IllegalArgumentException: Comparison method violates its general contract in one of the SplitMiner plugins.

Here is my Simod Config:

version: 2
common:
  log_path: resources/BPIC15_1.xes
  repetitions: 1
  evaluation_metrics:
    - dl
    - absolute_hourly_emd
    - cycle_time_emd
    - circadian_emd
preprocessing:
  multitasking: false
structure:
  max_evaluations: 1
  mining_algorithm: sm3
  optimization_metric: dl
  concurrency:
    - 0.0
    - 1.0
  epsilon:
    - 0.0
    - 1.0
  eta:
    - 0.0
    - 1.0
  gateway_probabilities:
    - equiprobable
    - discovery
  replace_or_joins:
    - true
    - false
  prioritize_parallelism:
    - true
    - false
calendars:
  max_evaluations: 1
  optimization_metric: absolute_hourly_emd
  resource_profiles:
    discovery_type: pool
    granularity: 60
    confidence: 0.1
    support: 0.7
    participation: 0.4
extraneous_activity_delays:
  num_iterations: 1
  optimization_metric: relative_emd

Here is the full console log:

➜ Executing shell command: ['/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/bin/pm4py_wrapper', '-i', '/usr/src/Simod/resources/BPIC15_1.xes', '-o', '/usr/src/Simod/resources', 'xes-to-csv']
/usr/src/Simod/src/simod/event_log/utilities.py:62: DtypeWarning: Columns (29) have mixed types. Specify dtype option on import or set low_memory=False.
  log = pd.read_csv(log_path_csv)

Pre-processing
==============

➜ Adding enabled times

Structure optimization
======================

➜ Executing shell command: ['/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/bin/pm4py_wrapper', '-i', '/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/BPIC15_1.xes', '-o', '/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943', 'csv-to-xes']
exporting log, completed traces :: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 796/796 [00:04<00:00, 173.31it/s]

Structure Optimization Trial
----------------------------
Parameters: PipelineSettings(output_dir=None, model_path=None, project_name='BPIC15_1', gateway_probabilities_method=<GatewayProbabilitiesDiscoveryMethod.EQUIPROBABLE: 'equiprobable'>, epsilon=0.5296531367602891, eta=0.028561777123138787, concurrency=0.0, prioritize_parallelism='true', replace_or_joins='false')

➜ Executing SplitMiner

➜ SplitMiner3 is running with the following arguments: ['java', '-Xmx2G', '-Xms1024M', '-cp', '/usr/src/Simod/external_tools/splitminer3/bpmtk.jar:/usr/src/Simod/external_tools/splitminer3/lib/*', 'au.edu.unimelb.services.ServiceProvider', 'SMD', '0.028561777123138787', '0.5296531367602891', 'true', 'false', 'false', '/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/BPIC15_1.xes', '/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/structure_trial_20230622_100933_84B5C62B_E09F_4223_B32A_A7E6439E0F35/BPIC15_1']
TESTCODE - SMD
LOGP - total events parsed: 67020
LOGP - start events parsed: 33510
LOGP - complete events parsed: 33510
LOGP - total distinct events: 370
LOGP - total distinct traces: 774
DEBUG - generating complex log
DFGP - settings (eta, epsilon, filter-type) > 0.028561777123138787 : 0.5296531367602891 : FWG
DEBUG - potential parallelisms: 0
DFGP - loops length TWO found: 4
DEBUG - removed parallelism edges: 0
eTIME - 29.077s
ERROR: wrong usage.
RUN> java -cp bpmtk.jar;lib\* au.edu.unimelb.services.ServiceProvider SMD e n p 'logpath\log.[xes|xes.gz|mxml]' 'outputpath\outputname'
PARAM: e = double in [0,1] : parallelism threshold (epsilon)
PARAM: n = double in [0,1] : percentile for frequency threshold (eta)
PARAM: p = [true|false] : replace non trivial OR joins?
EXAMPLE: java -cp bpmtk.jar;lib\* au.edu.unimelb.services.ServiceProvider SMD 0.1 0.4 .\logs\SEPSIS.xes.gz .\outputs\SEPSIS
java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.ComparableTimSort.mergeLo(ComparableTimSort.java:744)
        at java.util.ComparableTimSort.mergeAt(ComparableTimSort.java:481)
        at java.util.ComparableTimSort.mergeCollapse(ComparableTimSort.java:406)
        at java.util.ComparableTimSort.sort(ComparableTimSort.java:213)
        at java.util.Arrays.sort(Arrays.java:1246)
        at com.jgraph.layout.hierarchical.JGraphMedianHybridCrossingReduction.medianRank(JGraphMedianHybridCrossingReduction.java:456)
        at com.jgraph.layout.hierarchical.JGraphMedianHybridCrossingReduction.weightedMedian(JGraphMedianHybridCrossingReduction.java:400)
        at com.jgraph.layout.hierarchical.JGraphMedianHybridCrossingReduction.run(JGraphMedianHybridCrossingReduction.java:92)
        at com.jgraph.layout.hierarchical.JGraphHierarchicalLayout.run(JGraphHierarchicalLayout.java:423)
        at com.jgraph.layout.JGraphFacade.run(JGraphFacade.java:470)
        at org.processmining.models.jgraph.ProMJGraphVisualizer.visualizeGraph(ProMJGraphVisualizer.java:117)
        at org.processmining.models.jgraph.ProMJGraphVisualizer.visualizeGraph(ProMJGraphVisualizer.java:72)
        at org.processmining.plugins.bpmn.BpmnDefinitions$BpmnDefinitionsBuilder.fillGraphicsInfo(BpmnDefinitions.java:179)
        at org.processmining.plugins.bpmn.BpmnDefinitions$BpmnDefinitionsBuilder.buildFromDiagram(BpmnDefinitions.java:161)
        at org.processmining.plugins.bpmn.BpmnDefinitions$BpmnDefinitionsBuilder.<init>(BpmnDefinitions.java:81)
        at org.processmining.plugins.bpmn.plugins.BpmnExportPlugin.retrieveContent(BpmnExportPlugin.java:180)
        at org.processmining.plugins.bpmn.plugins.BpmnExportPlugin.export(BpmnExportPlugin.java:78)
        at au.edu.unimelb.services.ServiceProvider.SplitMinerService(ServiceProvider.java:532)
        at au.edu.unimelb.services.ServiceProvider.main(ServiceProvider.java:96)
Model file /usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/structure_trial_20230622_100933_84B5C62B_E09F_4223_B32A_A7E6439E0F35/BPIC15_1.bpmn hasn't been mined
Traceback (most recent call last):
  File "/usr/src/Simod/src/simod/hyperopt_pipeline.py", line 12, in step
    return STATUS_OK, fn(*args)
  File "/usr/src/Simod/src/simod/process_structure/optimizer.py", line 263, in _mine_structure
    StructureMiner(
  File "/usr/src/Simod/src/simod/process_structure/miner.py", line 172, in run
    assert self.output_model_path.exists(), f"Model file {self.output_model_path} hasn't been mined"
AssertionError: Model file /usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/structure_trial_20230622_100933_84B5C62B_E09F_4223_B32A_A7E6439E0F35/BPIC15_1.bpmn hasn't been mined
Mining failed: error reading file '/usr/src/simod/outputs/structure_20230622_100919_59d573f7_10fa_4cf8_99ef_cd90e9d7a943/structure_trial_20230622_100933_84b5c62b_e09f_4223_b32a_a7e6439e0f35/bpic15_1.bpmn': failed to load external entity "/usr/src/simod/outputs/structure_20230622_100919_59d573f7_10fa_4cf8_99ef_cd90e9d7a943/structure_trial_20230622_100933_84b5c62b_e09f_4223_b32a_a7e6439e0f35/bpic15_1.bpmn"
/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3464: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
StructureOptimizer pipeline response: {'loss': nan, 'status': 'fail', 'output_dir': PosixPath('/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/structure_trial_20230622_100933_84B5C62B_E09F_4223_B32A_A7E6439E0F35'), 'model_path': PosixPath('/usr/src/Simod/outputs/structure_20230622_100919_59D573F7_10FA_4CF8_99EF_CD90E9D7A943/structure_trial_20230622_100933_84B5C62B_E09F_4223_B32A_A7E6439E0F35/BPIC15_1.bpmn')}
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/Simod/src/simod/cli.py", line 71, in optimize
    Optimizer(settings, event_log=event_log, output_dir=output_dir).run()
  File "/usr/src/Simod/src/simod/optimization/optimizer.py", line 213, in run
    result = self._optimize_structure()
  File "/usr/src/Simod/src/simod/optimization/optimizer.py", line 71, in _optimize_structure
    best_pipeline_settings, model_path, gateway_probabilities, parameters_path = optimizer.run()
  File "/usr/src/Simod/src/simod/process_structure/optimizer.py", line 148, in run
    best = fmin(fn=self._optimization_objective,
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/hyperopt/fmin.py", line 540, in fmin
    return trials.fmin(
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/hyperopt/base.py", line 671, in fmin
    return fmin(
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/hyperopt/fmin.py", line 593, in fmin
    return trials.argmin
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/hyperopt/base.py", line 620, in argmin
    best_trial = self.best_trial
  File "/root/.cache/pypoetry/virtualenvs/simod-CiydOZFs-py3.9/lib/python3.9/site-packages/hyperopt/base.py", line 611, in best_trial
    raise AllTrialsFailed
hyperopt.exceptions.AllTrialsFailed
iharsuvorau commented 1 year ago

Hi @LeonBein, thanks for reporting the issue. We're looking into it. However, it seems we are at the edge of what Split Miner can do here. Simod uses Split Miner first for basic model discovery and then optimises it by tuning different parameters. But if the model cannot be mined in any of the iterations, it fails.
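The failure mode described here can be sketched in a few lines of Python. This is a simplified stand-in, not hyperopt's or Simod's actual code: the `Trial` class, the `best_trial` function, and the local `AllTrialsFailed` exception are hypothetical names chosen to mirror the traceback above. When every trial reports a failed status, there is no candidate left to pick, so the whole optimization run aborts.

```python
import math
from dataclasses import dataclass


class AllTrialsFailed(Exception):
    """Local stand-in for the exception seen in the traceback (not hyperopt's class)."""


@dataclass
class Trial:
    status: str  # "ok" or "fail"
    loss: float  # NaN when mining produced no model


def best_trial(trials):
    # Only trials that actually produced a model can be compared.
    candidates = [t for t in trials if t.status == "ok" and not math.isnan(t.loss)]
    if not candidates:
        raise AllTrialsFailed("no trial produced a usable model")
    # Among the usable trials, the lowest loss wins.
    return min(candidates, key=lambda t: t.loss)
```

With `max_evaluations: 1`, as in the configuration above, a single failed mining attempt is enough to leave no usable trial.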

Here's the Split Miner paper, https://link.springer.com/article/10.1007/s10115-018-1214-x#Sec17. The authors actually claim that they used BPIC 15 in the evaluation but applied filtering (https://ieeexplore.ieee.org/document/7579568/references#algorithm1) beforehand:

… we applied the filtering method in [11] to remove infrequent behavior prior to applying each of the discovery methods. Without this filtering step, all the method generated models with an F-score of close to zero due to the complexity of these logs …

I'll leave this issue open for now as a feature request in case we can extract the filtering implementation from the “Infrequent Behavior Filter” plugin for the ProM framework.
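As a rough illustration of what removing infrequent behavior means, the sketch below drops rare trace variants from a log before discovery. This is a much simpler technique than the filtering method referenced in the paper, and it is not the ProM plugin's algorithm; the function name `filter_infrequent_variants`, the `min_share` threshold, and the toy log are assumptions made for the example.

```python
from collections import Counter


def filter_infrequent_variants(traces, min_share=0.05):
    """Keep only traces whose variant (activity sequence) covers at least
    `min_share` of all traces in the log."""
    if not traces:
        return []
    variant_counts = Counter(tuple(trace) for trace in traces)
    threshold = min_share * len(traces)
    return [trace for trace in traces if variant_counts[tuple(trace)] >= threshold]


# Toy log: each trace is a list of activity names (hypothetical data).
log = [["A", "B", "C"]] * 8 + [["A", "C", "B"]] * 3 + [["A", "X", "C"]]
filtered = filter_infrequent_variants(log, min_share=0.2)
print(len(log), "->", len(filtered))
```

For a log like the one in this issue, where the console output reports 774 distinct traces, a variant-level filter of this kind would likely discard nearly everything, so a finer-grained filter would be needed in practice.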

BTW, this is how the model looks mined with another tool:

[Screenshot: the process model mined with another tool]

CC: @david-chapela, @marlondumas

marlondumas commented 1 year ago

I confirm the open-source Split Miner implementation might not be able to handle BPIC 2015 unfiltered. To handle this log, it would require some cleaning (preprocessing) and perhaps more than 2GB of RAM. Workaround:

LeonBein commented 1 year ago

Yes, I agree it probably doesn't make sense to mine this log without preprocessing. However, I am unsure whether the bug occurs only because of the log's size. Digging a little deeper, I found the following commit, https://github.com/apromore/ProMforApromore/commit/050e23849631e451dcdc265c0fef4f24f082cfb1, which seems to fix the bug in a newer version of the plugin.

Maybe the plugin version used in Simod can be updated?
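For context on the exception itself: Java's TimSort can raise it when it detects that the comparison used for sorting is inconsistent, for example when "equal" is not transitive. The Python sketch below shows such an inconsistency with a deliberately broken, tolerance-based comparison. It is not the JGraph code from the stack trace, and `fuzzy_compare` and `find_contract_violation` are hypothetical names used only for illustration.

```python
from itertools import permutations


def fuzzy_compare(a, b, tolerance=1.0):
    """Deliberately broken comparison: values within `tolerance` count as equal."""
    if abs(a - b) <= tolerance:
        return 0
    return -1 if a < b else 1


def find_contract_violation(compare, values):
    """Return a triple (x, y, z) where x == y and y == z but x != z under
    `compare`, or None if no such triple exists among `values`."""
    for x, y, z in permutations(values, 3):
        if compare(x, y) == 0 and compare(y, z) == 0 and compare(x, z) != 0:
            return (x, y, z)
    return None


print(find_contract_violation(fuzzy_compare, [0.0, 0.8, 1.6]))
```

Whether a sort actually trips over such an inconsistency depends on the input order and size, which would be consistent with the error appearing only on some logs.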

david-chapela commented 1 year ago

Hi @LeonBein,

Thanks for the hints! For now, @iharsuvorau managed to discover a BPMN process model with Apromore to bypass the use of SplitMiner, and run Simod to obtain the rest of the simulation parameters (we have an option to provide the process model in a BPMN file and skip the control-flow discovery phase).

In case you are interested in working with the BPIC 2015, we placed the discovered process model, as well as the Simod configuration in this folder. Also, a test running this can be found in this file.

We will take a look at the SplitMiner repository, and the Apromore repository you shared, to see if we can update the version of SplitMiner we have.

iharsuvorau commented 1 year ago

@LeonBein This issue should be resolved by now. We recently replaced the Split Miner build with a new one that has fewer dependencies.