JaredSchwartz / RuleMiner.jl

Association Rule Mining in Julia
MIT License
6 stars 0 forks source link

Performance tuning #7

Closed JaredSchwartz closed 3 weeks ago

JaredSchwartz commented 3 weeks ago

Reworked both the A Priori Algorithm and ECLAT algorithms to operate faster. Depending on the dataset and parameters, I'm seeing:

A Priori Changes:

Added transaction database subsetting

In the old implementation, each child node search was also calculating support for all columns that don't meet the minimum support threshold. With this change, the algorithm now only calculates support for the base nodes in each iteration.

Added hashing and deduplication of rules during generation

A Priori tends to create redundant rules where the LHS will have the same items in a different order. This can be solved with sets or bitsets, but converting to and from these to vectors to slice the transactions matrix is very slow.

The solution used here keeps the vector architecture for fast matrix slicing, but removes redundant rules at each level of calculation. This prevents further rule generation based on the redundant rules, and it provides greater resource savings the greater maximum length of the rules being generated.

ECLAT Changes

Enabled multithreading

Enabling multithreading in ECLAT is tricky because of its recursive nature. This approach multi-threads at the first equivalence class, keeping the threads entirely independent.

This speeds up ECLAT significantly, but the amount of computation each thread does is very uneven due to the search pattern in ECLAT. Adding additional cores will see diminishing returns on speed gains.

Testing Changes

Sorted itemsets generated by ECLAT

Sorting the itemsets is now necessary with the introduction of multithreading.

codecov-commenter commented 3 weeks ago

Welcome to Codecov :tada:

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

:information_source: You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered :open_umbrella: