JaredSchwartz / RuleMiner.jl

Association Rule Mining in Julia
MIT License
6 stars 0 forks source link

RuleMiner.jl - Association Rule Mining in Julia

Build Status codecov

About

RuleMiner.jl is a Julia package for association rule and frequent itemset mining inspired by the arules R package and SPMF Java library.

Key features of RuleMiner.jl include:

Algorithms

The package currently has support for these algorithms:

Association Rule Mining

Frequent Itemset Mining

Closed Itemset Mining

Installation

julia> ]

pkg> add RuleMiner

Usage

These examples use the retail dataset from the Frequent Itemset Mining Implementenations (FIMI) repository hosted by the University of Antwerp.

using RuleMiner

Load data to create a Transactions object or alternatively convert an existing 1-hot encoded DataFrame.

data = load_transactions("retail.txt",' ')

data = transactions(df)

Generate association rules using A Priori with 10% minimum support and a max rule length of 3.

arules = apriori(data, 0.1, 3)

Result:

13×8 DataFrame
 Row │ LHS       RHS     Support   Confidence  Coverage  Lift       N      Length 
     │ Array…    String  Float64   Float64     Float64   Float64    Int64  Int64  
─────┼────────────────────────────────────────────────────────────────────────────
   1 │ String[]  33      0.172036    0.172036  1.0          1.0     15167       1
   2 │ String[]  39      0.176902    0.176902  1.0          1.0     15596       1
   3 │ String[]  40      0.574794    0.574794  1.0          1.0     50675       1
   4 │ String[]  42      0.169517    0.169517  1.0          1.0     14945       1
   5 │ String[]  49      0.477927    0.477927  1.0          1.0     42135       1
   6 │ ["40"]    42      0.129466    0.225239  0.574794  2482.19    11414       2
   7 │ ["49"]    42      0.102289    0.214026  0.477927  2358.62     9018       2
   8 │ ["39"]    40      0.117341    0.663311  0.176902   106.519   10345       2
   9 │ ["40"]    49      0.330551    0.575076  0.574794  2668.42    29142       2
  10 │ ["49"]    40      0.330551    0.691634  0.477927   111.067   29142       2
  11 │ ["42"]    49      0.102289    0.603413  0.169517  2799.9      9018       2
  12 │ ["40"]    39      0.117341    0.204144  0.574794    67.6607  10345       2
  13 │ ["42"]    40      0.129466    0.763734  0.169517   122.645   11414       2

Generate frequent itemsets with a minimum support of 5,000 transactions using ECLAT

itemsets = eclat(data, 5000)

Result:

15×4 DataFrame
 Row │ Itemset             Support    N      Length 
     │ Array…              Float64    Int64  Int64  
─────┼──────────────────────────────────────────────
   1 │ ["42"]              0.169517   14945       1
   2 │ ["33"]              0.172036   15167       1
   3 │ ["39"]              0.176902   15596       1
   4 │ ["49"]              0.477927   42135       1
   5 │ ["40"]              0.574794   50675       1
   6 │ ["39", "49"]        0.0901068   7944       2
   7 │ ["49", "40"]        0.330551   29142       2
   8 │ ["39", "49", "40"]  0.0692135   6102       3
   9 │ ["39", "40"]        0.117341   10345       2
  10 │ ["42", "49"]        0.102289    9018       2
  11 │ ["42", "49", "40"]  0.0835507   7366       3
  12 │ ["42", "40"]        0.129466   11414       2
  13 │ ["33", "49"]        0.0911277   8034       2
  14 │ ["33", "49", "40"]  0.0612736   5402       3
  15 │ ["33", "40"]        0.095903    8455       2

Multithreading

RuleMiner.jl makes use of Julia's native multithreading support for significant performance gains. Enabling multithreading is done by using the -t flag when launching Julia and either specifying the number of threads or passing in the auto argument to launch julia with all available threads.

$ julia -t auto

Once Julia is launched, you can can view the enabled threads with nthreads() from the Base.Threads module.

julia> using Base.Threads

julia> nthreads()

See this post for more info on multithreading in Julia.

[!TIP] Multithreading can be configured for the VScode integrated terminal by setting the julia.NumThreads parameter in VScode settings.

Future Work

RuleMiner 0.4.0 is planned to support maximal itemset mining.

References

[^1]: Agrawal, Rakesh, and Ramakrishnan Srikant. “Fast Algorithms for Mining Association Rules in Large Databases.” In Proceedings of the 20th International Conference on Very Large Data Bases, 487–99. VLDB ’94. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1994.

[^2]: Zaki, Mohammed. “Scalable Algorithms for Association Mining.” Knowledge and Data Engineering, IEEE Transactions On 12 (June 1, 2000): 372–90. https://doi.org/10.1109/69.846291.

[^3]: Han, Jiawei, Jian Pei, and Yiwen Yin. “Mining Frequent Patterns without Candidate Generation.” SIGMOD Rec. 29, no. 2 (May 16, 2000): 1–12. https://doi.org/10.1145/335191.335372.

[^4]: Grahne, Gösta, and Jianfei Zhu. “Fast Algorithms for Frequent Itemset Mining Using FP-Trees.” IEEE Transactions on Knowledge and Data Engineering 17, no. 10 (October 2005): 1347–62. https://doi.org/10.1109/TKDE.2005.166.

[^5]: Zaki, Mohammed, and Ching-Jui Hsiao. “CHARM: An Efficient Algorithm for Closed Itemset Mining.” In Proceedings of the 2002 SIAM International Conference on Data Mining (SDM), 457–73. Proceedings. Society for Industrial and Applied Mathematics, 2002. https://doi.org/10.1137/1.9781611972726.27.

[^6]: Uno, Takeaki, Tatsuya Asai, Yuzo Uchida, and Hiroki Arimura. “An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases.” In Discovery Science, edited by Einoshin Suzuki and Setsuo Arikawa, 16–31. Berlin, Heidelberg: Springer, 2004. https://doi.org/10.1007/978-3-540-30214-8_2.

[^7]: Pan, Feng, Gao Cong, Anthony K. H. Tung, Jiong Yang, and Mohammed J. Zaki. “Carpenter: Finding Closed Patterns in Long Biological Datasets.” In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 637–42. KDD ’03. New York, NY, USA: Association for Computing Machinery, 2003. https://doi.org/10.1145/956750.956832.

[^8]: Pasquier, Nicolas, Yves Bastide, Rafik Taouil, and Lotfi Lakhal. “Efficient Mining of Association Rules Using Closed Itemset Lattices.” Information Systems 24, no. 1 (March 1, 1999): 25–46. https://doi.org/10.1016/S0306-4379(99)00003-4.