UdayLab / PAMI

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
https://udaylab.github.io/PAMI/
GNU General Public License v3.0
frequent-itemsets frequent-pattern-mining frequent-subgraphs pattern-mining pattern-recognition periodic-patterns periodicity python sequence-mining spatial-data spatiotemporal-data spatiotemporal-data-analysis stream-mining




Table of Contents


Introduction

PAttern MIning (PAMI) is a Python library containing several algorithms to discover user interest-based patterns in a wide spectrum of datasets across multiple computing platforms. Useful links for using this library are provided below:

  1. YouTube tutorial https://www.youtube.com/playlist?list=PLKP768gjVJmDer6MajaLbwtfC9ULVuaCZ

  2. Tutorials (Notebooks) https://github.com/UdayLab/PAMI/tree/main/notebooks

  3. User manual https://udaylab.github.io/PAMI/manuals/index.html

  4. Coders manual https://udaylab.github.io/PAMI/codersManual/index.html

  5. Code documentation https://pami-1.readthedocs.io

  6. Datasets https://u-aizu.ac.jp/~udayrage/datasets.html

  7. Discussions on PAMI usage https://github.com/UdayLab/PAMI/discussions

  8. Report issues https://github.com/UdayLab/PAMI/issues


Flow Chart of Developing Algorithms in PAMI

PAMI's production process

***

# Inputs and Outputs of an Algorithm in PAMI

![Inputs and Outputs](./images/inputOutputPAMIalgo.png?raw=true)

***

# Recent Updates

- **Version 2024.07.02:** This version includes the following updates:
  - Added one new algorithm, **PrefixSpan**, for sequential pattern mining.
  - Optimized the following pattern mining algorithms: **PFPGrowth, PFECLAT, GPFgrowth, and PPF_DFS**.
  - Implemented test cases for the following algorithm families: **Contiguous Frequent Patterns, Correlated Frequent Patterns, Coverage Frequent Patterns, Fuzzy Correlated Frequent Patterns, Fuzzy Frequent Patterns, Fuzzy Georeferenced Patterns, Georeferenced Frequent Patterns, Periodic Frequent Patterns, Partial Periodic Frequent Patterns, HighUtility Frequent Patterns, HighUtility Patterns, HighUtility Georeferenced Frequent Patterns, Frequent Patterns, Multiple Minimum Frequent Patterns, Recurring Patterns, Sequential Patterns, Uncertain Frequent Patterns, Weighted Uncertain Frequent Patterns**.
  - The following algorithm families are tested automatically: **Frequent Patterns, Correlated Frequent Patterns, Contiguous Frequent Patterns, Coverage Frequent Patterns, Recurring Patterns, Sequential Patterns**.

Total number of algorithms: 89

***

# Features

- ✅ Tested to the best of our ability
- 🔋 Highly optimized to our best effort, light-weight, and energy-efficient
- 👀 Proper code documentation
- 🍼 Ample examples of using various algorithms in the [./notebooks](https://github.com/UdayLab/PAMI/tree/main/notebooks) folder
- 🤖 Works with AI libraries such as TensorFlow, PyTorch, and sklearn
- ⚡️ Supports CUDA and PySpark
- 🖥️ Operating system independence
- 🔬 Knowledge discovery in static data and streams
- 🐎 Snappy
- 🐻 Ease of use

***

# Maintenance

__Installation__

1. Installing the basic pami package (recommended)

        pip install pami

2. Installing the pami package on a GPU machine that supports CUDA

        pip install 'pami[gpu]'

3. Installing the pami package in a distributed network environment supporting Spark

        pip install 'pami[spark]'

4. Installing the pami package for development purposes

        pip install 'pami[dev]'

5. Installing the complete pami library

        pip install 'pami[all]'

__Upgrade__

    pip install --upgrade pami

__Uninstallation__

    pip uninstall pami

__Information__

    pip show pami

***

# *Try your first PAMI program*

```shell
$ python
```

```python
# First, import the FP-growth implementation from PAMI
from PAMI.frequentPattern.basic import FPGrowth as alg

fileURL = "https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv"
minSup = 300  # minimum support count

obj = alg.FPGrowth(iFile=fileURL, minSup=minSup, sep='\t')
# obj.startMine()  # deprecated
obj.mine()
obj.save('frequentPatternsAtMinSupCount300.txt')

frequentPatternsDF = obj.getPatternsAsDataFrame()
print('Total No of patterns: ' + str(len(frequentPatternsDF)))  # print the total number of patterns
print('Runtime: ' + str(obj.getRuntime()))  # measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
print('Memory (USS): ' + str(obj.getMemoryUSS()))
```

```
Output:
Frequent patterns were generated successfully using frequentPatternGrowth algorithm
Total No of patterns: 4540
Runtime: 8.749667644500732
Memory (RSS): 522911744
Memory (USS): 475353088
```

***

# Evaluation

1. We compared three Python libraries, PAMI, mlxtend, and efficient-apriori, on the Apriori algorithm.
2. Transactional_T10I4D100K.csv, a transactional database downloaded from PAMI, was used as the input file for all libraries.
3. The same minimum support values and separator were used for all libraries.

* The performance of the **Apriori algorithm** is shown in the graphical results below:
  1. Comparing the **Patterns Generated** by different Python libraries for the Apriori algorithm (figure: Screenshot 2024-04-11 at 13 31 31)
  2. Evaluating the **Runtime** of the Apriori algorithm across different Python libraries (figure: Screenshot 2024-04-11 at 13 31 20)
  3. Comparing the **Memory Consumption** of the Apriori algorithm across different Python libraries (figure: Screenshot 2024-04-11 at 13 31 08)

For more information, the evaluation file is available in two formats:
- **ipynb** format: [Evaluation File ipynb](https://github.com/UdayLab/PAMI/blob/main/notebooks/Evaluation-neverDelete.ipynb)
- **pdf** format: [Evaluation File Pdf](https://github.com/UdayLab/PAMI/blob/main/notebooks/evaluation.pdf)

***

# Reading Material

For more examples, refer to this YouTube playlist: [YouTube](https://www.youtube.com/playlist?list=PLKP768gjVJmDer6MajaLbwtfC9ULVuaCZ)

***

# License

[![GitHub license](https://img.shields.io/github/license/UdayLab/PAMI)](https://github.com/UdayLab/PAMI/blob/main/LICENSE)

***

# Documentation

The official documentation is hosted on [PAMI](https://pami-1.readthedocs.io).

***

# Background

The idea and motivation to develop PAMI came from [Kitsuregawa Lab](https://www.tkl.iis.u-tokyo.ac.jp/new/resources?lang=en) at the University of Tokyo. Work on ``PAMI`` started at the [University of Aizu](https://u-aizu.ac.jp/en/) in 2020 and has been under active development since then.

***

# Getting Help

For any queries, the best place to go is the Q&A category of our GitHub discussions: [GitHub Discussions Q&A](https://github.com/orgs/UdayLab/discussions/categories/q-a).

***

# Discussion and Development

In our GitHub repository, the primary platform for discussing development-related matters is the university lab. We encourage our team members and contributors to utilize this platform for a wide range of discussions, including bug reports, feature requests, design decisions, and implementation details.

***

# Contribution to PAMI

We invite and encourage all community members to contribute, report bugs, fix bugs, enhance documentation, propose improvements, and share their creative ideas.

***

# Tutorials

### 0. Association Rule Mining

| Basic |
|-------|
| Confidence |
| Lift |
| Leverage |

### 1. Pattern mining in binary transactional databases

#### 1.1. Frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/frequentPatternMining.html)

| Basic | Closed | Maximal | Top-k | CUDA | pyspark |
|-------|--------|---------|-------|------|---------|
| Apriori | CHARM | maxFP-growth | FAE | cudaAprioriGCT | parallelApriori |
| FP-growth | | | | cudaAprioriTID | parallelFPGrowth |
| ECLAT | | | | cudaEclatGCT | parallelECLAT |
| ECLAT-bitSet | | | | | |
| ECLAT-diffset | | | | | |

#### 1.2. Relative frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/relativeFrequentPatternMining.html)

| Basic |
|-------|
| RSFP-growth |

#### 1.3. Frequent pattern with multiple minimum support: [Sample](https://udaylab.github.io/PAMI/multipleMinSupFrequentPatternMining.html)

| Basic |
|-------|
| CFPGrowth |
| CFPGrowth++ |

#### 1.4. Correlated pattern mining: [Sample](https://udaylab.github.io/PAMI/correlatePatternMining.html)

| Basic |
|-------|
| CoMine |
| CoMine++ |

#### 1.5. Fault-tolerant frequent pattern mining (under development)

| Basic |
|-------|
| FTApriori |
| FTFPGrowth (under development) |

#### 1.6. Coverage pattern mining (under development)

| Basic |
|-------|
| CMine |
| CMine++ |

### 2. Pattern mining in binary temporal databases

#### 2.1. Periodic-frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/periodicFrequentPatternMining.html)

| Basic | Closed | Maximal | Top-K |
|-------|--------|---------|-------|
| PFP-growth | CPFP | maxPF-growth | kPFPMiner |
| PFP-growth++ | | | Topk-PFP |
| PS-growth | | | |
| PFP-ECLAT | | | |
| PFPM-Compliments | | | |

#### 2.2. Local periodic pattern mining: [Sample](https://udaylab.github.io/PAMI/localPeriodicPatternMining.html)

| Basic |
|-------|
| LPPGrowth (under development) |
| LPPMBreadth (under development) |
| LPPMDepth (under development) |

#### 2.3. Partial periodic-frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/partialPeriodicFrequentPattern.html)

| Basic |
|-------|
| GPF-growth |
| PPF-DFS |
| GPPF-DFS |

#### 2.4. Partial periodic pattern mining: [Sample](https://udaylab.github.io/PAMI/partialPeriodicPatternMining.html)

| Basic | Closed | Maximal | topK | CUDA |
|-------|--------|---------|------|------|
| 3P-growth | 3P-close | max3P-growth | topK-3P growth | cuGPPMiner (under development) |
| 3P-ECLAT | | | | gPPMiner (under development) |
| G3P-Growth | | | | |

#### 2.5. Periodic correlated pattern mining: [Sample](https://udaylab.github.io/PAMI/periodicCorrelatedPatternMining.html)

| Basic |
|-------|
| EPCP-growth |

#### 2.6. Stable periodic pattern mining: [Sample](https://udaylab.github.io/PAMI/stablePeriodicPatterns.html)

| Basic | TopK |
|-------|------|
| SPP-growth | TSPIN |
| SPP-ECLAT | |

#### 2.7. Recurring pattern mining: [Sample](https://udaylab.github.io/PAMI/RecurringPatterns.html)

| Basic |
|-------|
| RPgrowth |

### 3. Mining patterns from binary Geo-referenced (or spatiotemporal) databases

#### 3.1. Geo-referenced frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/frequentSpatialPatternMining.html)

| Basic |
|-------|
| spatialECLAT |
| FSP-growth |

#### 3.2. Geo-referenced periodic frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/periodicFrequentSpatial.html)

| Basic |
|-------|
| GPFPMiner |
| PFS-ECLAT |
| ST-ECLAT |

#### 3.3. Geo-referenced partial periodic pattern mining: [Sample](https://udaylab.github.io/PAMI/partialPeriodicSpatialPatternMining.html)

| Basic |
|-------|
| STECLAT |

### 4. Mining patterns from Utility (or non-binary) databases

#### 4.1. High utility pattern mining: [Sample](https://udaylab.github.io/PAMI/highUtilityPatternMining.html)

| Basic |
|-------|
| EFIM |
| HMiner |
| UPGrowth |

#### 4.2. High utility frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/highUtiltiyFrequentPatternMining.html)

| Basic |
|-------|
| HUFIM |

#### 4.3. High utility geo-referenced frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/highUtilitySpatialPatternMining.html)

| Basic |
|-------|
| SHUFIM |

#### 4.4. High utility spatial pattern mining: [Sample](https://udaylab.github.io/PAMI/highUtilitySpatialPatternMining.html)

| Basic | topk |
|-------|------|
| HDSHIM | TKSHUIM |
| SHUIM | |

#### 4.5. Relative High utility pattern mining: [Sample](https://github.com/UdayLab/PAMI/blob/main/sampleManuals/mainManuals/relativeUtility.html)

| Basic |
|-------|
| RHUIM |

#### 4.6. Weighted frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/weightedFrequentPattern.html)

| Basic |
|-------|
| WFIM |

#### 4.7. Weighted frequent regular pattern mining: [Sample](https://udaylab.github.io/PAMI/weightedFrequentRegularPatterns.html)

| Basic |
|-------|
| WFRIMiner |

#### 4.8. Weighted frequent neighbourhood pattern mining: [Sample](https://github.com/UdayLab/PAMI/blob/main/docs/weightedSpatialFrequentPattern.html)

| Basic |
|-------|
| SSWFPGrowth |

### 5. Mining patterns from fuzzy transactional/temporal/geo-referenced databases

#### 5.1. Fuzzy Frequent pattern mining: [Sample](https://github.com/UdayLab/PAMI/fuzzyFrequentPatternMining.html)

| Basic |
|-------|
| FFI-Miner |

#### 5.2. Fuzzy correlated pattern mining: [Sample](https://udaylab.github.io/PAMI/fuzzyCorrelatedPatternMining.html)

| Basic |
|-------|
| FCP-growth |

#### 5.3. Fuzzy geo-referenced frequent pattern mining: [Sample](https://github.com/UdayLab/PAMI/fuzzyFrequentSpatialPatternMining.html)

| Basic |
|-------|
| FFSP-Miner |

#### 5.4. Fuzzy periodic frequent pattern mining: [Sample](https://github.com/UdayLab/PAMI/fuzzyPeriodicFrequentPatternMining.html)

| Basic |
|-------|
| FPFP-Miner |

#### 5.5. Fuzzy geo-referenced periodic frequent pattern mining: [Sample](https://github.com/UdayLab/PAMI/fuzzySpatialPeriodicFrequentPatternMining.html)

| Basic |
|-------|
| FGPFP-Miner (under development) |

### 6. Mining patterns from uncertain transactional/temporal/geo-referenced databases

#### 6.1. Uncertain frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/uncertainFrequentPatternMining.html)

| Basic | top-k |
|-------|-------|
| PUF | TUFP |
| TubeP | |
| TubeS | |
| UVEclat | |

#### 6.2. Uncertain periodic frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/uncertainPeriodicFrequentPatternMining.html)

| Basic |
|-------|
| UPFP-growth |
| UPFP-growth++ |

#### 6.3. Uncertain Weighted frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/weightedUncertainFrequentPatterns.html)

| Basic |
|-------|
| WUFIM |

### 7. Mining patterns from sequence databases

#### 7.1. Sequence frequent pattern mining: [Sample](https://github.com/UdayLab/PAMI/blob/main/docs/weightedSpatialFrequentPattern.html)

| Basic |
|-------|
| SPADE |
| PrefixSpan |

#### 7.2. Geo-referenced Frequent Sequence Pattern mining

| Basic |
|-------|
| GFSP-Miner (under development) |

### 8. Mining patterns from multiple timeseries databases

#### 8.1. Partial periodic pattern mining (under development)

| Basic |
|-------|
| PP-Growth (under development) |

### 9. Mining interesting patterns from Streams

#### 9.1. Frequent pattern mining

| Basic |
|-------|
| to be written |

#### 9.2. High utility pattern mining

| Basic |
|-------|
| HUPMS |

### 10. Mining patterns from contiguous character sequences (e.g., DNA, Genome, and Game sequences)

#### 10.1. Contiguous Frequent Patterns

| Basic |
|-------|
| PositionMining |

### 11. Mining patterns from Graphs

#### 11.1. Frequent sub-graph mining

| Basic | topk |
|-------|------|
| Gspan | TKG |

#### 11.2. Graph transactional coverage pattern mining

| Basic |
|-------|
| GTCP |

### 12. Additional Features

#### 12.1. Creation of synthetic databases

| Database type |
|---------------|
| Transactional database |
| Temporal database |
| Utility database (coming soon) |
| spatio-transactional database (coming soon) |
| spatio-temporal database (coming soon) |
| fuzzy transactional database (coming soon) |
| fuzzy temporal database (coming soon) |
| Sequence database generator (coming soon) |

#### 12.2. Converting a dataframe into a specific database type

| Approaches |
|------------|
| Dense dataframe to databases |
| Sparse dataframe to databases (coming soon) |

#### 12.3. Gathering the statistical details of a database

| Approaches |
|------------|
| Transactional database |
| Temporal database |
| Utility database (coming soon) |

#### 12.4. Convertors

| Approaches |
|------------|
| Subgraphs2FlatTransactions |
| CSV2Parquet |
| CSV2BitInteger |
| CSV2Integer |

#### 12.5. Generating Latex code for the experimental results

| Approaches |
|------------|
| Latex code (coming soon) |

***

# Real World Case Studies

1. Air pollution analytics

[Go to Top](#table-of-contents)
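
***

As a supplement to the first PAMI program shown earlier, the sketch below illustrates what a minimum support count such as `minSup=300` means. It is a brute-force illustration in plain Python, not PAMI's implementation: the function name `frequent_itemsets`, the `max_len` cap, and the toy database are all hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_sup, max_len=3):
    """Return every itemset (up to max_len items) that occurs in at
    least min_sup transactions, mapped to its support count.

    Brute-force enumeration for illustration only; real miners such as
    FP-growth avoid generating every candidate itemset.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicates, fix an order
        for size in range(1, min(max_len, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_sup}


# Toy transactional database (hypothetical data): one list per transaction.
database = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

patterns = frequent_itemsets(database, min_sup=2)
for itemset, support in sorted(patterns.items()):
    print("\t".join(itemset), ":", support)
```

With `min_sup=2`, the three single items and the three pairs qualify, while the triple `bread, butter, milk` appears in only one transaction and is discarded. Raising `min_sup` shrinks the result set, which is the same trade-off the `minSup` parameter controls in the PAMI example.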