chuanconggao / PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
https://git.io/prefixspan
MIT License
414 stars 92 forks source link

How to handle the multiple items per transactions? #12

Closed shuaianuoe closed 5 years ago

shuaianuoe commented 5 years ago

hi, your work is great. I have a question, if itemsets contain one multiple items, such as [1,[2,3],4,2,1], that is to say, 2 and 3 appear simultaneously, How should I adjust the input data so that the algorithm can handle this problem?

chuanconggao commented 5 years ago

The algorithms in this project currently do NOT support this feature right now.

Currently, the only possible solution of using existing code is creating a new item like 23 to represent [2, 3], so you can have a regular transaction [1, 23, 4, 2, 1]. However, this means you cannot find any pattern that only covers a subset of [2, 3].

Thus, I highly advise finding other algorithms supporting this use case.