chuanconggao / PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
https://git.io/prefixspan
MIT License
414 stars, 92 forks

Multiple items per transaction and retrieving patterns #7

Closed by demodw 6 years ago

demodw commented 6 years ago

Hi,

Is it possible to specify multiple items per transaction?

For instance,

[ [0, [1,2], 3], [0, 1, 3], [4, 5, 0, [1,2], 3] ]

Here the pattern [0, [1,2], 3] would be observed two times (in the first and third sequences).

Second, is it possible to have the framework return all patterns that meet the minimum support criterion, without knowing in advance how many there are?

Thanks for a great tool! It seems to be super fast, even for millions of sequences.

chuanconggao commented 6 years ago

Currently, having multiple items per event is not supported. This is because the framework contains three algorithms, and I would prefer to add this kind of support to all of them together. It is on the to-do list.
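To make the requested semantics concrete, here is a minimal sketch of how support could be counted when an event may hold several items. It is not part of the library: the helper names `as_itemset`, `contains`, and `support` are hypothetical, and it assumes a pattern event matches a sequence event whenever the former is a subset of the latter.

```python
def as_itemset(event):
    """Treat a bare item as a one-item event and a list as a set of items."""
    if isinstance(event, (list, tuple, set, frozenset)):
        return frozenset(event)
    return frozenset([event])


def contains(sequence, pattern):
    """Return True if the pattern's events occur in order within the sequence.

    Assumption: a pattern event matches a sequence event when it is a subset
    of that event. Greedy left-to-right matching is enough for this check.
    """
    pos = 0
    for event in sequence:
        if pos < len(pattern) and as_itemset(pattern[pos]) <= as_itemset(event):
            pos += 1
    return pos == len(pattern)


def support(db, pattern):
    """Count the sequences in db that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db)


# The example database from the question above.
db = [[0, [1, 2], 3], [0, 1, 3], [4, 5, 0, [1, 2], 3]]
print(support(db, [0, [1, 2], 3]))  # 2
print(support(db, [0, 1, 3]))       # 3
```

This only counts the support of a given pattern; mining such patterns efficiently would need the projection step of each algorithm to handle itemsets, which is the part that is not implemented.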

For your second question, the algorithms return all the frequent (closed/generator) patterns w.r.t. the minimum support threshold.
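In the library this corresponds to the `frequent(minsup)` call shown in the README, as opposed to `topk(k)`, which needs a count up front. The idea can be illustrated with a small self-contained sketch of prefix-projected mining over single-item events. This is a simplified illustration, not the library's code; the function name `frequent_patterns` and the example database are assumptions made for the example.

```python
from collections import defaultdict


def frequent_patterns(db, minsup):
    """Return every (support, pattern) pair with support >= minsup.

    A simplified prefix-projection sketch: each recursion level extends the
    current prefix by one item and keeps only extensions that are frequent.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffixes
        # that remain after matching the prefix as early as possible.
        occurrences = defaultdict(list)
        for seq_id, start in projected:
            seen = set()
            for pos in range(start, len(db[seq_id])):
                item = db[seq_id][pos]
                if item not in seen:  # first occurrence per sequence only
                    seen.add(item)
                    occurrences[item].append((seq_id, pos + 1))
        for item, next_projected in occurrences.items():
            if len(next_projected) >= minsup:
                pattern = prefix + [item]
                results.append((len(next_projected), pattern))
                mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(db))])
    return results


# Hypothetical database with one item per event.
db = [[0, 1, 2, 3], [0, 1, 3], [4, 5, 0, 1, 2, 3]]
patterns = frequent_patterns(db, 2)
print(len(patterns))  # 15
```

The caller only supplies the minimum support; the number of returned patterns falls out of the search.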