How are we interpreting this as sequential databases?

chuanconggao / PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.

https://git.io/prefixspan

MIT License

How are we interpreting this as sequential databases? #38

Open xander412 opened 2 years ago

xander412 commented 2 years ago

Sequential databases usually look like this, where each sequence is a list of itemsets: [ [[1, 2], [1], [1, 3]], [[1, 2, 4], [3]], [[4, 5], [1], [4, 5, 6]] ]. How can we give this as input to this algorithm?

LeCarteloo commented 1 year ago

Hey, did you figure it out? I got the same problem.

xander412 commented 1 year ago

No man, it seems these are two different kinds of data and we should interpret them separately. I did not come up with anything workable.

KASDmusic commented 1 year ago

I'm going to do the same thing, and I'm about to transform each itemset into a string representation, e.g. [ ["1,2", "1", "1,3"], ["1,2,4", "3"], ["4,5", "1", "4,5,6"] ].

If the order of items inside an itemset can change, just sort it each time before converting it to a string.
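A minimal sketch of that encoding, in case it helps. The helper name `encode_itemsets` is made up for illustration and is not part of PrefixSpan-py; it only assumes that the items in each itemset can be sorted and turned into strings.

```python
def encode_itemsets(database):
    """Turn every itemset into one sorted, comma-joined string token.

    Hypothetical helper, not part of PrefixSpan-py.
    """
    return [
        [",".join(str(item) for item in sorted(itemset)) for itemset in sequence]
        for sequence in database
    ]


# The database from the original question: a list of sequences,
# each sequence being a list of itemsets.
db = [
    [[1, 2], [1], [1, 3]],
    [[1, 2, 4], [3]],
    [[4, 5], [1], [4, 5, 6]],
]

encoded = encode_itemsets(db)
# Each sequence is now a flat list of string tokens, one token per itemset.
```

The encoded database is a plain list of flat sequences, so it should be usable wherever the readme passes `db`, e.g. `PrefixSpan(encoded).frequent(2)`. Sorting inside the helper makes `[2, 1]` and `[1, 2]` map to the same token.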

kittentronic commented 7 months ago

The readme says "Outputs traditional single-item sequential patterns"; in other words, I don't think this implementation currently supports itemsets. As @KASDmusic suggests, you can use strings (or any Python hashable type, such as frozensets) instead of integers as your sequence items, but each itemset is then treated as one opaque item, so you will not find subsequences built from subsets of those itemsets. For example, in the sequence ["4,5", "1", "4,5,6"] you will not find the subsequence ["4", "1", "5,6"], which should be a valid subsequence according to the PrefixSpan paper.
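To make the difference concrete, here is a rough sketch comparing the two notions of "subsequence". Both function names are hypothetical and neither is part of PrefixSpan-py; the first checks containment with subset matching on itemsets, the second checks it on opaque string tokens, which is what the string encoding effectively gives you.

```python
def contains_pattern(sequence, pattern):
    """Itemset containment: each pattern itemset must be a subset of a
    later-and-later itemset of the sequence (greedy earliest match)."""
    position = 0
    for wanted in pattern:
        wanted = frozenset(wanted)
        # Skip itemsets that do not contain every wanted item.
        while position < len(sequence) and not wanted <= frozenset(sequence[position]):
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next pattern itemset must match strictly later
    return True


def contains_tokens(tokens, pattern_tokens):
    """Single-item containment: tokens must match exactly, in order."""
    remaining = iter(tokens)
    return all(token in remaining for token in pattern_tokens)


# With real itemsets, [4] -> [1] -> [5, 6] is contained in the sequence.
itemset_match = contains_pattern([[4, 5], [1], [4, 5, 6]], [[4], [1], [5, 6]])

# With string tokens, "4" never equals "4,5", so the same pattern is missed.
token_match = contains_tokens(["4,5", "1", "4,5,6"], ["4", "1", "5,6"])

print(itemset_match, token_match)
```

So the string (or frozenset) trick is fine if you only care about whole itemsets repeating, but it is not a substitute for itemset-aware mining.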