Open xander412 opened 2 years ago
Hey, did you figure it out? I got the same problem.
No man, it seems the two are different kinds of data, and we should interpret them separately. This is not a likely solution that I had come up with.
On Fri, Jan 13, 2023, 5:20 PM Filip Papiernik @.***> wrote:
Hey, did you figure it out? I got the same problem.
I'm going to do the same thing: I'm about to transform each itemset into a string representation, e.g. [ ["1,2","1","1,3"], ["1,2,4", "3"], ["4,5", "1", "4,5,6"] ]
If the order of items inside an itemset can change, just sort it each time before converting it to a string.
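For anyone wanting to try this, here is a minimal sketch of that encoding step. The helper names `encode_itemset` and `encode_database` are made up for illustration and are not part of PrefixSpan-py; the sketch also assumes the items inside one itemset are mutually comparable so they can be sorted.

```python
def encode_itemset(itemset):
    """Return a canonical string for one itemset: items sorted, then comma-joined."""
    # Sorting first means [2, 1] and [1, 2] produce the same string.
    return ",".join(str(item) for item in sorted(itemset))


def encode_database(db):
    """Turn every itemset of every sequence into its canonical string."""
    return [[encode_itemset(itemset) for itemset in sequence] for sequence in db]


# The sequential database from the question, as lists of itemsets.
db = [
    [[1, 2], [1], [1, 3]],
    [[1, 2, 4], [3]],
    [[4, 5], [1], [4, 5, 6]],
]

encoded = encode_database(db)
for sequence in encoded:
    print(sequence)
```

The resulting `encoded` list has one hashable string per itemset, so it should be usable as an ordinary single-item sequence database, with each whole itemset treated as one opaque item.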
The readme says "Outputs traditional single-item sequential patterns". In other words, I don't think this implementation currently supports itemsets. As @KASDmusic suggests, you can use strings (or any hashable Python type, such as frozensets) instead of integers as your sequence items, but then you will not find subsequences built from subsets of those itemsets. For example, in the sequence ["4,5", "1", "4,5,6"] you will not find the subsequence ["4", "1", "5,6"], which should be a valid subsequence according to the PrefixSpan paper.
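To make that limitation concrete, here is a rough sketch comparing the two notions of "subsequence". Both functions are hypothetical helpers written for this comment, not anything from PrefixSpan-py: one matches each pattern element as a subset of a sequence itemset (the itemset-aware reading), the other requires whole-item equality, which is what you effectively get once itemsets are turned into strings.

```python
def contains_itemset_pattern(sequence, pattern):
    """True if each pattern itemset is a subset of some sequence itemset, in order."""
    pos = 0
    for wanted in pattern:
        wanted = frozenset(wanted)
        # Greedily advance to the earliest itemset that contains `wanted`.
        while pos < len(sequence) and not wanted <= frozenset(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def contains_atomic_pattern(sequence, pattern):
    """True if the pattern items appear in order, compared by plain equality."""
    remaining = iter(sequence)
    # Each `any` consumes the iterator up to and including its match.
    return all(any(item == wanted for item in remaining) for wanted in pattern)


itemset_sequence = [[4, 5], [1], [4, 5, 6]]
string_sequence = ["4,5", "1", "4,5,6"]

# Itemset-aware matching finds the pattern ...
print(contains_itemset_pattern(itemset_sequence, [[4], [1], [5, 6]]))
# ... but the string-encoded version does not, because "4" != "4,5".
print(contains_atomic_pattern(string_sequence, ["4", "1", "5,6"]))
```

So the string trick only finds patterns made of exactly the itemsets that occur in the data; anything involving partial itemsets would need an itemset-aware miner.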
Actually, sequential databases look like [ [[1, 2], [1], [1, 3]], [[1, 2, 4], [3]], [[4, 5], [1], [4, 5, 6]] ]. How can we give this as input to this algorithm?