Open StefanBloemheuvel opened 5 years ago
I was also unable to find this feature, so I updated the PrefixSpan class like this:
class PrefixSpan(object):
    def __init__(self, db, minLen=1, maxLen=1000):
        # type: (List[List[int]], int, int) -> None
        self._db = db
        # Length bounds for the patterns that get reported
        self.minlen = minLen
        self.maxlen = maxLen
        self._results = []  # type: Any
I ran a couple of tests for frequent patterns and frequent closed patterns, and the change did not introduce any new issues. After the update you should be able to do something like:
prefix = PrefixSpan(sourceData, minLen=3, maxLen=10)
for pattern in prefix.frequent(minSupport):
    print(pattern)
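For anyone wondering how length bounds can fit into a PrefixSpan-style search, here is a toy sketch. It is not the library's code: `frequent_patterns` and its internals are hypothetical names made up for illustration, and it only mines plain (non-closed) frequent patterns, assuming patterns are subsequences with gaps allowed.

```python
from collections import defaultdict


def frequent_patterns(db, minsup, minlen=1, maxlen=1000):
    """Toy PrefixSpan-style miner (illustrative only, not the library's code).

    Returns a list of (support, pattern) pairs whose pattern length lies
    in [minlen, maxlen].
    """
    results = []

    def grow(prefix, projections):
        # projections holds one (sequence index, next position) pair per
        # sequence that contains the current prefix.
        if prefix and len(prefix) >= minlen:
            results.append((len(projections), prefix))
        if len(prefix) >= maxlen:
            return  # never explore patterns longer than maxlen

        # For every item, record its first occurrence after the current
        # position in each projected sequence.
        occurrences = defaultdict(list)
        for seq_idx, start in projections:
            seen = set()
            for pos in range(start, len(db[seq_idx])):
                item = db[seq_idx][pos]
                if item not in seen:
                    seen.add(item)
                    occurrences[item].append((seq_idx, pos + 1))

        for item, next_projections in sorted(occurrences.items()):
            if len(next_projections) >= minsup:
                grow(prefix + [item], next_projections)

    grow([], [(i, 0) for i in range(len(db))])
    return results


db = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7],
]
found = frequent_patterns(db, minsup=3, minlen=2, maxlen=3)
for support, pattern in found:
    print(support, pattern)
```

In this sketch minlen only controls what gets reported, while maxlen also cuts the search short, so a large maxlen costs more time than a small one.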
Keep in mind that the evaluation of closed patterns is not affected by the length limits. For example, for the sequence database:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
[3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
[0, 7, 0, 7, 0, 7, 0, 7]
if we run the command:
prefix = PrefixSpan(sourceData, minLen=2, maxLen=3)
for pattern in prefix.frequent(3):
    print(pattern)
you get the pattern: [3, [7, 7]]
Patterns like [3, [3, 9, 3]] and [3, [9, 3, 9]] are not returned, because there is a super-pattern that contains both of them, [3, [3, 9, 3, 9]]. That super-pattern has length 4, and since we set maxLen=3 it is not returned either.
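To double-check the supports quoted above without the library, a small brute-force count is enough. This is only a sketch: `is_subsequence` and `support` are hypothetical helpers, and they assume a pattern matches a sequence when its items appear in order with gaps allowed.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator past the match, so later
    # items can only match further to the right.
    return all(item in remaining for item in pattern)


def support(pattern, db):
    """Number of sequences in db that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in db)


db = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7],
]

for pattern in ([7, 7], [3, 9, 3], [9, 3, 9], [3, 9, 3, 9]):
    print(support(pattern, db), pattern)
```

All four patterns have the same support, which is why the two length-3 patterns are absorbed by the length-4 one when only closed patterns are kept.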
@chuanconggao If I make the update, would you consider merging the changes that add the minLen and maxLen parameters to the PrefixSpan __init__?
@VBota1 Pull request is highly welcome. Thanks.
@VBota1 - I tried to follow the steps you gave. I created a new class named 'PrefixSpan_My', but I am getting the error shown below. Do you have any suggestion for fixing it?
ps = PrefixSpan_My(data, minLen=3)
print(ps.frequent(2))
Error: 'NoneType' object is not callable
For anyone else coming across this issue, one way to get around this (without requiring changes to the library) is to do the following:
ps = PrefixSpan(transactions)
ps.minlen = 2
ps.maxlen = 5
result = ps.frequent(2)
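Another option that also leaves the library untouched is to filter the results by length afterwards. A minimal sketch, assuming each result is a (support, pattern) pair as in the outputs quoted in this thread; `filter_by_length` is a hypothetical helper, and the `results` list below is hand-written stand-in data, not real library output.

```python
def filter_by_length(results, minlen=1, maxlen=1000):
    """Keep only the (support, pattern) pairs whose pattern length is in range."""
    return [
        (support, pattern)
        for support, pattern in results
        if minlen <= len(pattern) <= maxlen
    ]


# Stand-in for the list returned by ps.frequent(...); values are made up.
results = [(3, [7]), (3, [7, 7]), (3, [3, 9, 3]), (3, [3, 9, 3, 9])]

filtered = filter_by_length(results, minlen=2, maxlen=5)
print(filtered)
```

Filtering afterwards does not shorten the search itself, so for large databases setting the length bounds before mining is likely the cheaper choice.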
Hi,
Is there a way to set the minlen? I am only interested in patterns with a length between 2 and 5. (I am working in Python, by the way.)
Thanks in advance, the package works great!