chuanconggao / PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
https://git.io/prefixspan
MIT License

a way to edit the minlen #18

Open StefanBloemheuvel opened 5 years ago

StefanBloemheuvel commented 5 years ago

Hi,

Is there a way to set the minimum pattern length (minlen)? I am only interested in patterns of length 2 to 5. (I am working in Python, by the way.)

Thanks in advance, the package works great!

VBota1 commented 4 years ago

I was also unable to find this feature, so I updated the __init__ method of class PrefixSpan(object) like this:

class PrefixSpan(object):
    def __init__(self, db, minLen=1, maxLen=1000):
        # type: (List[List[int]], int, int) -> None
        self._db = db

        # Length bounds applied to the mined patterns.
        self.minlen = minLen
        self.maxlen = maxLen

        self._results = [] # type: Any

I ran a couple of tests for frequent patterns and frequent closed patterns, and the change did not introduce any new issues. After the update you should be able to do something like:

        prefix = PrefixSpan(sourceData, minLen=3, maxLen=10)

        for pattern in prefix.frequent(minSupport):
            print(pattern)

Keep in mind that the length limits do not change how closed patterns are evaluated: closedness is still checked against patterns of any length. For example, for the sequence database:

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7]

if we run the command:

        prefix = PrefixSpan(sourceData, minLen=2, maxLen=3)

        for pattern in prefix.frequent(3):
            print(pattern)

you get the pattern: [3, [7, 7]]

Patterns like [3, [3, 9, 3]] and [3, [9, 3, 9]] are not returned because there exists a super-pattern, [3, [3, 9, 3, 9]], that contains both of them. That super-pattern has length 4, which exceeds maxLen=3, so it is not returned either.
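The supports quoted above can be checked by hand without the library. The following is only a minimal sketch: the helper names `is_subsequence` and `support` are made up for illustration and are not part of PrefixSpan-py. It counts in how many sequences of the example database a pattern occurs in order, with gaps allowed.

```python
from typing import List, Sequence

def is_subsequence(pattern: Sequence[int], sequence: Sequence[int]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, so each following
    # item has to be found later in the sequence.
    return all(item in it for item in pattern)

def support(pattern: Sequence[int], db: List[List[int]]) -> int:
    """Number of sequences in `db` that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in db)

# The example database from this comment.
db = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, 7, 9],
    [3, 6, 9, 0, 3, 6, 9, 3, 6, 9],
    [0, 7, 0, 7, 0, 7, 0, 7],
]

for pattern in ([7, 7], [3, 9, 3], [9, 3, 9], [3, 9, 3, 9]):
    print(support(pattern, db), pattern)
```

All four patterns have support 3, so the two length-3 patterns have a super-pattern, [3, 9, 3, 9], with the same support. That is why a closed-pattern check would discard them.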

@chuanconggao If I make the update, would you consider merging the changes that add the min and max length parameters to the PrefixSpan __init__?

chuanconggao commented 4 years ago

@VBota1 A pull request is highly welcome. Thanks.

abhi-rawat1 commented 4 years ago

@VBota1 - I tried to follow the steps you gave. I created a new class named 'PrefixSpan_My', but I am getting the error shown below. Could you suggest how to fix it?

ps = PrefixSpan_My(data, minLen = 3)
print(ps.frequent(2))

Error: 'NoneType' object is not callable

lionralfs commented 3 years ago

For anyone else coming across this issue, one way to get around this (without requiring changes to the library) is to do the following:

ps = PrefixSpan(transactions)
ps.minlen = 2
ps.maxlen = 5

result = ps.frequent(2)
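Another option, shown here only as a sketch, is to filter the mined results by length afterwards. It assumes the results are (support, pattern) pairs, as in the output printed earlier in this thread. The function `filter_by_length` and the sample list `mined` are hypothetical and not part of the library, and the default bounds of 1 and 1000 simply mirror the defaults proposed above.

```python
from typing import List, Tuple

Result = Tuple[int, List[int]]  # (support, pattern)

def filter_by_length(results: List[Result],
                     minlen: int = 1,
                     maxlen: int = 1000) -> List[Result]:
    """Keep only the patterns whose length lies in [minlen, maxlen]."""
    return [(sup, pat) for sup, pat in results
            if minlen <= len(pat) <= maxlen]

# Placeholder input: a few hand-picked (support, pattern) pairs for the
# example database above, not actual library output.
mined = [
    (3, [7]),
    (3, [7, 7]),
    (3, [3, 9, 3, 9]),
    (1, [0, 7, 0, 7, 0, 7]),
]

print(filter_by_length(mined, minlen=2, maxlen=5))
```

Filtering afterwards does not reduce the mining work itself, so limiting the length during mining is likely the better choice for large databases.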