Closed nisalup closed 6 years ago
Hi, the rows do not have to have the same length.
I tried your example (with some items removed to make it faster) and it worked:
In [6]: db = [
...: [4, 3, 3, 3, 3],
...: [3, 3, 3, 4, 3, 3],
...: [3, 3, 3, 3, 3, 3, 4]
...: ]
...:
...: from prefixspan import PrefixSpan
...: ps = PrefixSpan(db)
...: ps.frequent(2)
...:
Out[6]:
[(3, [4]),
(2, [4, 3]),
(2, [4, 3, 3]),
(3, [3]),
(3, [3, 3]),
(3, [3, 3, 3]),
(3, [3, 3, 3, 3]),
(2, [3, 3, 3, 3, 3]),
(2, [3, 3, 3, 4]),
(2, [3, 3, 4]),
(2, [3, 4])]
As I cannot reproduce this issue, please provide more details. I will close this issue right now.
OK, I checked with smaller datasets with shorter rows, and it works fine. It even works well for large datasets (50000+ rows) where the Freeman code length is less than 10 digits. However, the Freeman code for a single number in my dataset has around 70 digits. I tried the code on several high-end PCs, but they all ran out of memory.
I would like to try the BIDE algorithm on this dataset, as it is a closed FPM method. However, I am confused about how to use it through the API, and in the closed.py file I am struggling to understand the parameters db, patt, and matches and the methods isclosed and canclosedprune. As a beginner, I would find it helpful if you could add an example to the readme and explain the parameters. Thank you!
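One possible reason for the memory blow-up described above is that the number of candidate patterns grows very quickly with row length. The sketch below is plain Python and independent of the library: it counts the distinct non-empty subsequences of a single row, which is an upper bound on the patterns that row alone can support. The function name is hypothetical, and the real number of frequent patterns depends on the minimum support and on the whole dataset, so treat this as an illustration rather than a diagnosis.

```python
def count_distinct_subsequences(seq):
    """Count distinct non-empty subsequences of seq (order kept, gaps allowed)."""
    total = 1          # number of distinct subsequences so far, including the empty one
    before_last = {}   # item -> value of `total` just before that item's previous occurrence
    for item in seq:
        previous_total = total
        # Appending `item` doubles the count, minus the subsequences that
        # were already produced by the previous occurrence of `item`.
        total = 2 * total - before_last.get(item, 0)
        before_last[item] = previous_total
    return total - 1   # drop the empty subsequence


# Short row from the reply earlier in this thread.
print(count_distinct_subsequences([4, 3, 3, 3, 3]))  # 9

# First 20 items of the first row quoted in the original report.
long_prefix = [4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 5, 5, 5, 5, 5, 6, 5, 5, 4]
for n in (5, 10, 15, 20):
    print(n, count_distinct_subsequences(long_prefix[:n]))
```

If the counts grow this fast on a prefix of one row, rows of around 70 items with a low minimum support could plausibly produce far more patterns than fit in memory.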
Hi, you are trying to understand internal parameters. Please refer to the readme at the root of the project, which has an example of how to use BIDE. Compared to PrefixSpan, you only need one extra parameter, closed=True.
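To make "closed" concrete: a frequent pattern is closed when no longer pattern that contains it has the same support. The sketch below is a brute-force filter in plain Python over the frequent patterns listed earlier in this thread. It is not the library's BIDE implementation, and the helper names are hypothetical; it only shows which patterns would count as closed under that definition.

```python
def is_subsequence(short, long):
    """True if `short` appears in `long` in order (gaps allowed)."""
    it = iter(long)
    # `item in it` advances the iterator, so matches must occur in order.
    return all(item in it for item in short)


def closed_only(patterns):
    """Keep (support, pattern) pairs that have no longer super-pattern with equal support."""
    return [
        (sup, patt)
        for sup, patt in patterns
        if not any(
            other_sup == sup
            and len(other) > len(patt)
            and is_subsequence(patt, other)
            for other_sup, other in patterns
        )
    ]


# Frequent patterns (minimum support 2) from the example earlier in the thread.
frequent = [
    (3, [4]), (2, [4, 3]), (2, [4, 3, 3]),
    (3, [3]), (3, [3, 3]), (3, [3, 3, 3]), (3, [3, 3, 3, 3]),
    (2, [3, 3, 3, 3, 3]), (2, [3, 3, 3, 4]), (2, [3, 3, 4]), (2, [3, 4]),
]

closed = closed_only(frequent)
print(closed)
# [(3, [4]), (2, [4, 3, 3]), (3, [3, 3, 3, 3]), (2, [3, 3, 3, 3, 3]), (2, [3, 3, 3, 4])]
```

Here 11 frequent patterns reduce to 5 closed ones. This filter needs the full frequent set first, so it does not save memory by itself; an algorithm such as BIDE is meant to avoid enumerating the non-closed patterns in the first place.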
I am trying the PrefixSpan algorithm (def frequent_rec) on a dataset of Freeman codes. It is a list of lists of variable length, with integer values:
[ [4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 5, 5, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 5, 6, 5, 6, 7, 6, 7, 7, 6, 7, 0, 0, 2, 3, 3, 3, 2, 3, 2, 2, 1, 1, 0, 7, 7, 7, 7, 7, 7, 7, 1, 1, 3, 2, 3, 3, 2, 3, 2, 2, 0, 0, 7, 0, 7, 7, 7, 7, 7, 7, 0, 1, 3, 2],
[3, 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 4, 5, 4, 5, 5, 4, 5, 5, 5, 5, 6, 7, 7, 6, 5, 5, 5, 5, 6, 7, 7, 7, 0, 2, 1, 1, 0, 1, 0, 7, 7, 0, 0, 2, 2, 2, 1, 0, 0, 7, 0, 1, 1, 2],
.............. ..............
[3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 5, 7, 7, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 0, 2, 1, 1, 0, 6, 6, 7, 6, 5, 5, 4, 7, 0, 0, 1, 1, 1, 2, 3, 3, 2]]
The dataset has about 60000 rows. However, when running the algorithm, I get an error in this line:
l = occurs[seq[j]]
I also tried your first implementation and got the same error. Do the items have to be of the same length? Any ideas would be appreciated. Thank you.