chuanconggao / PrefixSpan-py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
https://git.io/prefixspan
MIT License

TypeError: unhashable type: 'list' #10

Closed nisalup closed 6 years ago

nisalup commented 6 years ago

I am trying the PrefixSpan algorithm (def frequent_rec) on a dataset of Freeman codes. The dataset is a list of lists of variable length, each holding integer values:

[ [4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 5, 5, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 5, 6, 5, 6, 7, 6, 7, 7, 6, 7, 0, 0, 2, 3, 3, 3, 2, 3, 2, 2, 1, 1, 0, 7, 7, 7, 7, 7, 7, 7, 1, 1, 3, 2, 3, 3, 2, 3, 2, 2, 0, 0, 7, 0, 7, 7, 7, 7, 7, 7, 0, 1, 3, 2],
[3, 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 4, 5, 4, 5, 5, 4, 5, 5, 5, 5, 6, 7, 7, 6, 5, 5, 5, 5, 6, 7, 7, 7, 0, 2, 1, 1, 0, 1, 0, 7, 7, 0, 0, 2, 2, 2, 1, 0, 0, 7, 0, 1, 1, 2],
..............
..............
[3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 5, 5, 5, 6, 5, 6, 6, 5, 5, 5, 6, 5, 5, 5, 5, 7, 7, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 0, 2, 1, 1, 0, 6, 6, 7, 6, 5, 5, 4, 7, 0, 0, 1, 1, 1, 2, 3, 3, 2]]

The dataset has about 60000 rows. However, when running the algorithm, I get this error:

TypeError: unhashable type: 'list'

It occurs on this line: l = occurs[seq[j]]

I also tried your first implementation and got the same error. Do the sequences all have to be the same length? I would appreciate any ideas. Thank you.

chuanconggao commented 6 years ago

Hi, the rows do not have to have the same length.

I tried your example (with some items removed to make it faster) and it worked:

In [6]: db = [
   ...: [4, 3, 3, 3, 3],
   ...: [3, 3, 3, 4, 3, 3],
   ...: [3, 3, 3, 3, 3, 3, 4]
   ...: ]
   ...:
   ...: from prefixspan import PrefixSpan
   ...: ps = PrefixSpan(db)
   ...: ps.frequent(2)
Out[6]: 
[(3, [4]),
 (2, [4, 3]),
 (2, [4, 3, 3]),
 (3, [3]),
 (3, [3, 3]),
 (3, [3, 3, 3]),
 (3, [3, 3, 3, 3]),
 (2, [3, 3, 3, 3, 3]),
 (2, [3, 3, 3, 4]),
 (2, [3, 3, 4]),
 (2, [3, 4])]

I cannot reproduce this issue, so I will close it for now. Please provide more details.
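
For anyone hitting the same message: the failing line uses each sequence item as a dictionary key, so Python raises this TypeError whenever an item is itself a list. The cause in this thread was never confirmed; one possible cause, purely an assumption, is an extra level of nesting somewhere in the input. Below is a minimal sketch for locating such items before mining. The names find_unhashable_items, good, and bad are hypothetical and are not part of the prefixspan package.

```python
def find_unhashable_items(db):
    """Return (row, position, item) for every item that cannot be a dict key."""
    problems = []
    for i, seq in enumerate(db):
        for j, item in enumerate(seq):
            try:
                hash(item)  # the same requirement a dict lookup imposes
            except TypeError:
                problems.append((i, j, item))
    return problems


# Hypothetical data: `good` is a flat list of integer sequences,
# `bad` has its second row wrapped in one extra list by mistake.
good = [[4, 3, 3], [3, 3, 4]]
bad = [[4, 3, 3], [[3, 3, 4]]]

# Reproduce the error outside the library: a list used as a dict key.
occurs = {}
try:
    occurs[bad[1][0]]
except TypeError as exc:
    message = str(exc)

print(find_unhashable_items(good))
print(find_unhashable_items(bad))
print(message)
```

If the helper reports any positions, flattening or re-parsing those rows would be the first thing to try.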

nisalup commented 6 years ago

OK, I checked with smaller datasets with shorter sequences, and it works fine. It actually works well even for large datasets (50000+ rows) when the Freeman code length is less than 10 digits. However, my Freeman code for a single number has around 70 digits. I tried the code on several high-end PCs, but it causes them to run out of memory.

I would like to try the BIDE algorithm on this dataset, as it is a closed FPM method. However, I am confused about how to use it through the API, and in the closed.py file I am struggling to understand the parameters db, patt, and matches and the methods isclosed and canclosedprune. As a beginner, I would find it helpful if you could add an example to the readme and explain the parameters. Thank you!

chuanconggao commented 6 years ago

Hi, you are trying to understand internal parameters. Please refer to the readme at the root of the project, which has an example of how to use BIDE. Compared to PrefixSpan, you only need one extra parameter: closed=True.
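
To make "closed" concrete: a pattern is closed when no longer pattern containing it has the same support. The sketch below applies that definition as a naive post-filter to the frequent patterns listed earlier in this thread. It is not BIDE and not the library's implementation, which prunes during the search instead of filtering afterwards; is_subsequence and closed_only are hypothetical helper names. With the library itself, the comment above suggests passing closed=True to the same call, for example ps.frequent(2, closed=True).

```python
def is_subsequence(small, big):
    """True if `small` appears in `big` in order (gaps allowed)."""
    it = iter(big)
    return all(item in it for item in small)


def closed_only(patterns):
    """Keep patterns that have no longer super-pattern with equal support."""
    return [
        (sup, patt)
        for sup, patt in patterns
        if not any(
            other_sup == sup
            and len(other) > len(patt)
            and is_subsequence(patt, other)
            for other_sup, other in patterns
        )
    ]


# Frequent patterns with minimum support 2, copied from Out[6] above.
frequent = [
    (3, [4]),
    (2, [4, 3]),
    (2, [4, 3, 3]),
    (3, [3]),
    (3, [3, 3]),
    (3, [3, 3, 3]),
    (3, [3, 3, 3, 3]),
    (2, [3, 3, 3, 3, 3]),
    (2, [3, 3, 3, 4]),
    (2, [3, 3, 4]),
    (2, [3, 4]),
]

closed = closed_only(frequent)
for sup, patt in closed:
    print(sup, patt)
```

On this small example the filter keeps 5 of the 11 patterns. Because a closed miner reports fewer patterns, it can reduce the size of the output on long sequences, though whether it resolves the memory problem described above is not something this sketch can show.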
