SongDark / FPgrowth

FP-growth codes in "Machine Learning in Action"

Running with Python 3.x #1

Open jolasman opened 5 years ago

jolasman commented 5 years ago

Hi! :) I changed some of the code to run under Python 3, but I have some issues. I cannot find a library with a working FP-growth implementation. I tried the pyspark one and the FP-growth one. With pyspark, I end up with Spark connection errors after a few runs: it worked at first, but then it blew up. The second one cannot handle my dataset because of memory problems.

Btw, after I fixed the dict problems with iteritems() and has_key(), the mineFPtree function gives me an error that I do not understand:

    bigL = [v[0] for v in sorted(headerTable.items(), key=lambda p: p[1])]  # (sort header table)
    AttributeError: 'NoneType' object has no attribute 'items'

Any thoughts?

Thanks in advance

Inger-Chao commented 4 years ago

The error happens because the createFPtree method returns None for headerTable:

    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del (headerTable[k])  # delete items that do not meet the minimum support
    freqItemSet = set(headerTable.keys())  # frequent items that meet the minimum support
    if len(freqItemSet) == 0:
        return None, None

Every headerTable[k] entry was deleted, so createFPtree returned None for headerTable. The author set n = 20000 in the demo, which may be too large for your dataset. I decreased n to make the demo work on my dataset.
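A minimal sketch of this failure mode and one possible guard, assuming the threshold is passed as minSup and that the header table simply maps items to counts. `prune_header_table` and `items_by_support` are hypothetical helper names for illustration, not functions from this repo:

```python
def prune_header_table(header_table, min_sup):
    """Drop items whose count is below min_sup; return None if nothing survives.

    Mirrors the pruning step quoted above (hypothetical helper, not repo code).
    """
    kept = {item: count for item, count in header_table.items() if count >= min_sup}
    return kept or None


def items_by_support(header_table):
    """Return items sorted by ascending support, or [] when the table is None."""
    if header_table is None:  # no item met min_sup, so there is nothing to sort
        return []
    return [item for item, _ in sorted(header_table.items(), key=lambda p: p[1])]


counts = {"a": 5, "b": 2, "c": 9}
print(items_by_support(prune_header_table(counts, 3)))      # ['a', 'c']
print(items_by_support(prune_header_table(counts, 20000)))  # []
```

With a threshold larger than every count, the pruned table is None, and calling `.items()` on it without the guard raises the AttributeError reported above.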

WissenY commented 4 years ago

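I would like the author to explain how, with a support count of 100000, this runs in 13 seconds on a Mac (as your Chinese blog post claims). After I ported your code to Python 3.7, it still took more than ten minutes on an 8th-generation i7 with 16 GB of RAM.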