enaeseth / python-fp-growth

An implementation of the FP-growth algorithm in pure Python.
366 stars 194 forks source link

Problem with the support values. #2

Closed amecxx closed 11 years ago

amecxx commented 13 years ago

Using the following data set: a,d,e b,c,d a,c,e a,c,d,e a,e a,c,d b,c a,c,d,e b,c,e a,d,e

and modifying the find_with_suffix method in order to catch the support values for each itemset: yield [found_set, support] instead of yield found_set

if you run the algorithm with a min_support value of 5, we have: [['a'], 7] [['c'], 7] [['e'], 7] [['a', 'e'], 6] [['d'], 6] [['a', 'd'], 7] [['e', 'd'], 6]

Notice that the support value of the itemset {a,d} is 7. This value should be 5.

If you add a new transaction to the initial data set ( {b,c} ): a,d,e b,c,d a,c,e a,c,d,e a,e a,c,d b,c a,c,d,e b,c,e a,d,e b,c ----> The new one

and running the algorithm with the same min_support value of 5, we obtain: [['a'], 7] [['c'], 8] [['e'], 7] [['a', 'e'], 9] [['d'], 6] [['a', 'd'], 5] [['c', 'd'], 6]

Now, the support value for {a,d} is 5 (the correct value).

I'm not sure if I'm overlooking something, but it seems that something is not working properly. Any idea about what could be wrong?

Thanks in advance.

enaeseth commented 13 years ago

Thanks for the bug report.

I can confirm that there is a problem, but I don't know what's causing it. Unfortunately, it's been a couple of years since I've worked with FP-growth, and my efforts to debug it have yielded nothing.

If you figure out the issue, let me know – I'll gladly accept a patch.