Closed amecxx closed 11 years ago
Thanks for the bug report.
I can confirm that there is a problem, but I don't know what's causing it. Unfortunately, it's been a couple of years since I've worked with FP-growth, and my efforts to debug it have yielded nothing.
If you figure out the issue, let me know – I'll gladly accept a patch.
Using the following data set: a,d,e b,c,d a,c,e a,c,d,e a,e a,c,d b,c a,c,d,e b,c,e a,d,e
and modifying the find_with_suffix method in order to catch the support values for each itemset: yield [found_set, support] instead of yield found_set
if you run the algorithm with a min_support value of 5, we have: [['a'], 7] [['c'], 7] [['e'], 7] [['a', 'e'], 6] [['d'], 6] [['a', 'd'], 7] [['e', 'd'], 6]
Notice that the support value of the itemset {a,d} is 7. This value should be 5.
If you add a new transaction to the initial data set ( {b,c} ): a,d,e b,c,d a,c,e a,c,d,e a,e a,c,d b,c a,c,d,e b,c,e a,d,e b,c ----> The new one
and running the algorithm with the same min_support value of 5, we obtain: [['a'], 7] [['c'], 8] [['e'], 7] [['a', 'e'], 9] [['d'], 6] [['a', 'd'], 5] [['c', 'd'], 6]
Now, the support value for {a,d} is 5 (the correct value).
I'm not sure if I'm overlooking something, but it seems that something is not working properly. Any idea about what could be wrong?
Thanks in advance.