evandempsey / fp-growth

Python implementation of the Frequent Pattern Growth algorithm
ISC License
136 stars 55 forks

the whole algorithm is totally a mistake! #15

Open TtCWH opened 5 years ago

TtCWH commented 5 years ago

totally!

Rbain2 commented 5 years ago

I don't agree. I have used this module, with the modifications I mentioned to avoid rule overwrite, and it is very fast and accurate, allowing a high degree of granularity in generating association rules even in very large databases. @evandempsey, I propose to fork this module to include the modifications that I suggested regarding rule overwrite. Do you have any comment on this?

TtCWH commented 5 years ago

I have tried several different versions of the FP-growth algorithm but failed to find a valid module. Could you share your version with me? Thank you very much!

Rbain2 commented 5 years ago

OK. My code has changed a lot from the original algorithm, so it took me a while to work out exactly what I am currently using. I did start out using Evan's algorithm, but I am now in fact using the frequent item module found at the link below, combined with Evan's association rules module, modified as shown below to avoid rule overwrite.

https://github.com/vukk/amdm-fpgrowth-python/blob/master/fpgrowth.py

Modified association rule module below

    import itertools


    def generate_association_rules(patterns, confidence_threshold):
        """
        Given a dict of frequent itemsets mapped to their support counts,
        return a dict of association rules in the form
        {(left): [((right), confidence), ...]}
        """
        rules = {}
        for itemset in patterns.keys():
            # Debug output, disabled:
            # print("itemset in patterns.keys", itemset, "patterns[itemset]", patterns[itemset])
            upper_support = patterns[itemset]

            for i in range(1, len(itemset)):
                for antecedent in itertools.combinations(itemset, i):
                    antecedent = tuple(sorted(antecedent))
                    consequent = tuple(sorted(set(itemset) - set(antecedent)))

                    if antecedent in patterns:
                        lower_support = patterns[antecedent]
                        confidence = float(upper_support) / lower_support

                        if confidence >= confidence_threshold:
                            # Keep a list of rules per antecedent so that a later
                            # rule does not overwrite an earlier one.
                            rule1 = (consequent, confidence)
                            rules.setdefault(antecedent, []).append(rule1)

        return rules
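To make the rule overwrite issue concrete, here is a minimal sketch of what goes wrong when each antecedent maps to a single rule, and how keeping a list per antecedent avoids it. The toy `patterns` dict, the helper name `candidate_rules`, and the 0.7 threshold are assumptions made up for illustration; they are not part of either module mentioned above.

```python
import itertools

# Hypothetical toy input: frequent itemsets (sorted tuples) mapped to support counts.
patterns = {
    ("a",): 4,
    ("b",): 3,
    ("c",): 3,
    ("a", "b"): 3,
    ("a", "c"): 3,
}


def candidate_rules(patterns, threshold):
    """Yield (antecedent, consequent, confidence) for each rule meeting the threshold."""
    for itemset, upper_support in patterns.items():
        for i in range(1, len(itemset)):
            for antecedent in itertools.combinations(itemset, i):
                antecedent = tuple(sorted(antecedent))
                if antecedent not in patterns:
                    continue
                consequent = tuple(sorted(set(itemset) - set(antecedent)))
                confidence = upper_support / patterns[antecedent]
                if confidence >= threshold:
                    yield antecedent, consequent, confidence


overwritten = {}  # one slot per antecedent: a later rule replaces an earlier one
collected = {}    # one list per antecedent: every rule is kept
for antecedent, consequent, confidence in candidate_rules(patterns, 0.7):
    overwritten[antecedent] = (consequent, confidence)
    collected.setdefault(antecedent, []).append((consequent, confidence))

print(overwritten[("a",)])  # only the rule a -> c survives
print(collected[("a",)])    # both a -> b and a -> c are kept
```

With this input the antecedent `("a",)` produces two rules, so the single-slot dict silently loses one of them while the list-based dict keeps both.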