apetkau / comp7944-project

Project on visualizing association rules extracted from covid-19 data.
Apache License 2.0

Apriori using mlxtend on symptom dataset #1

Open walid-shaiket opened 4 years ago

walid-shaiket commented 4 years ago

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

df = pd.read_csv('D:/MSc_UofM/Data Mining/Project/DataSets/symptoms.tsv', delimiter="\t", header=None)
df.head()

show unique symptoms

items = (df[0].unique())

pre-process to apply apriori

encoded_vals = []
for index, row in df.iterrows():
    labels = {}
    uncommons = list(set(items) - set(row))
    commons = list(set(items).intersection(row))
    for uc in uncommons:
        labels[uc] = 0
    for com in commons:
        labels[com] = 1
    encoded_vals.append(labels)
encoded_vals[0]
ohe_df = pd.DataFrame(encoded_vals)
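Note that the loop above takes its item vocabulary from `df[0].unique()`, so a symptom that never appears in the first column gets no column in `ohe_df`. Below is a minimal sketch of the same encoding in plain Python. It assumes (the issue does not confirm this) that each row is one transaction whose symptoms are spread across columns, with missing cells padded by `None`. The `one_hot_encode` helper and the toy `transactions` are hypothetical.

```python
def one_hot_encode(transactions):
    """Return (items, rows): sorted item vocabulary and one 0/1 dict per transaction."""
    # Build the vocabulary from every cell, not only the first column.
    items = sorted({item for row in transactions for item in row if item is not None})
    rows = []
    for row in transactions:
        present = set(row)
        # 1 if the item occurs in this transaction, 0 otherwise.
        rows.append({item: int(item in present) for item in items})
    return items, rows


# Hypothetical toy data; None stands in for a padded (missing) cell.
transactions = [
    ["fever", "cough", None],
    ["cough", "fatigue", "fever"],
    ["fatigue", None, None],
]
items, encoded = one_hot_encode(transactions)
print(items)
print(encoded[0])
```

The resulting list of dicts can be passed to `pd.DataFrame(...)` in the same way as `encoded_vals`.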

apply apriori with MinSup 0.0045

freq_items = apriori(ohe_df, min_support=0.0045, use_colnames=True, verbose=1)
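For reference, the support of an itemset is the fraction of transactions that contain every item in it, so `min_support=0.0045` keeps itemsets that occur in at least 0.45% of the rows. Here is a small hand computation on hypothetical toy data (the `support` helper and the transactions are illustrative, not taken from the symptom dataset):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy transactions.
transactions = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fatigue"},
    {"fever"},
]
print(support({"fever"}, transactions))           # 0.75
print(support({"fever", "cough"}, transactions))  # 0.5
```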

find association rules

rules = association_rules(freq_items, metric="confidence", min_threshold=0.2)
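With `metric="confidence"` and `min_threshold=0.2`, only rules whose confidence is at least 0.2 are kept. As a rough illustration of the columns plotted later, confidence and lift for a rule A -> C can be computed by hand as support(A and C) / support(A) and confidence / support(C). The helpers and toy transactions below are hypothetical, not taken from the dataset.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical toy transactions.
transactions = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fatigue"},
    {"fever"},
]
conf = confidence({"fever"}, {"cough"}, transactions)
lft = lift({"fever"}, {"cough"}, transactions)
print(round(conf, 4), round(lft, 4))
```

A lift above 1 suggests the antecedent and consequent co-occur more often than they would if independent.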

plot support vs confidence

plt.scatter(rules['support'], rules['confidence'], alpha=0.5)
plt.xlabel('support')
plt.ylabel('confidence')
plt.title('Support vs Confidence')
plt.show()

plot support vs lift

plt.scatter(rules['support'], rules['lift'], alpha=0.5)
plt.xlabel('support')
plt.ylabel('lift')
plt.title('Support vs Lift')
plt.show()

plot lift vs confidence

fit = np.polyfit(rules['lift'], rules['confidence'], 1)
fit_fn = np.poly1d(fit)
plt.plot(rules['lift'], rules['confidence'], 'yo', rules['lift'], fit_fn(rules['lift']))
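`np.polyfit(x, y, 1)` fits a least-squares straight line and returns its slope and intercept, which `np.poly1d` then turns into a callable. As a sketch of what that fit computes, here is the same closed-form calculation in plain Python. The `fit_line` helper and the sample points are hypothetical, chosen so the points lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (lift, confidence) pairs lying on y = 0.2 * x + 0.1.
lifts = [1.0, 2.0, 3.0, 4.0]
confidences = [0.3, 0.5, 0.7, 0.9]
slope, intercept = fit_line(lifts, confidences)
print(round(slope, 6), round(intercept, 6))
```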

Frequent Itemsets

[Image: Frequent_itemsets]

Generated Rules

[Image: generated-rules]

Support vs Confidence

[Image: support_vs_confidence]

Support vs Lift

[Image: support_vs_lift]

Lift vs Confidence

[Image: lift_vs_confidence]

walid-shaiket commented 4 years ago

Implementation of Apriori using the 'mlxtend' library, with all outputs.