Open zd31 opened 2 months ago
Hi,
I am using your hierarchical confusion matrix to evaluate the output of a local classifier per node (LCN) model from the hiclass library.
The problem is that I always get FP=FN, no matter how I change my model. I went through your paper, but I did not find anything about training policies in the LCN approach. Your definitions of positive and negative samples seem different from those in Silla & Freitas (2011).
I am new to machine learning and hierarchical classification. Maybe I made some mistakes.
Hello zd31, thanks for your interest in my module. Indeed we are slightly deviating from Silla & Freitas 2011, but I think you made an important observation here. I will need to double-check my code to see whether there is an error in it, or whether the definitions themselves lead to FP=FN.
In the GermEval example my code definitely outputs FP unequal to FN sometimes: https://github.com/DerKevinRiehl/HierarchicalConfusionMatrix/blob/main/JupyterNotebooks/Example_GermEval2019_Task1A.ipynb
In the meantime, could you share exactly how you use the library (your code calling our methods)? And maybe an excerpt of the txt file where you store your predictions?
That would help me better understand your situation.
Thanks, and looking forward to your answer, Best, Kevin
Hi Kevin,
Thank you for the fast reply!
I thought maybe it was because I balanced the dataset before training, but I still get FP=FN with the balanced data.
This file has the true labels and predicted labels: evalLabel_data.csv
F1 PPV REC ACC MCC TP TN FP FN
0.6028 0.6028 0.6028 0.8298 0.4945 29554 160341 19474 19474
Sorry, my code is still raw and messy; I hope you can read it easily.
import numpy as np
import networkx as nx
# determineHierarchicalConfusionMatrix is imported from Kevin's
# HierarchicalConfusionMatrix package

# Add root to the predictions
def add_root(array):
    # Create a new column filled with 'root'
    root = np.full((array.shape[0], 1), 'root')
    # Prepend the new column to the original array
    new_array = np.hstack((root, array))
    return new_array

# Clean unnecessary separators from edge labels
def clean_edges(array):
    def extract_last_part(s):
        return s.split("::")[-1] if "::" in s else s
    for row in array:
        row[0] = extract_last_part(row[0])
        row[1] = extract_last_part(row[1])
    return array

# Extract hierarchy from the LCN model as a directed graph
def extract_hierarchy(model):
    hierarchy = model.hierarchy_
    edges = np.array(hierarchy.edges)
    cleaned_edges = clean_edges(edges)
    graph = nx.DiGraph()
    graph.add_edges_from(cleaned_edges)
    return graph

# Calculate evaluation metrics from a [TP, TN, FP, FN] confusion matrix
def eva_metrics(confusion_m, metrics):
    TP = confusion_m[0]
    TN = confusion_m[1]
    FP = confusion_m[2]
    FN = confusion_m[3]
    tpfp = np.float64(TP + FP)
    tpfn = np.float64(TP + FN)
    tnfp = np.float64(TN + FP)
    tnfn = np.float64(TN + FN)
    F1 = 2 * TP / (2 * TP + FP + FN)
    PPV = TP / (TP + FP)
    REC = TP / (TP + FN)
    ACC = (TP + TN) / (TP + TN + FP + FN)
    MCC = ((TP * TN) - (FP * FN)) / np.sqrt(float(tpfp * tpfn * tnfp * tnfn))
    if metrics == "f1":
        return F1
    elif metrics == "ppv":
        return PPV
    elif metrics == "rec":
        return REC
    elif metrics == "acc":
        return ACC
    elif metrics == "mcc":
        return MCC
    else:
        return F1, PPV, REC, ACC, MCC

# Build the hierarchical confusion matrix over the test set
# (t_test_X and test_y are globals holding the held-out split)
def h_confusion_matrix(model, metrics):
    pred_y = model.predict(t_test_X)
    full_pred_y = add_root(pred_y)
    graph = extract_hierarchy(model)
    evalLabel_data = {}
    for i in range(len(test_y)):
        evalLabel_data[i] = {"true": [np.array(test_y)[i]],
                             "pred": [full_pred_y[i].tolist()]}
    h_confusion = {}
    h_confusion_total = []
    for key in evalLabel_data:
        try:
            h_confusion[key] = determineHierarchicalConfusionMatrix(graph, evalLabel_data[key]["true"], evalLabel_data[key]["pred"])
            h_confusion_total.append(h_confusion[key])
        except KeyError as e:
            print(f"Skipping key {key} due to missing data: {e}")
            print(evalLabel_data[key])
        except Exception as e:
            print(f"Skipping key {key} due to an error: {e}")
            print(evalLabel_data[key])
    h_confusion_total = np.sum(np.asarray(h_confusion_total), axis=0)
    if metrics == "all":
        F1, PPV, REC, ACC, MCC = eva_metrics(h_confusion_total, "all")
        print("{:<10} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10}".format(
            "F1", "PPV", "REC", "ACC", "MCC", "TP", "TN", "FP", "FN"))
        print("{:<10.4f} {:<10.4f} {:<10.4f} {:<10.4f} {:<10.4f} {:<10} {:<10} {:<10} {:<10}".format(
            F1, PPV, REC, ACC, MCC, h_confusion_total[0], h_confusion_total[1], h_confusion_total[2], h_confusion_total[3]))
    else:
        return eva_metrics(h_confusion_total, metrics)
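For context, this is roughly how I call it (model is my fitted hiclass LocalClassifierPerNode, t_test_X and test_y are my held-out split):

# prints the metrics table shown above
h_confusion_matrix(model, "all")
# or return a single metric, e.g. the F1 score
f1 = h_confusion_matrix(model, "f1")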
Hi zd31, I think your code looks pretty logical to me.
I have a suggestion for you to try: in h_confusion_matrix, print the confusion matrix for each key in evalLabel_data (not only the final h_confusion_total, which is the sum of the many single confusion matrices), and see whether you observe a weird pattern there as well, e.g. that FP always equals FN.
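Roughly like this (just a sketch, reusing the names from your function and assuming each entry of h_confusion is [TP, TN, FP, FN], as your eva_metrics implies):

# inside h_confusion_matrix, after h_confusion has been filled:
for key, cm in h_confusion.items():
    tp, tn, fp, fn = cm
    flag = "  <-- FP != FN" if fp != fn else ""
    print(f"key={key} TP={tp} TN={tn} FP={fp} FN={fn}{flag}")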
Another possible reason is not your model but the taxonomy/hierarchy you use. Could you post a diagram or share more details on it? Depending on the hierarchy, it could in theory be the case that FP always equals FN.
In our work we motivate an alternative definition of TP, FP, TN, and FN to that of Silla and Freitas, as we are convinced that our concept better reflects the peculiarities of hierarchical classification.
Looking forward to your answers, Best, Kevin
Hi Kevin,
Thanks for your suggestion!
I have checked the confusion matrix for each key and found that FP = FN for every key.
I am predicting GICS codes; see this link for details: https://www.msci.com/our-solutions/indexes/gics. Each group has a unique serial code. It is a single-path, full-depth labelling problem.
I also tried other packages to evaluate my model, and it turned out that precision = recall = F1 there as well. So I think you are right: maybe the hierarchy is the reason.
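For example, a toy case like this should show the effect (a rough sketch; I am assuming the same call signature and [TP, TN, FP, FN] output as in my code above, and root-to-leaf paths as labels):

import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("root", "A"), ("root", "B"),
                  ("A", "A1"), ("A", "A2"),
                  ("B", "B1"), ("B", "B2")])
true_path = [["root", "A", "A1"]]
pred_path = [["root", "B", "B2"]]  # wrong branch, but same path length
print(determineHierarchicalConfusionMatrix(g, true_path, pred_path))
# every root-to-leaf path has the same length here, so each wrong edge
# on the predicted path (FP) should be matched by a missed edge on the
# true path (FN), i.e. I would expect FP == FN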
Another problem I met is that my LCN model does not significantly outperform the flat model. I have seen a few studies report a similar situation. Does this mean there is a problem with this hierarchy?
Thanks for your package and for your patience with my problem!
Best, zd31
Dear zd31, I think the hierarchy indeed looks pretty "symmetric" at first glance.
It is not a "problem"; it is simply a property of your analysis. In a single-path, full-depth setting, the true and predicted paths always have the same length, so every wrong edge on the predicted path (an FP) is mirrored by a missed edge on the true path (an FN), which would explain why FP = FN and therefore precision = recall = F1.
I guess if you manage to improve your LCN model significantly (if possible) you might see better results.
If you have another question about evaluation of your hierarchical classification model, let me know. :-)
Best, Kevin
Hi Kevin,
Thank you for your help! Your package is really helpful!
I hope you don't mind me asking about this "symmetric" thing. I tried to find it in the literature, but I did not find any trace of it, at least nothing useful. Is there any literature you would recommend?
Thanks for your help again!
Best, zd31
Hey zd31, haha no, don't expect me to use fancy words from the literature^^
I meant "symmetric" because it looks like each branch has the same number of subclasses and sub-subclasses. Nothing more^^
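If you want to check that property programmatically, a quick ad-hoc sketch like this (not part of my package, just networkx applied to the graph your extract_hierarchy returns) would do:

import networkx as nx

def looks_symmetric(graph, root="root"):
    # every leaf sits at the same depth, and on each level all
    # inner nodes have the same number of children
    depth = nx.shortest_path_length(graph, root)
    leaf_depths = {depth[n] for n in graph if graph.out_degree(n) == 0}
    degrees_per_level = {}
    for n in graph:
        if graph.out_degree(n) > 0:
            degrees_per_level.setdefault(depth[n], set()).add(graph.out_degree(n))
    return len(leaf_depths) == 1 and all(len(d) == 1 for d in degrees_per_level.values())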
I don't have specific literature to recommend. The development of classification algorithms depends highly on the available data and domain. I would review existing classifier algorithms and the features they extract from the data, and then think about combining them or using models that can capture higher complexity.
Good luck, Best, Kevin