First, we need to evaluate the overall performance of the system.
Second, we need to evaluate the performance of each classifier.
Third, we need to perform error analysis for each stage of the pipeline and adjust the model.
Finally, we iterate the three steps above until satisfactory performance is achieved.
Evaluation Targets
We need to perform evaluation on the following items.
Retrieved items | Unordered retrieval measures      | Ordered retrieval measures
concepts        | mean precision, recall, F-measure | MAP, GMAP
articles        | mean precision, recall, F-measure | MAP, GMAP
triples         | mean precision, recall, F-measure | MAP, GMAP
Flat Evaluation
We need to perform evaluations for each classifier. The following measures will be used.
Precision: P = TP / (TP + FP)
Recall: R = TP / (TP + FN)
F-measure: F = 2 * P * R / (P + R)
Average Precision: AP = (1 / |L_R|) * Σ_{r=1}^{|L|} P(r) * rel(r), where |L| is the number of retrieved items and |L_R| is the number of relevant items.
Mean Average Precision: MAP = (1/n) * Σ_{i=1}^{n} AP_i, for a list of queries q_1, q_2, …, q_n.
Geometric Mean Average Precision: GMAP = (Π_{i=1}^{n} (AP_i + ε))^{1/n}, where ε is a small value for smoothing.
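These flat measures can be computed directly from ranked result lists; a minimal Python sketch (the function names are illustrative, not part of the system):

```python
import math

def average_precision(ranked, relevant):
    """AP for one ranked list: mean of P(r) over the ranks r of relevant items."""
    hits = 0
    total = 0.0
    for r, item in enumerate(ranked, start=1):
        if item in relevant:          # rel(r) = 1
            hits += 1
            total += hits / r         # P(r) at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: arithmetic mean of AP over queries; runs = [(ranked, relevant), ...]."""
    aps = [average_precision(rk, rel) for rk, rel in runs]
    return sum(aps) / len(aps)

def geometric_map(runs, eps=0.01):
    """GMAP: geometric mean of (AP + eps); eps keeps zero APs from collapsing the product."""
    aps = [average_precision(rk, rel) for rk, rel in runs]
    return math.exp(sum(math.log(ap + eps) for ap in aps) / len(aps))
```

GMAP rewards consistency across queries: a single near-zero AP drags the geometric mean down far more than it does the arithmetic mean.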
Hierarchical Evaluation
The classification is hierarchical, so flat evaluation measures alone are not sufficient. With multiple levels of classification, an error at any single classifier makes the final result incorrect, and flat measures cannot tell us which classifier caused the error. We therefore need to design a hierarchy of measurements that takes the relations between classifiers and the performance of each individual classifier into consideration.
Kiritchenko et al. proposed a hierarchical precision as:
hP = |An(C_p) ∩ An(C_t)| / |An(C_p)|
where Cp is the set of predicted categories, An(Cp) is the set of ancestors of Cp, Ct is the set of true categories and An(Ct) is the set of ancestors of Ct.
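A minimal sketch of this measure, assuming the hierarchy is a tree stored as child→parent links and that An(C) contains the categories in C together with all of their ancestors (both assumptions are ours, not from the source):

```python
def ancestors(categories, parent):
    """An(C): the categories in C plus all of their ancestors in the tree."""
    closure = set()
    for c in categories:
        while c is not None:          # walk up to the root
            closure.add(c)
            c = parent.get(c)         # root has no entry -> None
    return closure

def hierarchical_precision(predicted, true, parent):
    """hP = |An(Cp) ∩ An(Ct)| / |An(Cp)| (Kiritchenko et al.)."""
    an_p = ancestors(predicted, parent)
    an_t = ancestors(true, parent)
    return len(an_p & an_t) / len(an_p) if an_p else 0.0
```

A prediction in the right subtree but the wrong leaf still earns partial credit, since it shares ancestors with the true category.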
For the general evaluation, flat micro-F1 measure will be used.
MiF1 = 2 * MiP * MiR / (MiP + MiR), where MiP and MiR are defined as follows.
MiP = Σ_i tp_{c_i} / Σ_i (tp_{c_i} + fp_{c_i})
MiR = Σ_i tp_{c_i} / Σ_i (tp_{c_i} + fn_{c_i})
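Micro-averaging pools the counts over all classes before computing the ratios; a small sketch, assuming per-class (tp, fp, fn) counts are already available:

```python
def micro_scores(counts):
    """counts: {class: (tp, fp, fn)}; returns (MiP, MiR, MiF1)."""
    tp = sum(c[0] for c in counts.values())   # pooled true positives
    fp = sum(c[1] for c in counts.values())   # pooled false positives
    fn = sum(c[2] for c in counts.values())   # pooled false negatives
    mip = tp / (tp + fp) if tp + fp else 0.0
    mir = tp / (tp + fn) if tp + fn else 0.0
    mif1 = 2 * mip * mir / (mip + mir) if mip + mir else 0.0
    return mip, mir, mif1
```

Because the counts are pooled, frequent classes dominate the micro scores, which suits an overall system-level summary.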
The precision (LCaP), recall (LCaR), and F-measure (LCaF) variants based on the lowest common ancestor (LCA) will also be applied on the hierarchy.
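The core operation behind these variants is finding the lowest common ancestor of a predicted and a true category; a minimal sketch for a tree stored as child→parent links, assuming both categories lie in the same tree (the LCa measures then restrict the ancestor sets to the paths up to the LCA):

```python
def lca(u, v, parent):
    """Lowest common ancestor of u and v, given child -> parent links."""
    path = set()
    while u is not None:       # collect u's path up to the root
        path.add(u)
        u = parent.get(u)
    while v not in path:       # climb from v until the paths meet
        v = parent.get(v)
    return v
```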