Hi, sorry for the late reply; I have been quite busy over the past few weeks.
That is a good question. Hits@K and AUC are indeed commonly used in link prediction tasks. Nonetheless, classification metrics such as F1 score, precision, and accuracy are also widely adopted in the link prediction literature. We opted for accuracy as our evaluation metric for convenience, since our main goal is to evaluate the robust and flexible alignment between the graph and token spaces.
To evaluate LLaGA with these ranking metrics, you can combine LLaGA with any LLM-for-ranking technique, e.g. [1]. I have implemented and pushed an evaluation script using the simplest pointwise ranking approach: the model is asked to output a plain "yes" or "no", and all samples are then ranked by the logits of "yes"/"no" at the first generated token (a minimal sketch of this scoring is given below). You can run eval/eval_pretrain_logit.py instead of eval/eval_pretrain.py to evaluate the link prediction task, and then change the --task argument of eval/eval_res.py from 'lp' to 'lprank' to get the results.
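For illustration, here is a minimal sketch of pointwise "yes"/"no" logit scoring, assuming a HuggingFace-style causal LM and tokenizer; `model`, `tokenizer`, and `build_lp_prompt` are placeholders, not the actual LLaGA API or script:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def yes_no_score(model, tokenizer, prompt: str) -> float:
    """Return a ranking score for one candidate link from the first-token logits."""
    # Token ids of the answer words (placeholder; the real script may use different tokens).
    yes_id = tokenizer.encode("yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("no", add_special_tokens=False)[0]

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]              # logits of the first generated token
    probs = F.softmax(logits[[yes_id, no_id]], dim=-1)  # normalize over just "yes"/"no"
    return probs[0].item()                              # P("yes") used as the ranking score

# Rank all candidate edges by their "yes" probability:
# scores = [yes_no_score(model, tokenizer, build_lp_prompt(u, v)) for (u, v) in candidates]
```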
I did a quick test on Arxiv; the results are below, and LLaGA still shows strong performance under these ranking metrics (a sketch of how such metrics are typically computed follows the table).
| Model | AUC | Hits@100 |
|---|---|---|
| GCN | 97.41 | 17.31 |
| GraphSAGE | 96.95 | 19.81 |
| LLaGA-ND | 97.56 | 37.65 |
| LLaGA-HO | 98.51 | 45.00 |
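As a rough sketch of how AUC and Hits@K are commonly computed from the pointwise scores above (this follows the usual OGB-style definition of Hits@K and is not necessarily the exact logic in eval/eval_res.py; `pos_scores`/`neg_scores` are placeholders for the scores of positive and negative test edges, with at least K negatives assumed):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_and_hits_at_k(pos_scores: np.ndarray, neg_scores: np.ndarray, k: int = 100):
    # AUC over all positive and negative candidate edges.
    labels = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
    scores = np.concatenate([pos_scores, neg_scores])
    auc = roc_auc_score(labels, scores)

    # Hits@K: fraction of positive edges scored above the K-th highest negative score.
    kth_neg = np.sort(neg_scores)[-k]
    hits_at_k = float((pos_scores > kth_neg).mean())
    return auc, hits_at_k
```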
[1] A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
Dear Authors, for link prediction, what was your consideration in choosing accuracy rather than AUC, which is common in the link prediction setting, or ranking metrics like Hits@K and MRR, which are also used more often than accuracy? Thanks.