tangxiangru opened 5 years ago
What are the detailed meanings of these four commands?
main.py eval -> evaluate the architecture on the task itself (e.g., "how accurate is my GRU on the 20 newsgroups test set")
main.py score -> pre-calculate relevance scores
main.py pointinggame -> evaluates an explanation method using the pointing game (cf. [1]; e.g., "how good is the decomposition method at explaining the GRU on the 20 newsgroups test set")
main.py manual -> evaluates all explanation methods on the manually annotated benchmark by [2]
[1] Poerner, N., Roth, B., Schütze, H. (2018). Evaluating neural network explanation methods using hybrid documents and morphosyntactic agreement. ACL.
[2] Mohseni, S., Ragan, E.D. (2018). A Human-Grounded Evaluation Benchmark for Local Explanations of Machine Learning. arXiv preprint arXiv:1801.05075.
So I need to pre-calculate relevance scores before running pointinggame?
Thank you so much for your help!
```
main.py eval architecture corpus                 # evaluate primary model performance on test set
main.py score architecture corpus method         # pre-calculate relevance scores
main.py pointinggame architecture corpus method  # evaluate relevance method with the pointing game
main.py manual architecture method               # evaluate relevance method on the manual benchmark
```
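So yes, `score` must be run before `pointinggame`, since the pointing game consumes the pre-calculated relevance scores. A sketch of the sequence, where the argument values `gru`, `20newsgroups`, and `decomposition` are only illustrative (taken from the examples in this thread; check the repository's help output for the exact identifiers):

```shell
# Illustrative invocation; actual architecture/corpus/method names
# depend on the repository's configuration.
python main.py score gru 20newsgroups decomposition         # 1. pre-calculate relevance scores
python main.py pointinggame gru 20newsgroups decomposition  # 2. evaluate them with the pointing game
```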