tangxiangru opened 5 years ago
What are the detailed meanings of these four commands?
main.py eval -> evaluate the architecture on the task itself (e.g., "how accurate is my GRU on the 20 newsgroups test set")
main.py score -> pre-calculate relevance scores
main.py pointinggame -> evaluates an explanation method using the pointing game (cf. [1]; e.g., "how good is the decomposition method at explaining the GRU on the 20 newsgroups test set")
main.py manual -> evaluates all explanation methods on the manually annotated benchmark by [2]
[1] Poerner, N., Roth, B., Schütze, H. (2018). Evaluating neural network explanation methods using hybrid documents and morphosyntactic agreement. ACL.
[2] Mohseni, S., Ragan, E.D. (2018). A Human-Grounded Evaluation Benchmark for Local Explanations of Machine Learning. arXiv preprint arXiv:1801.05075.
So I need to pre-calculate relevance scores before running pointinggame?
Thank you so much for your help!
```
main.py eval architecture corpus                 # evaluate primary model performance on test set
main.py score architecture corpus method         # pre-calculate relevance scores
main.py pointinggame architecture corpus method  # evaluate relevance method with the pointing game
main.py manual architecture method               # evaluate relevance method on the manual benchmark
```
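So yes, `score` must be run before `pointinggame`, since the pointing game consumes the pre-calculated relevance scores. A sketch of the sequence, where the argument values `gru`, `20newsgroups`, and `decomposition` are only illustrative (taken from the examples in this thread; check the repository's help output for the exact identifiers):

```shell
# Illustrative invocation; actual architecture/corpus/method names
# depend on the repository's configuration.
python main.py score gru 20newsgroups decomposition         # 1. pre-calculate relevance scores
python main.py pointinggame gru 20newsgroups decomposition  # 2. evaluate them with the pointing game
```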