What is the difference between F1 and KF1 evaluation in eval_resp_generation.sh?

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start


What is the difference between F1 and KF1 evaluation in eval_resp_generation.sh? #180

Closed singleheart closed 2 years ago

singleheart commented 2 years ago

I know that F1 and KF1 are different metrics, but eval_resp_generation.sh appears to provide the same code for both.

F1: https://github.com/NVIDIA/Megatron-LM/blob/9a8b89acd8f6ba096860170d0e30ddc0bc2bacd4/examples/msdp/eval_resp_generation.sh#L14-L28

KF1: https://github.com/NVIDIA/Megatron-LM/blob/9a8b89acd8f6ba096860170d0e30ddc0bc2bacd4/examples/msdp/eval_resp_generation.sh#L35-L49

The two blocks look identical. Is a script missing?

YubinRuan commented 2 years ago

The GROUND_TRUTH_PATH is different. F1 measures the overlap between the model’s response and the human response from the dataset. KF1 instead measures the overlap between the model’s response and the knowledge on which the human grounded their response during dataset collection.
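
To illustrate the distinction, here is a minimal sketch of a unigram-overlap F1 in Python. The function name `token_f1`, the whitespace tokenization, and the example strings are all assumptions made for illustration; the actual evaluation code in Megatron-LM may normalize text differently (for example, by stripping punctuation or articles). The point is only that the same scoring function yields F1 or KF1 depending on which reference it is given.

```python
from collections import Counter


def token_f1(guess: str, reference: str) -> float:
    """Unigram F1 between a guess and a reference.

    Illustrative only: tokens are lowercased and split on whitespace,
    which may differ from the normalization used by the real script.
    """
    guess_tokens = guess.lower().split()
    ref_tokens = reference.lower().split()
    if not guess_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(guess_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(guess_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data, not taken from any dataset.
model_response = "the eiffel tower is in paris"
human_response = "it is in paris"
knowledge = "the eiffel tower is a landmark in paris france"

# F1: the reference is the human response.
f1 = token_f1(model_response, human_response)

# KF1: same function, but the reference is the grounding knowledge.
kf1 = token_f1(model_response, knowledge)

print(f"F1  = {f1:.3f}")
print(f"KF1 = {kf1:.3f}")
```

In this sketch the only thing that changes between the two scores is the reference string, which mirrors how the two commands in the script differ only in GROUND_TRUTH_PATH.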

singleheart commented 2 years ago

Oh, I missed the GROUND_TRUTH_PATH. Thank you.