singleheart closed this 2 years ago
The GROUND_TRUTH_PATH is different.
F1 measures the overlap between the model’s response and the human response from the dataset.
KF1 instead measures the overlap between the model’s response and the knowledge that the human annotator grounded their response on during dataset collection.
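To make the distinction concrete, here is a minimal sketch of a unigram-overlap F1 in Python. It is not the actual Megatron-LM evaluation code: the helper names `normalize` and `unigram_f1` and the example strings are made up for illustration, and the tokenization is a simplifying assumption. The point is only that F1 and KF1 can share the same scoring function and differ in which reference text is passed in.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def unigram_f1(guess, reference):
    """Harmonic mean of unigram precision and recall between two strings."""
    guess_tokens = normalize(guess)
    ref_tokens = normalize(reference)
    if not guess_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(guess_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(guess_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data, not taken from any dataset.
model_response = "The Eiffel Tower is in Paris."
human_response = "It is in Paris, France."
knowledge = "The Eiffel Tower is a wrought-iron lattice tower in Paris."

# F1: the reference is the human response.
f1 = unigram_f1(model_response, human_response)
# KF1: the reference is the grounding knowledge.
kf1 = unigram_f1(model_response, knowledge)
print(f"F1 = {f1:.3f}, KF1 = {kf1:.3f}")
```

Under this reading, two evaluation runs that look identical apart from the reference file would still produce different metrics.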
Oh, I missed the GROUND_TRUTH_PATH. Thank you.
Actually, I know that F1 and KF1 are different, but eval_resp_generation.sh provides the same code for both.
F1: https://github.com/NVIDIA/Megatron-LM/blob/9a8b89acd8f6ba096860170d0e30ddc0bc2bacd4/examples/msdp/eval_resp_generation.sh#L14-L28
KF1: https://github.com/NVIDIA/Megatron-LM/blob/9a8b89acd8f6ba096860170d0e30ddc0bc2bacd4/examples/msdp/eval_resp_generation.sh#L35-L49
The two code blocks appear to be identical. Is a script missing?