Hi, I noticed that in Table 3 of your paper you split the evaluation into several subsets of questions. Do you have any scripts for computing these per-subset results? The current codebase appears to report only the overall EM@1 and EM@10 metrics.