czyssrs / ConvFinQA

Data and code for EMNLP 2022 paper "ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering"
MIT License

KeyError: 'gold_ind' in the test_turn.json #1

Open LxYuan-Handshakes opened 1 year ago

LxYuan-Handshakes commented 1 year ago

Hi,

I'm currently stuck at the retriever inference step. I get an error because the code looks for the gold_ind key in the test_turn.json data file, but there is no gold_ind in the test data, right?

The following is the error message:

(env) [~finqanet_retriever]$ python Test.py
71 211

Reading: operation_list.txt
Reading: constant_list.txt
Reading /home/playground/ConvFinQA/data/test_turn.json
Traceback (most recent call last):
  File "/home/playground/ConvFinQA/code/finqanet_retriever/Test.py", line 52, in <module>
    read_examples(input_path=conf.test_file, tokenizer=tokenizer,
  File "/home/playground/ConvFinQA/code/finqanet_retriever/utils.py", line 172, in read_examples
    examples.append(finqa_utils.read_mathqa_entry(entry, tokenizer))
  File "/home/playground/ConvFinQA/code/finqanet_retriever/finqa_utils.py", line 356, in read_mathqa_entry
    all_positive = entry["annotation"]["gold_ind"]
KeyError: 'gold_ind'

Additional information

Also, I had to comment out the part of the Main.py training script that loads the test data so that I could start training.

--- a/code/finqanet_retriever/Main.py
+++ b/code/finqanet_retriever/Main.py
@@ -66,9 +66,9 @@ valid_data, valid_examples, op_list, const_list = \
     read_examples(input_path=conf.valid_file, tokenizer=tokenizer,
                   op_list=op_list, const_list=const_list, log_file=log_file)

-test_data, test_examples, op_list, const_list = \
-    read_examples(input_path=conf.test_file, tokenizer=tokenizer,
-                  op_list=op_list, const_list=const_list, log_file=log_file)
+#test_data, test_examples, op_list, const_list = \
+#    read_examples(input_path=conf.test_file, tokenizer=tokenizer,
+#                  op_list=op_list, const_list=const_list, log_file=log_file)

 kwargs = {"examples": train_examples,
           "tokenizer": tokenizer,

The reason is that read_mathqa_entry in code/finqanet_retriever/finqa_utils.py looks up the gold_ind key in the given data file, but the test JSON file doesn't have that key:

def read_mathqa_entry(entry, tokenizer):

    filename_id = entry["id"]
    question = " ".join(entry["annotation"]["cur_dial"])
    all_positive = entry["annotation"]["gold_ind"]
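If you only need the retriever to run over the private test file (for example, to produce predictions for the leaderboard), one workaround is to read the key defensively. The sketch below uses a hypothetical helper, `get_gold_ind`, which is not part of the repository. It assumes the rest of `read_mathqa_entry` only iterates over or tests membership in the returned value, so an empty dict simply means "no known positives". Any downstream code that relies on positives (for example, recall computation) may still need its own changes.

```python
from typing import Any, Dict


def get_gold_ind(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return the annotated gold evidence for one ConvFinQA entry.

    Hypothetical helper: private test entries carry no "gold_ind" key,
    so fall back to an empty dict instead of raising KeyError.
    Assumption: callers only iterate over / test membership in the result.
    """
    return entry.get("annotation", {}).get("gold_ind", {})
```

Inside `read_mathqa_entry`, the failing line would then become `all_positive = get_gold_ind(entry)`. Labeled splits behave exactly as before; only entries without the key get the empty fallback.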

GasolSun36 commented 1 year ago

Hi. 'gold_ind' is missing from the test set because it is a private test set: the only way to get a score on it is to submit your results to the leaderboard. Check the FinQA code for more information; it includes a solution for handling private data, and that solution also works for ConvFinQA.
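Rather than commenting out the test-loading code in Main.py, a training script could check whether a split is labeled before loading or scoring it. A minimal sketch, with hypothetical helper names `load_split` and `is_labeled_split` (not from the ConvFinQA or FinQA code), assuming each entry stores its labels under `"annotation"` as `read_mathqa_entry` expects:

```python
import json
from typing import Any, Dict, List


def load_split(path: str) -> List[Dict[str, Any]]:
    """Load a ConvFinQA-style JSON file (a list of entries)."""
    with open(path) as f:
        return json.load(f)


def is_labeled_split(entries: List[Dict[str, Any]]) -> bool:
    """True if every entry carries a "gold_ind" annotation.

    A private (leaderboard-only) split such as test_turn.json is expected
    to fail this check, so local scoring can be skipped for it.
    """
    return all("gold_ind" in e.get("annotation", {}) for e in entries)
```

In Main.py, the commented-out block could then be wrapped in `if is_labeled_split(load_split(conf.test_file)):` so that unlabeled test data is skipped automatically. Predictions on that split would still have to be scored through the leaderboard.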