amritasaha1812 / CSQA_Code


The process of evaluation #12

Closed: Impavidity closed this issue 5 years ago

Impavidity commented 6 years ago

Hey @vardaan123, would you mind explaining the whole evaluation process? I am a little bit confused about the evaluation part.

From the script here:

Step 16: For evaluating the model separately on each question type, run the following:

```sh
./run_test.sh Target_Model_decoder verify
./run_test.sh Target_Model_decoder quantitative_count
./run_test.sh Target_Model_decoder comparative_count
./run_test.sh Target_Model_kvmem simple
./run_test.sh Target_Model_kvmem logical
./run_test.sh Target_Model_kvmem quantitative
./run_test.sh Target_Model_kvmem comparative
```

I think you feed different types of questions to the neural network using different decoding methods (seq2seq/kvmem). Is this the correct way to evaluate? Assuming we get a question, how can we know its type before feeding it into a specific decoder?

Can you explain a little bit about the evaluation process?

BTW, is the precision/recall calculation based on the all_entities field or the entities_in_utterance field in the SYSTEM JSON response?

vardaan123 commented 6 years ago

Yes, the decoding method depends on the type of the question. In the input data JSON, the question type is already specified, i.e. simple, logical, quantitative, comparative, etc. P = TP/(TP+FP), R = TP/(TP+FN). We filter the top-K entities (K=20) from the kvmem decoder and use the active set to calculate precision and recall. entities_in_utterance is used for computing precision, and the active set is used to retrieve all_entities for calculating recall.
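
To make the routing concrete: since the question type is already present in the input JSON, evaluation can dispatch each question to the matching model. A minimal sketch follows; the type strings mirror the run_test.sh arguments above, but the function and constant names are mine, not from the repo:

```python
# Hedged sketch: route a question to the right model based on the
# question type present in the input JSON. Type strings follow the
# run_test.sh arguments; names here are illustrative only.

DECODER_TYPES = {"verify", "quantitative_count", "comparative_count"}  # seq2seq decoder
KVMEM_TYPES = {"simple", "logical", "quantitative", "comparative"}     # key-value memory decoder

def pick_model(question_type):
    """Return the model to evaluate with for a given question type."""
    if question_type in DECODER_TYPES:
        return "Target_Model_decoder"
    if question_type in KVMEM_TYPES:
        return "Target_Model_kvmem"
    raise ValueError("unknown question type: %s" % question_type)
```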

Impavidity commented 6 years ago

There are some inconsistencies between all_entities and entities_in_utterance in the SYSTEM response, because sometimes not all of the entities can be listed in the utterance. In that case, which one do you use to calculate precision and recall?

Do you have an example output that I can play with?

vardaan123 commented 5 years ago

Sorry for the very late response. Per the doc https://amritasaha1812.github.io/CSQA/_pages/example.html, all_entities corresponds to the answer, while entities_in_utterance corresponds to the question utterance. So, in this case, all_entities corresponds to the "true positives". (TP+FP) is given by all predicted entities, which equals the top-K entities given by the kvmem decoder. (TP+FN) is given by all entities in the active_set. There are three equations and four unknowns, but since both P and R are ratios, they can still be calculated.
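
In other words, TP is the overlap between the predicted entities and the gold answer set, TP+FP is the size of the top-K prediction list, and TP+FN is the size of the gold set. A minimal sketch, assuming both are available as lists of entity IDs (the function and variable names are mine, not from the repo):

```python
# Hedged sketch of the precision/recall computation described above.
# predicted_topk: the top-K (K=20) entities from the kvmem decoder.
# gold_entities: the gold answer entities (all_entities, retrieved
# from the active set).

def precision_recall(predicted_topk, gold_entities):
    pred = set(predicted_topk)   # all predicted entities: TP + FP
    gold = set(gold_entities)    # all gold entities:      TP + FN
    tp = len(pred & gold)        # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy numbers: 20 predictions, 5 of them in a gold set of 8 entities
# -> P = 5/20 = 0.25, R = 5/8 = 0.625
```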