Problem: CLEval does not always return the same results for the same input. Specifically, the values of `char_false_pos` and `recognition_score` change from run to run.
How to reproduce: Run `python script.py -g=<your_gt_file> -s=<your_res_file> --E2E` several times (~10) on the exact same `<gt_file>` and `<res_file>`. The reported `char_false_pos` and `recognition_score` will not always be the same.
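The repeated runs can be automated with a small helper that counts how many distinct outputs the command produces. This is a minimal sketch, not part of CLEval: `run_repeatedly` is a hypothetical name, and it assumes the script prints nothing run-specific (such as timings), since any such line would make every run look different.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(args, runs=10, python=sys.executable):
    """Run `python *args` `runs` times; count how often each distinct stdout occurs."""
    outputs = Counter()
    for _ in range(runs):
        result = subprocess.run([python, *args], capture_output=True, text=True, check=True)
        outputs[result.stdout] += 1
    return outputs


# Usage (fill in your own files; the flags are the ones from the reproduction steps):
# counts = run_repeatedly(["script.py", "-g=<your_gt_file>", "-s=<your_res_file>", "--E2E"])
# print(len(counts))  # more than 1 distinct output means the results are not deterministic
```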
Explanation: The function `accumulate_stats()` in `script.py` is supposed to add (`+=`) each sample's results to all end-to-end (E2E) variables. However, `self.e2e_char_false_positive`, `self.e2e_recog_score_chars`, and `self.e2e_recog_score_correct_num` are not incremented; instead, they are overwritten (`=`) with the results of the latest sample. As a consequence, those three variables always hold only the values of whichever sample was processed last, so processing the samples in a different order yields different end-to-end results.
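The order dependence can be illustrated with a toy example. This is a sketch under assumptions: the per-sample dictionaries and the `accumulate_overwrite` function are hypothetical stand-ins that only mirror the variable names from this issue, not CLEval's real data structures. Overwriting instead of accumulating means the "total" is just the last sample, so each possible last sample gives a different result.

```python
from itertools import permutations

# Hypothetical per-sample E2E stats. The keys mirror the attribute names
# from the issue, but this is NOT CLEval's actual sample structure.
SAMPLES = [
    {"char_false_positive": 3, "recog_score_chars": 10, "recog_score_correct_num": 8},
    {"char_false_positive": 1, "recog_score_chars": 5, "recog_score_correct_num": 5},
    {"char_false_positive": 0, "recog_score_chars": 7, "recog_score_correct_num": 4},
]


def accumulate_overwrite(samples):
    """Mimics the reported bug: '=' overwrites the totals on every sample."""
    char_fp = recog_chars = recog_correct = 0
    for s in samples:
        char_fp = s["char_false_positive"]            # should be +=
        recog_chars = s["recog_score_chars"]          # should be +=
        recog_correct = s["recog_score_correct_num"]  # should be +=
    return char_fp, recog_chars, recog_correct


# Evaluate the same three samples in every possible order.
results = {accumulate_overwrite(order) for order in permutations(SAMPLES)}
print(sorted(results))
# → [(0, 7, 4), (1, 5, 5), (3, 10, 8)]: one result per possible last sample
```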
Solution: Change `accumulate_stats()` in `script.py` so that `self.e2e_char_false_positive`, `self.e2e_recog_score_chars`, and `self.e2e_recog_score_correct_num` are incremented (`+=`) with each sample's results, like the other end-to-end variables.
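A minimal sketch of the proposed fix follows. The `SampleResult` dataclass, the `E2EStats` class, the `totals` helper, and the single-argument signature of `accumulate_stats()` are all hypothetical; only the three `self.e2e_*` attribute names come from this issue, and the real method also updates other E2E variables that are omitted here. With `+=`, the totals no longer depend on the order in which samples arrive.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class SampleResult:
    """Hypothetical per-sample E2E stats; not CLEval's real type."""
    char_false_positive: int
    recog_score_chars: int
    recog_score_correct_num: int


class E2EStats:
    """Hypothetical stand-in for the object that owns accumulate_stats()."""

    def __init__(self):
        self.e2e_char_false_positive = 0
        self.e2e_recog_score_chars = 0
        self.e2e_recog_score_correct_num = 0

    def accumulate_stats(self, sample):
        # Fixed: add each sample's contribution instead of overwriting (=).
        self.e2e_char_false_positive += sample.char_false_positive
        self.e2e_recog_score_chars += sample.recog_score_chars
        self.e2e_recog_score_correct_num += sample.recog_score_correct_num


def totals(samples):
    """Accumulate samples in the given order and return the three totals."""
    stats = E2EStats()
    for s in samples:
        stats.accumulate_stats(s)
    return (stats.e2e_char_false_positive,
            stats.e2e_recog_score_chars,
            stats.e2e_recog_score_correct_num)


samples = [SampleResult(3, 10, 8), SampleResult(1, 5, 5), SampleResult(0, 7, 4)]
print({totals(order) for order in permutations(samples)})
# → {(4, 22, 17)}: every ordering gives the same totals
```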