Question about the new evaluation method for Task 1&2

Hi. I've noticed that attribute accuracy for action prediction is very low for the fashion baseline model. I see that a new parameter, `single_round_eval`, was added to the updated evaluation script for Tasks 1 and 2 (`mm_action_prediction`):

    if single_round_eval and round_id != num_gt_rounds - 1:
        continue

When `single_round_eval` is True, the script evaluates only the last round of each dialog. However, the API for the last round of most dialogs is "None" or "AddToCart", neither of which has any attributes, so `supervision` ends up being None most of the time:

    supervision = gt_datum["action_supervision"]
    if supervision is not None and "args" in supervision:
        supervision = supervision["args"]
    if supervision is None:
        skipped += 1
        continue

I counted how many times the evaluation skips a round because `supervision` is None: 973 times for the fashion domain on dev test. Hence, for the fashion domain, only 982 - 973 = 9 rounds are actually being evaluated for attributes. I believe this is why attribute_accuracy is so low with the updated evaluation script. Is this the intended behavior, or does it need to be fixed?
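For reference, here is a minimal sketch of how such a count can be reproduced. It is not the actual evaluation script: the function name `count_attribute_rounds`, the list-of-rounds layout of `toy_dialogs`, and the toy attribute values are my own assumptions for illustration. Only the `single_round_eval` check and the `action_supervision` / `args` handling mirror the snippets quoted above.

```python
def count_attribute_rounds(dialogs, single_round_eval=True):
    """Return (considered, skipped) round counts for attribute evaluation.

    `dialogs` is assumed (hypothetically) to be a list of dialogs, each a
    list of per-round ground-truth dicts with an "action_supervision" key.
    """
    considered = 0
    skipped = 0
    for dialog in dialogs:
        num_gt_rounds = len(dialog)
        for round_id, gt_datum in enumerate(dialog):
            # Same filter as in the quoted script: keep only the last round.
            if single_round_eval and round_id != num_gt_rounds - 1:
                continue
            considered += 1
            supervision = gt_datum["action_supervision"]
            if supervision is not None and "args" in supervision:
                supervision = supervision["args"]
            # Rounds without supervision contribute nothing to attribute accuracy.
            if supervision is None:
                skipped += 1
    return considered, skipped


# Toy data (made up): two dialogs end on an attribute-free action.
toy_dialogs = [
    [
        {"action": "SearchDatabase", "action_supervision": {"args": ["color"]}},
        {"action": "AddToCart", "action_supervision": None},
    ],
    [
        {"action": "SpecifyInfo", "action_supervision": {"args": ["price"]}},
        {"action": "None", "action_supervision": None},
    ],
    [
        {"action": "SpecifyInfo", "action_supervision": {"args": ["brand"]}},
    ],
]

considered, skipped = count_attribute_rounds(toy_dialogs)
print(considered, skipped, considered - skipped)
```

On this toy data, only one of the three last rounds carries attributes, which is the same pattern I see in the fashion numbers above. Calling the function with `single_round_eval=False` shows how many more rounds would carry supervision if every round were evaluated.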

facebookresearch / simmc

Question about the new evaluation method for Task 1&2 #43