Incorrect evaluation script provided for MM-DST baseline

facebookresearch / simmc

With the aim of building next-generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), the annotation schema, the core technical tasks, and the baseline models. The code for the baselines and the datasets will be open-sourced.

There is a bug in the `parse_flattened_result` function in `gpt2_dst/utils/convert.py`. The relevant part of the code is:

```python
def parse_flattened_result(to_parse):
    ....
    d = {}  # Created only once, before the loop.
    for dialog_act in dialog_act_regex.finditer(to_parse):
        d['act'] = dialog_act.group(1)
        d['slots'] = []
        ....

        if d != {}:
            belief.append(d)  # Appends the same dict object on every iteration.
```
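The aliasing can be reproduced without any of the regex machinery. This is a minimal sketch: the act labels are made-up placeholders, and the loop body mirrors only the lines quoted above.

```python
# Minimal reproduction of the buggy pattern: `d` is created once, outside
# the loop, so list.append stores the same dict object every time.
acts = ["DA:REQUEST:GET", "DA:INFORM:PREFER"]  # hypothetical act labels

belief = []
d = {}  # created once, before the loop (the buggy pattern)
for act in acts:
    d["act"] = act      # overwrites the shared dict in place
    d["slots"] = []
    if d != {}:
        belief.append(d)  # appends a reference, not a copy

print(belief)                  # [{'act': 'DA:INFORM:PREFER', 'slots': []}, {'act': 'DA:INFORM:PREFER', 'slots': []}]
print(belief[0] is belief[1])  # True
```

Both entries are the same object, so only the last dialog act survives, repeated once per act in the input.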

The `belief` list stores a reference to the dictionary `d`, not a copy of it. Because `d` is created only once, before the loop, each iteration overwrites that same object and appends it again, so every entry in `belief` ends up holding the act and slots of the last dialog act. The fix is to move `d = {}` inside the for-loop so that each dialog act gets its own dictionary. This affects the reported baseline performance for SubTask 3: the actual performance would be lower once the script is fixed.
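A minimal sketch of the corrected loop, with `d = {}` moved inside the for-loop, is below. The function name `parse_belief` and both regular expressions are hypothetical simplifications for illustration, not the patterns defined in `gpt2_dst/utils/convert.py`, and the sketch omits whatever the elided parts of the original function do.

```python
import re

# Hypothetical, simplified patterns for illustration only; the real
# dialog_act_regex / slot_regex in convert.py may differ.
DIALOG_ACT_REGEX = re.compile(r"([\w:.]+)\s*\[([^\]]*)\]")
SLOT_REGEX = re.compile(r"([\w.-]+)\s*=\s*([^,]*)")


def parse_belief(to_parse):
    """Parse 'ACT [ slot = value, ... ] ...' into a list of frames."""
    belief = []
    for dialog_act in DIALOG_ACT_REGEX.finditer(to_parse):
        d = {}  # the fix: a fresh dict for every dialog act
        d["act"] = dialog_act.group(1)
        d["slots"] = []
        for slot in SLOT_REGEX.finditer(dialog_act.group(2)):
            d["slots"].append([slot.group(1).strip(), slot.group(2).strip()])
        if d != {}:  # always true here; kept to mirror the original structure
            belief.append(d)
    return belief


frames = parse_belief(
    "DA:REQUEST:GET [ type = jacket, color = red ] DA:INFORM:PREFER [ brand = Acme ]"
)
print(frames[0] is frames[1])       # False
print([f["act"] for f in frames])   # ['DA:REQUEST:GET', 'DA:INFORM:PREFER']
```

Because a new dictionary is built on each iteration, every frame is a distinct object that keeps its own act and slots.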

Incorrect evaluation script provided for MM-DST baseline #12