With the aim of building next-generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), an annotation schema, the core technical tasks, and baseline models. The code for the baselines and the datasets will be open-sourced.
I was checking the validity of the actions generated with `mm_action_prediction/tools/extract_actions_fashion.py` and found that the extracted actions do not always match the dialogue: some dialogues have more actions than turns (and vice versa).

Dialogues {321, 3969, 3406, 4847, 3414} have a different number of turns and actions. For instance, dialogue 3969 has 6 turns but only 2 actions. It seems to me that dialogue 3969 in `fashion_train_dials.json` is badly annotated (everything is inside the `belief_state` field).
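For reference, the turn/action mismatch I describe can be detected with a check like the following. This is a minimal sketch, not the repo's code: it assumes dialogues and extracted actions have already been loaded into dicts keyed by dialogue id (the function name `find_mismatched_dialogs` and the toy data are hypothetical).

```python
def find_mismatched_dialogs(dialog_data, action_data):
    """Return ids of dialogues whose turn count differs from their action count.

    dialog_data: dict mapping dialog id -> list of turns
    action_data: dict mapping dialog id -> list of extracted actions
    (assumed pre-loaded layout; not the tool's actual data structures)
    """
    mismatched = {}
    for dialog_id, turns in dialog_data.items():
        actions = action_data.get(dialog_id, [])
        if len(turns) != len(actions):
            # Record (turn count, action count) for inspection.
            mismatched[dialog_id] = (len(turns), len(actions))
    return mismatched

# Hypothetical data mirroring the report: dialogue 3969 with 6 turns but 2 actions.
dialogs = {3969: [f"turn_{i}" for i in range(6)], 42: ["turn_0", "turn_1"]}
actions = {3969: ["action_0", "action_1"], 42: ["action_0", "action_1"]}
print(find_mismatched_dialogs(dialogs, actions))  # → {3969: (6, 2)}
```

Running a check like this over `fashion_train_dials.json` and the extractor's output is how I found the five dialogue ids above.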