facebookresearch / simmc

With the aim of building next-generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), the annotation schema, the core technical tasks, and the baseline models. The code for the baselines and the datasets will be open-sourced.

Are we allowed to use "turn_label" fields for subtasks 1-2 ? #26

Closed seo-95 closed 4 years ago

seo-95 commented 4 years ago

In the first turn of dialogue 4146 in fashion-dev dataset the user asks to compare the price of the current object (present in visual_objects) with the price of the previously seen object. The only 2 annotations about the existence of a previous object are present in "state_graph_2", which is not allowed as input, and in the "objects" subfield of "turn_label". Are we allowed to use "turn_label" as input for action_prediction and response_generation?

satwikkottur commented 4 years ago

For fashion, you're allowed to use the objects in the memory tracked by ids in the corresponding task_id for a dialog. Does that address your concern?

seo-95 commented 4 years ago

> For fashion, you're allowed to use the objects in the memory tracked by ids in the corresponding task_id for a dialog. Does that address your concern?

Task id 1807 has empty memory images list. Is this simply an annotation error?

satwikkottur commented 4 years ago

SIMMC-Fashion has a combination of dialogs with and without memory images (empty list).

seo-95 commented 4 years ago

> SIMMC-Fashion has a combination of dialogs with and without memory images (empty list).

Thank you for the clarification. My question was about dialogue 4146 of fashion-dev, which in its first turn refers to an already-seen object even though the memory image list for that task id is empty. Is this simply an annotation error, or is there something I am still missing?

Regarding the visual context we are discussing, I have one more concern that came out of this conversation. Are we allowed to use the "visual_objects" field during training, or can we only refer to the database, focus, and memory images fields for each turn?

shubhamagarwal92 commented 4 years ago

@satwikkottur @shanemoon

On the same note, can we use system_transcript_annotated? For example, to generate the response:

System: This dress from Downtown Stylists is available in ivory and beige.

Can we use this DA information?

System Annotations: [{'intent': 'DA:INFORM:GET:DRESS.color', 'slots': [{'id': 'O.brand', 'span': {'start': 16, 'end': 19}, 'text': 'Downtown Stylists', 'subframe': {'utterance': 'Downtown Stylists', 'domain': '', 'intent': '.name', 'slots': [], 'span': {'start': 0, 'end': 3}, 'node_id': 9}, 'node_id': 6}, {'id': 'O.color', 'span': {'start': 36, 'end': 41}, 'text': 'ivory', 'node_id': 7}], 'span': {'start': 0, 'end': 41}, 'node_id': 4}, {'intent': 'DA:INFORM:GET:DRESS.color', 'slots': [{'id': 'O.color', 'span': {'start': 4, 'end': 9}, 'text': 'beige', 'node_id': 8}], 'span': {'start': 0, 'end': 10}, 'node_id': 5}]

Otherwise, how could we provide the product information to the system?

seo-95 commented 4 years ago

I think there is a certain amount of confusion about the challenge because each turn carries so many different annotations. These annotations are a great source for a lot of interesting experiments in the coming months and years (I hope), but in the context of DSTC9 they can create confusion about the allowed input and, sometimes, about the required output. If I may, I would suggest releasing a structural draft of the test set (the one that will be released at the end of September). This draft would not need to contain the data, only the fields that will be present in the test JSON file, to make clear what our models can rely on as input. For instance, it is not clear whether, for subtask #1, our model will have access to the list of actions performed by the wizard in previous turns of that dialogue, which makes it difficult for us to tell whether a model is valid for this challenge.

I want to thank you for the incredible work you are doing for this challenge and the great support you give us.

cccntu commented 4 years ago

In the README for each subtask, it says:

Disallowed Input: belief_state, system_transcript, system_transcript_annotated, state_graph_1, state_graph_2

But 'system_turn_label', 'transcript_annotated', 'turn_label', and 'raw_assistant_keystrokes' all seem to contain the answer.

I think this is an error in the README. Can someone clarify?

satwikkottur commented 4 years ago

Hello all,

Thanks for the discussions. We added a summary of allowed inputs for each task here.

@seo-95 : Dialog 4146 seems to be an annotation error, hopefully one-off. Please let us know if you encounter any more. Also, visual_objects describes the set of objects seen in the current turn. You’re welcome to use these during training and inference.

@shubhamagarwal92: system_transcript_annotated contains information about the system response. As listed under the disallowed fields in README.md, this field cannot be used at inference time for the turn being predicted, for any of the tasks. However, you’re welcome to use it during training.
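As an illustration of the rule above, here is a minimal sketch (not part of the official SIMMC code) of dropping the disallowed fields from the current turn before it is passed to a model at inference time. The field names are the ones in the "Disallowed Input" list from the README quoted earlier in this thread. The helper name `strip_disallowed` and the assumption that a turn is a flat dict keyed by field name are hypothetical.

```python
# Field names copied from the README's "Disallowed Input" list quoted in this thread.
DISALLOWED_AT_INFERENCE = frozenset({
    "belief_state",
    "system_transcript",
    "system_transcript_annotated",
    "state_graph_1",
    "state_graph_2",
})


def strip_disallowed(turn):
    """Return a shallow copy of `turn` without the disallowed fields.

    Assumption: `turn` is a dict mapping field names to values.
    The original dict is left unmodified, so it can still be used for training.
    """
    return {
        field: value
        for field, value in turn.items()
        if field not in DISALLOWED_AT_INFERENCE
    }


# Toy turn for illustration only; real turns carry more fields.
example_turn = {
    "transcript": "What colors does this dress come in?",
    "system_transcript": "This dress from Downtown Stylists is available in ivory and beige.",
    "belief_state": [],
}
inference_input = strip_disallowed(example_turn)
```

The sketch covers only the fields the README names explicitly; whether other fields such as turn_label are allowed should be checked against the organizers' summary of allowed inputs.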

The entire catalog information is stored in either fashion_metadata.json or furniture_metadata.csv. The API calls provide the state of the carousel (furniture) or focus item (fashion) after the ground truth API / actions have been called. By using these two, one should be able to retrieve the entire information about the catalog items that are potentially described in the system response.
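The lookup described above could be sketched as follows. This is only an illustration under stated assumptions: the catalog is modeled as a dict from item id (string) to an attribute dict, which may not match the actual layout of fashion_metadata.json; the item id "42", the attribute names, and the helper `describe_focus_item` are hypothetical, and the toy values are borrowed from the example response earlier in this thread.

```python
def describe_focus_item(catalog, focus_id, attributes):
    """Look up the focus item in the catalog and return the requested attributes.

    Assumptions (not confirmed by the SIMMC repo):
      - `catalog` maps item ids (as strings) to dicts of attributes;
      - `focus_id` is the item id reported by the API call for the turn.
    Returns an empty dict when the item is not in the catalog.
    """
    item = catalog.get(str(focus_id))
    if item is None:
        return {}
    return {name: item[name] for name in attributes if name in item}


# Toy catalog standing in for the parsed metadata file (hypothetical layout).
toy_catalog = {
    "42": {
        "type": "dress",
        "brand": "Downtown Stylists",
        "color": ["ivory", "beige"],
    },
}

focus_info = describe_focus_item(toy_catalog, 42, ["brand", "color"])
```

With the real data, `toy_catalog` would be replaced by the parsed contents of the metadata file, and the returned attributes could then be fed to the response generator in place of the disallowed annotation fields.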