facebookresearch / simmc

With the aim of building next generation virtual assistants that can handle multimodal inputs and perform multimodal actions, we introduce two new datasets (both in the virtual shopping domain), the annotation schema, the core technical tasks, and the baseline models. The code for the baselines and the datasets will be open-sourced.

Adds valid inputs for each task; links in READMEs #27

Closed by satwikkottur 4 years ago

cccntu commented 4 years ago

Thanks for the update.

I think there is an error: the (prediction target) label should be on system_transcript, not transcript.

While we are at it, I have a few questions. We are allowed to use metadata and dialogue_coref_map at inference time, right? (That way we can fill in information like dimension and price.) But I assume we should not use objects that are listed in dialogue_coref_map but that we haven't seen yet?

As for data from previous rounds: we are implicitly allowed to use system_transcript and transcript from previous rounds. Is there anything else we are or are not allowed to use?

Similarly, why are we not allowed to use state_graph_0 for MM-DST (only)? My guess is that it contains information from annotations of previous rounds, which we are not allowed to use for MM-DST.

Thanks again.

satwikkottur commented 4 years ago

Hello @cccntu,

Thank you for catching the typo - yes the prediction target for Subtask 2 should be system_transcript.

You are allowed to use metadata but not dialogue_coref_map at inference time. Please see this for how you can obtain information like dimension and price. We added this information to the TASK_INPUTS.md file as well.

state_graph_0 contains oracle annotations for previous rounds, which trivializes part of the cumulative-state prediction in the MM-DST task, something we hope participants will tackle in this challenge. Again, participants are free to use any of this information during training; at inference time, they could use the reconstructed / predicted values instead.
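As a rough illustration of the rules discussed in this thread, here is a minimal sketch of removing the restricted fields from a dialog record before inference. Only the field names dialogue_coref_map and state_graph_0 come from this discussion; the nested JSON layout, the helper names strip_keys and inference_view, and the task label "mm_dst" are assumptions made for the example. TASK_INPUTS.md remains the authoritative list of valid inputs.

```python
# Sketch only: the record layout and the task label are hypothetical.
# Field names are the ones mentioned in this thread.
ALWAYS_DISALLOWED = {"dialogue_coref_map"}   # not usable at inference time
MM_DST_DISALLOWED = {"state_graph_0"}        # oracle annotations of previous rounds


def strip_keys(obj, banned):
    """Recursively copy obj, dropping any dict key listed in banned."""
    if isinstance(obj, dict):
        return {k: strip_keys(v, banned) for k, v in obj.items() if k not in banned}
    if isinstance(obj, list):
        return [strip_keys(v, banned) for v in obj]
    return obj


def inference_view(record, task):
    """Return a copy of record without the fields restricted for this task."""
    banned = set(ALWAYS_DISALLOWED)
    if task == "mm_dst":  # hypothetical label for the MM-DST subtask
        banned |= MM_DST_DISALLOWED
    return strip_keys(record, banned)
```

The original record is left untouched, so the same data can still be used in full during training, where all of these fields are permitted.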

All of this information (along with other clarifications) has been added in the recent commit.

Feel free to open an issue if you have further questions!