Closed seo-95 closed 4 years ago
The filtering from 33 down to 7 attributes is only a design choice the organizers made for the baseline they published. The challenge itself is about predicting the whole set of 33 attribute values.
Adding the clarification from a private thread here (for completeness):
Picking 7 out of the 33 possible attributes is a modeling choice the baselines make due to the distribution of the data. The remaining attributes are mapped to "other" for ease of modeling. However, evaluation should not take this relaxation into account, as doing so would penalize models that can potentially identify all 33 attributes. Because of this choice, the baselines will take a hit in performance, trading accuracy for simplicity. Hence, I did not restrict the evaluation to these 7 choices.
The baselines are trained to predict attributes for each action as a multilabel prediction problem, where each attribute can take a value from a set of 7 possible outcomes: {"availableSizes", "price", "brand", "customerRating", "info", "color"} + "others".
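To make the multilabel setup concrete, here is a minimal sketch of how a target vector over the 7 outcomes could be built. The function and variable names are illustrative only and are not taken from the actual challenge codebase:

```python
# Illustrative sketch of the baselines' multilabel target construction.
# The 6 named attributes plus "other" give the 7 possible outcomes.

ATTRIBUTE_SET = ["availableSizes", "price", "brand", "customerRating", "info", "color"]
OUTCOMES = ATTRIBUTE_SET + ["other"]  # 7 possible outcomes in total

def to_multilabel(attributes):
    """Map a list of raw attribute names to a 7-dim binary target vector.

    Any attribute outside the chosen subset collapses into "other".
    """
    mapped = {a if a in ATTRIBUTE_SET else "other" for a in attributes}
    return [1 if o in mapped else 0 for o in OUTCOMES]

# "embellishment" is not in the subset, so it folds into "other":
print(to_multilabel(["price", "embellishment"]))  # [0, 1, 0, 0, 0, 0, 1]
```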
The fashion dataset, however, contains 33 distinct attribute values. During training, all attributes of the training set not included in the desired subset are replaced with the value "others". This mapping is not applied in the scorer, though, resulting in a comparison between 33 possible ground-truth values and only 7 values that the model can actually predict.
This pull request applies the attribute value filtering to the ground-truth labels as well.
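The fix can be sketched as follows: before scoring, collapse any ground-truth attribute outside the 7-value subset into "other", so both predictions and references live in the same label space. The names below are hypothetical, not the actual scorer's API:

```python
# Sketch of the filtering this PR describes: apply the same "other" mapping
# to the ground-truth labels that the baselines already apply at training time.

ATTRIBUTE_SET = {"availableSizes", "price", "brand", "customerRating", "info", "color"}

def filter_ground_truth(labels):
    """Collapse any of the 33 attribute values outside the subset to 'other'."""
    return [label if label in ATTRIBUTE_SET else "other" for label in labels]

gt = ["color", "embellishment", "pattern"]
print(filter_ground_truth(gt))  # ['color', 'other', 'other']
```

With this relaxation applied to both sides, the scorer no longer penalizes the baselines for outcomes they were never trained to produce.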