Closed tsehsuan1102 closed 4 years ago
Same issue here. The ground truth action is missing from the test-std draft file. Additionally, I can't even find the item focus id for each turn; only partial information about the item is reported inside visual_objects, and sometimes it is not sufficient.
{
"domain": "fashion",
"visual_objects": {
"OBJECT_2": {
"hemLength": [
"mini",
"knee_length"
],
"pattern": [
"chevron",
"animal"
],
"pos": "focus",
"skirtStyle": [
"peplum",
"a_line",
"body_con",
"loose",
"fit_and_flare"
],
"embellishments": [
"pleated"
],
"type": "skirt"
}
},
"system_transcript": "Here is the skirt from Pedals & Gears. It retails for $124 and is rated at 3.96.",
"turn_idx": 1,
"belief_state": {},
"transcript": "sure"
}
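For anyone inspecting the released file, here is a minimal sketch of how to pull out the focus-object attributes per turn. It assumes only the turn structure shown in the excerpt above (a `turn_idx` field and a `visual_objects` dict whose entries may carry `"pos": "focus"`); nothing else about the dataset schema is assumed:

```python
def focus_objects(dialogue_turns):
    """Yield (turn_idx, object_id, attributes) for objects marked as focus.

    Assumes the turn structure shown in the excerpt: each turn has
    "turn_idx" and a "visual_objects" dict whose values may carry a
    "pos": "focus" marker.
    """
    for turn in dialogue_turns:
        for obj_id, obj in turn.get("visual_objects", {}).items():
            if obj.get("pos") == "focus":
                # Report only the annotated attributes; note there is no
                # global item/focus id here, just partial attribute lists.
                attrs = {k: v for k, v in obj.items() if k != "pos"}
                yield turn["turn_idx"], obj_id, attrs

# Example with the turn quoted above (trimmed):
turns = [{
    "turn_idx": 1,
    "visual_objects": {
        "OBJECT_2": {"pos": "focus", "type": "skirt",
                     "pattern": ["chevron", "animal"]},
    },
}]
for turn_idx, obj_id, attrs in focus_objects(turns):
    print(turn_idx, obj_id, sorted(attrs))
```

As the output shows, all you can recover is the partial attribute set, which is exactly the limitation raised above.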
Hi all, sorry that this info was not included, we will look into this and release a new file soon.
Please do note though that we are providing {domain}_devtest_dials_teststd_format_public.json just as a guide before we release the future test-std set, and the results for Phase 1 should be reported on the {domain}_devtest_dials.json, released earlier.
Hi, we're curious about the test-std file format. Will there be retrieval_candidates files and files containing API information for test-std? Since the team model entry deadline is getting closer, without the detailed format of the test-std files we are afraid that our code may not run directly on them (it does run directly on the devtest set).
Are we allowed to modify the preprocessing code after Sep. 28 to run test-std? (Only minor changes to make it run successfully, without changing the models.)
Hello all, sorry for not including this information before.
split, we will also release the corresponding API calls and retrieval candidates (public versions, excludes the last round on which evaluation is done) in the format of the corresponding existing files.
In order to check for compatibility, we will now release devtest API calls and retrieval candidates in this format.
Hello, there are still 2 questions about the submission. First, for Challenge Phase 1, we should submit our devtest prediction files. The README of the simmc repo tells us to follow the submission instructions, but there is no instruction there about submitting devtest results. Should we email you the results on devtest? If so, to which email address, and in what submission format?
Secondly, in a previous comment, @satwikkottur said:
For the test-std split, we will also release the corresponding API calls and retrieval candidates (public versions, excludes the last round on which evaluation is done) in the format of corresponding existing files.
What does "excludes the last round on which evaluation is done" mean?
I thought it meant excluding the last round's API call in each dialogue in the API calls file (excluding the last round of retrieval candidates would be odd, since we need those candidates to predict retrieval scores).
But when I check fashion_devtest_dials_api_calls_teststd_format_public.json and furniture_devtest_dials_api_calls_teststd_format_public.json, both files do include the last-round API information corresponding to fashion_devtest_dials_teststd_format_public.json and furniture_devtest_dials_teststd_format_public.json.
Hello @billkunghappy,
Thanks for raising this concern.
Regarding the comment, your understanding is correct.
(a) In the API calls file, the last round on which evaluation is to be performed will be excluded.
(b) For retrieval candidates, the last round will contain the retrieval candidates but will not contain the gt_index field that gives the index of the ground truth response.
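A quick sanity check of those two points against a released retrieval-candidates file might look like the sketch below. The exact schema is assumed here (a list of per-dialogue round lists, each round carrying `candidates` and, except in the last round, `gt_index`); adjust the key names to the actual released format:

```python
def check_public_candidates(candidates_per_dialogue):
    """Verify the stated property of the *public* retrieval file:
    every round carries retrieval candidates, but the last round of each
    dialogue omits the ground-truth index. The "candidates"/"gt_index"
    keys are assumptions based on the thread, not the confirmed schema.
    """
    for rounds in candidates_per_dialogue:
        for i, rnd in enumerate(rounds):
            assert "candidates" in rnd, "every round should list candidates"
            if i == len(rounds) - 1:
                assert "gt_index" not in rnd, "last round must not reveal gt"
            else:
                assert "gt_index" in rnd, "earlier rounds should keep gt_index"
    return True

# Synthetic example shaped like the description above:
public = [[
    {"candidates": [0, 1, 2], "gt_index": 1},
    {"candidates": [3, 4, 5]},          # last round: no gt_index
]]
print(check_public_candidates(public))
```

Running this against the real public file would flag any dialogue whose evaluation round accidentally leaks the ground-truth index.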
With respect to the files, I just realized that the suffixes public and private got switched for the API calls. While I will fix these later, please take a look at the wrongly named furniture_devtest_dials_api_calls_teststd_format_private.json (which should be public) and the corresponding file for fashion.
Hope this helps!
Hi, I have 2 questions raised by the test-std dataset release regarding the response_generation task.
Since the action and attributes annotations are not available for the current turn (differently from what was defined in TASK_INPUTS.md), are we able to slightly modify the code (not the model) to avoid using this information?
Whenever we encounter a potential SearchMemory or SearchDatabase in the k-th turn (the one on which the generation and the action prediction are evaluated), we do not have the annotation about the new focus item (during the first phase of the challenge it was included in the dials_api JSON file). Since the response of the wizard is conditioned on the item he/she is looking at, how can we generate such a response if we do not have information about the item?
An example is given below (dialogue 1902):
{
"domain": "fashion",
"visual_objects": {},
"system_transcript": "",
"turn_idx": 2,
"belief_state": {},
"transcript": "Show me another coat, but one that 212 Localts more."
}
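Turns like this one can be found programmatically. Here is a minimal sketch that flags every turn with no annotated visual objects, i.e. turns where the wizard's focus item is unknown; only the turn structure from the excerpts above is assumed:

```python
def turns_missing_focus(turns):
    """Return turn indices whose "visual_objects" dict is empty or absent,
    i.e. turns where the wizard's focus item is unknown (as in the
    dialogue 1902 example above)."""
    return [t["turn_idx"] for t in turns if not t.get("visual_objects")]

example = [
    {"turn_idx": 1, "visual_objects": {"OBJECT_2": {"pos": "focus"}}},
    {"turn_idx": 2, "visual_objects": {}},
]
print(turns_missing_focus(example))  # [2]
```

Scanning the released file this way gives a quick count of how many evaluation turns are affected by the missing focus annotation.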
Hello @seo-95 ,
Thanks for raising these two important concerns. We've updated the API calls file to include these two images for the last turn on which evaluation is performed. Of course, you're not allowed to use the ground truth API calls for subtask-1 but can use these for subtask-2 (as per the table).
Apologies for the confusion earlier, hope this addresses your concerns.
Hi, I am curious about the test-std files that will be released on Sept 28. Will you provide retrieval-candidate files and API call files, or only a single JSON file ({domain}_devtest_dials_teststd_format_public.json)?
We want to use the action information, which is allowed at inference time per TASK_INPUTS.md. However, there is no such annotation in {furniture/fashion}_devtest_dials_teststd_format_public.json. How can we get the ground truth action?
We are using the preprocessing script in mm_action_prediction, but it is not able to process {furniture/fashion}_devtest_dials_teststd_format_public.json. Can you provide full test files before Sept. 28 that we can run directly with the baseline model?