Closed tsehsuan1102 closed 4 years ago
Same issue here. The ground truth action is missing from the test-std draft file. Additionally, I can't even find the item focus id for each turn; only partial information about the item is reported inside visual_objects, and sometimes it is not sufficient.
{
"domain": "fashion",
"visual_objects": {
"OBJECT_2": {
"hemLength": [
"mini",
"knee_length"
],
"pattern": [
"chevron",
"animal"
],
"pos": "focus",
"skirtStyle": [
"peplum",
"a_line",
"body_con",
"loose",
"fit_and_flare"
],
"embellishments": [
"pleated"
],
"type": "skirt"
}
},
"system_transcript": "Here is the skirt from Pedals & Gears. It retails for $124 and is rated at 3.96.",
"turn_idx": 1,
"belief_state": {},
"transcript": "sure"
}
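For anyone inspecting the released file, here is a minimal sketch of how to pull out the focus-object attributes per turn. It assumes only the turn structure shown in the excerpt above (a `turn_idx` field and a `visual_objects` dict whose entries may carry `"pos": "focus"`); nothing else about the dataset schema is assumed:

```python
def focus_objects(dialogue_turns):
    """Yield (turn_idx, object_id, attributes) for objects marked as focus.

    Assumes the turn structure shown in the excerpt: each turn has
    "turn_idx" and a "visual_objects" dict whose values may carry a
    "pos": "focus" marker.
    """
    for turn in dialogue_turns:
        for obj_id, obj in turn.get("visual_objects", {}).items():
            if obj.get("pos") == "focus":
                # Report only the annotated attributes; note there is no
                # global item/focus id here, just partial attribute lists.
                attrs = {k: v for k, v in obj.items() if k != "pos"}
                yield turn["turn_idx"], obj_id, attrs

# Example with the turn quoted above (trimmed):
turns = [{
    "turn_idx": 1,
    "visual_objects": {
        "OBJECT_2": {"pos": "focus", "type": "skirt",
                     "pattern": ["chevron", "animal"]},
    },
}]
for turn_idx, obj_id, attrs in focus_objects(turns):
    print(turn_idx, obj_id, sorted(attrs))
```

As the output shows, all you can recover is the partial attribute set, which is exactly the limitation raised above.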
Hi all, sorry that this info was not included, we will look into this and release a new file soon.
Please do note though that we are providing {domain}_devtest_dials_teststd_format_public.json just as a guide before we release the future test-std set, and the results for Phase 1 should be reported on the {domain}_devtest_dials.json, released earlier.
Hi, we're curious about the test-std file format. Will there be retrieval_candidates files and files containing API information for test-std? Since the team model entry deadline is getting closer, without the detailed format of the test-std files we are afraid that our code may not run directly on them (it does run directly on the devtest set).
Are we allowed to modify the preprocessing code after Sep. 28 to run test-std? (Only minor changes to make it run successfully, without changing the models.)
Hello all, sorry for not including this information before.
split, we will also release the corresponding API calls and retrieval candidates (public versions, excludes the last round on which evaluation is done) in the format of the corresponding existing files.
In order to check for compatibility, we will now release devtest API calls and retrieval candidates in this format.
Hello, there are still 2 questions about the submission. First, for Challenge Phase 1, we should submit our devtest prediction files. The README of the simmc repo tells us to follow the submission instructions, but there is no instruction there about submitting devtest results. Should we email you the results on devtest? If so, to which email address, and in what submission format?
Secondly, in a previous comment, @satwikkottur said:
For the test-std split, we will also release the corresponding API calls and retrieval candidates (public versions, excludes the last round on which evaluation is done) in the format of corresponding existing files.
What does "excludes the last round on which evaluation is done" mean?
I thought it meant excluding the last round's API call in each dialogue in the API calls file (excluding the last round of retrieval candidates would be odd, since we need those candidates to predict retrieval scores).
But when I check fashion_devtest_dials_api_calls_teststd_format_public.json and furniture_devtest_dials_api_calls_teststd_format_public.json, both files do include the last-round API information corresponding to fashion_devtest_dials_teststd_format_public.json and furniture_devtest_dials_teststd_format_public.json.
Hello @billkunghappy,
Thanks for raising this concern.
Regarding the comment, your understanding is correct.
(a) In the API calls file, the last round on which evaluation is to be performed will be excluded.
(b) For retrieval candidates, the last round will contain the retrieval candidates but will not contain the gt_index field that gives the index of the ground truth response.
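A quick sanity check of those two points against a released retrieval-candidates file might look like the sketch below. The exact schema is assumed here (a list of per-dialogue round lists, each round carrying `candidates` and, except in the last round, `gt_index`); adjust the key names to the actual released format:

```python
def check_public_candidates(candidates_per_dialogue):
    """Verify the stated property of the *public* retrieval file:
    every round carries retrieval candidates, but the last round of each
    dialogue omits the ground-truth index. The "candidates"/"gt_index"
    keys are assumptions based on the thread, not the confirmed schema.
    """
    for rounds in candidates_per_dialogue:
        for i, rnd in enumerate(rounds):
            assert "candidates" in rnd, "every round should list candidates"
            if i == len(rounds) - 1:
                assert "gt_index" not in rnd, "last round must not reveal gt"
            else:
                assert "gt_index" in rnd, "earlier rounds should keep gt_index"
    return True

# Synthetic example shaped like the description above:
public = [[
    {"candidates": [0, 1, 2], "gt_index": 1},
    {"candidates": [3, 4, 5]},          # last round: no gt_index
]]
print(check_public_candidates(public))
```

Running this against the real public file would flag any dialogue whose evaluation round accidentally leaks the ground-truth index.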
With respect to the files, I just realized that the suffixes public and private got switched for the API calls. While I will fix these later, please take a look at the wrongly named furniture_devtest_dials_api_calls_teststd_format_private.json (which should be public) and the corresponding file for fashion.
Hope this helps!
Hi, I have 2 questions raised by the test-std dataset release regarding the response_generation task.
Since the action and attributes annotations are not available for the current turn (differently from what was defined in TASK_INPUTS.md), are we able to slightly modify the code (not the model) to avoid using this information?
Whenever we encounter a potential SearchMemory or SearchDatabase in the k-th turn (the one on which the generation and the action prediction are evaluated), we do not have the annotation about the new focus item (during the first phase of the challenge it was included in the dials_api JSON file). Since the response of the wizard is conditioned on the item he/she is looking at, how can we generate such a response if we do not have information about the item?
An example is given below (dialogue 1902):
{
"domain": "fashion",
"visual_objects": {},
"system_transcript": "",
"turn_idx": 2,
"belief_state": {},
"transcript": "Show me another coat, but one that 212 Localts more."
}
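Turns like this one can be found programmatically. Here is a minimal sketch that flags every turn with no annotated visual objects, i.e. turns where the wizard's focus item is unknown; only the turn structure from the excerpts above is assumed:

```python
def turns_missing_focus(turns):
    """Return turn indices whose "visual_objects" dict is empty or absent,
    i.e. turns where the wizard's focus item is unknown (as in the
    dialogue 1902 example above)."""
    return [t["turn_idx"] for t in turns if not t.get("visual_objects")]

example = [
    {"turn_idx": 1, "visual_objects": {"OBJECT_2": {"pos": "focus"}}},
    {"turn_idx": 2, "visual_objects": {}},
]
print(turns_missing_focus(example))  # [2]
```

Scanning the released file this way gives a quick count of how many evaluation turns are affected by the missing focus annotation.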
Hello @seo-95 ,
Thanks for raising these two important concerns. We've updated the API calls file to include these two images for the last turn on which evaluation is performed. Of course, you're not allowed to use the ground truth API calls for subtask-1 but can use these for subtask-2 (as per the table).
Apologies for the confusion earlier, hope this addresses your concerns.
Hi, I am curious about the test-std files that will be released on Sept 28. Will you provide retrieval-candidate files and API call files, or only a single JSON file ({domain}_devtest_dials_teststd_format_public.json)?
We want to use the action information, which is allowed at inference time per TASK_INPUTS.md. However, there is no such annotation in {furniture/fashion}_devtest_dials_teststd_format_public.json. How can we get the ground truth action?
We are using the preprocessing script in mm_action_prediction, but it is not able to process {furniture/fashion}_devtest_dials_teststd_format_public.json. Can you provide full test files before Sept. 28 that we can run directly with the baseline model?