masashi-hatano opened this issue 2 years ago
hi @masashi-hatano, happy to have you as our participant! Things run as expected on my side; for your information, our baseline also gives 1*20 non-zero prediction results for sample "21523837", which means it is fine to have non-zero predictions on frames without hands. As we mentioned on the challenge page, **"Our evaluation script won't penalize your algorithm if it gives predictions on frames without hands."** I suggest you revisit our sample evaluation code to understand how our metrics work. Specifically, you'll see how we filter out the out-of-frame hand situation in L80 to make sure it won't influence the submission. Please also make sure to use our script `generate_submission.py` to generate the submission file, and don't forget to take care of the `num_clips=30` argument. Feel free to ask if you're still blocked! Happy to help :)
@VJWQ Thanks for your reply! I solved this problem by using `num_clips=30`, and the evaluation was done correctly.
But I don't really understand why `num_clips` is needed. According to the sample evaluation code, `num_clips` is used just for dividing the predicted values. I would appreciate it if you could give me some explanation about it.
Sure, here is the explanation: the number 30 comes from `cfg.TEST.NUM_ENSEMBLE_VIEWS * cfg.TEST.NUM_SPATIAL_CROPS`, an ensembling step used to better test the robustness of the model. In short, we need to divide by 30 when generating the submission file to obtain the average performance on each test clip.
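For illustration, here is a minimal sketch of that multi-view averaging. The variable names, the 20-dimensional prediction, and the specific view/crop counts are assumptions for the example; only their product of 30 is taken from the comment above.

```python
import numpy as np

# Assumed values whose product matches the 30 mentioned above
# (cfg.TEST.NUM_ENSEMBLE_VIEWS * cfg.TEST.NUM_SPATIAL_CROPS).
num_ensemble_views = 10
num_spatial_crops = 3
num_clips = num_ensemble_views * num_spatial_crops  # = 30

# Accumulate (sum) the predictions of all 30 views for one clip uid.
pred_dict = {"21523837": np.zeros(20)}  # 20-dim output, as in the 1*20 result above
for _ in range(num_clips):
    view_pred = np.random.rand(20)  # stand-in for one view's model output
    pred_dict["21523837"] += view_pred

# Dividing by num_clips turns the accumulated sum into the per-clip average.
for key in pred_dict:
    pred_dict[key] = pred_dict[key] / num_clips
```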
@VJWQ In `generate_submission.py`, `num_clips` seems not to be used.
@masashi-hatano @VJWQ Hello, I also have some questions. I would like to know what the validation loss is when you train the baseline code. I appended the following code after the multi-view accumulation:
```python
for key in pred_dict:
    pred_dict[key] = pred_dict[key] / num_clips
```
But I also get evaluation results similar to the ones in your first comment. I wonder if there is a problem with my data sampling.
@takfate Hello, this might help: if you evaluate your model using `generate_submission.py` and `eval.py`, the predicted values will be divided by `num_clips` twice, so removing the division from either of them may solve your problem.
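To make the pitfall concrete, here is a tiny illustration with made-up numbers: if both scripts divide by `num_clips`, every value ends up scaled by 1/900 instead of 1/30.

```python
num_clips = 30
summed_pred = 90.0  # made-up sum of one clip's 30 per-view predictions

averaged_once = summed_pred / num_clips    # 3.0 -> the intended per-clip average
averaged_twice = averaged_once / num_clips  # 0.1 -> what you get if both scripts divide
```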
@masashi-hatano I use `generate_submission.py` to generate a submission file for the test set and submit it to the EvalAI evaluation system. Will the EvalAI evaluation system do another division by 30?
hi @takfate, your results will not be divided twice. In `generate_submission.py`, `num_clips` is just a placeholder and does not actually divide your results. This file serves to sum all 30 prediction results for one clip; the actual division happens in our evaluation script after you submit your results JSON file, where the /30 gives the average results for each clip. However, you still need to run `python tools/generate_submission.py /path/to/output.pkl 30` to generate the submission file correctly.
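As a rough sketch of the flow described above (not the actual `generate_submission.py` code; the pickle layout and file names are assumptions), the submission stores the per-clip sums and leaves the /30 to the evaluation server:

```python
import json
import pickle

# Assumed layout of the output pickle: {clip_uid: [view_1_pred, ..., view_30_pred]},
# where each view prediction is a list of floats.
with open("output.pkl", "rb") as f:  # placeholder path
    per_view_preds = pickle.load(f)

# Sum the 30 per-view predictions for each clip; no division happens here.
submission = {
    uid: [sum(values) for values in zip(*views)]
    for uid, views in per_view_preds.items()
}

# The evaluation server divides these sums by 30 to recover per-clip averages.
with open("submission.json", "w") as f:
    json.dump(submission, f)
```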
@masashi-hatano @takfate
Do you mind posting the commands you use to generate the submission file? I can have a look at them to see why you are receiving similar results and adjust our guidance accordingly.
It's fine on my side now, thanks though.
@VJWQ @masashi-hatano Our eval results are already normal. Thank you for your help.
I tried submitting a JSON file that follows the specified format, and I obtained the quantitative results below.
However, even though the results we obtained on the validation dataset were better than the baseline, the results from the actual submission show huge errors. This is probably because the predictions we submit are not multiplied by the mask. The mask is meant to zero out the error on frames in which no hand is visible.
To demonstrate that the quantitative results presented above are anomalous, here is a prediction list taken from my submission.json file, together with its visualization.
As you can see from these figures, the quantitative results obtained from the actual submission appear to be incorrect, presumably because the loss is calculated without multiplying the predictions by the masks.
@VJWQ Could you please confirm that the loss calculation is done correctly? In particular, I would appreciate it if you could check whether the error is set to zero when the hands are not in the frame.
[Figures 1–5: visualized predictions for the pre_45, pre_30, pre_15, pre_frame, and contact_frame key frames]
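To make the masking question above concrete, here is a minimal sketch of the zero-error-on-invisible-hands idea. The array values and the absolute-displacement error are illustrative assumptions, not the actual evaluation script.

```python
import numpy as np

# Hypothetical per-frame predictions and ground truth for the five key frames,
# e.g. predicted vs. annotated hand x-coordinates.
pred = np.array([12.0, 34.0, 56.0, 78.0, 90.0])
gt = np.array([10.0, 30.0, 0.0, 75.0, 0.0])

# Visibility mask: 1 where a hand is annotated in the frame, 0 where it is not.
mask = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

# Multiplying the per-frame error by the mask zeroes out frames without visible
# hands, so non-zero predictions on those frames do not add to the error.
masked_error = mask * np.abs(pred - gt)
mean_error = masked_error.sum() / mask.sum()
```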