Closed: Jay-IPL closed this issue 1 year ago
As specified in our paper, these dense interpolations are not used in training or evaluating the VOS model. They are filtered results of the VOS model, whose weights we also release.
For the VOS task, we only use the manual sparse labels for training. Note that our training code is already public, so you can replicate the results for that baseline: https://github.com/epic-kitchens/VISOR-VOS Please check that repo for all details; this repo focuses on frame extraction rather than the VOS benchmark.
Naturally, we only use manually labelled data for evaluation. Note that the test set contains dense manual masks; these are not released, but they are used to evaluate models on the test set.
thanks for the clarification!
I went through that repo. It seems the model is evaluated only on the VISOR sparse-annotated val data rather than the test data, right? What do you mean by 'Note that in the unreleased test set we have dense manual masks which are not released but used in evaluating the model for the test set.'?
We provide the code to train on "train" and evaluate on "val". This allows you to replicate our val results.
The same code can be used to train on "train+val" and evaluate on "test", but the test set is not released (a leaderboard will be opened, but it has not been yet). The same code is used in either case.
Once again, please raise your questions in the right repo so we can answer you correctly; these questions are not related to this repo.
Also, please read our paper more carefully: the answers to your questions are in supplemental Section H.3.
Hi, I visualized the quality of the interpolated masks. Many masks are missing or inaccurate (roughly 50% of the interpolated masks).
Question:
thanks!
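For anyone else wanting to run the same kind of spot check on the interpolated masks, here is a minimal sketch of one way to do it: overlay each mask on its frame and flag near-empty masks as likely missing interpolations. This is not code from the VISOR release; the helper names, pixel threshold, and the synthetic frame/mask pair are all hypothetical, standing in for real data loaded from disk.

```python
# Hypothetical quality-check sketch, not part of the VISOR codebase.
import numpy as np

def overlay_mask(frame, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a boolean mask onto an RGB frame for visual inspection."""
    out = frame.astype(np.float32).copy()
    for c in range(3):
        out[..., c] = np.where(
            mask, (1 - alpha) * out[..., c] + alpha * color[c], out[..., c]
        )
    return out.astype(np.uint8)

def looks_missing(mask, min_pixels=50):
    """Heuristic: treat near-empty masks as missing interpolations.
    The 50-pixel threshold is an arbitrary illustrative choice."""
    return int(mask.sum()) < min_pixels

# Synthetic stand-ins for a real frame/mask pair loaded from the dataset.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
good_mask = np.zeros((64, 64), dtype=bool)
good_mask[10:30, 10:30] = True               # 400 pixels: plausible object
empty_mask = np.zeros((64, 64), dtype=bool)  # 0 pixels: flagged as missing

vis = overlay_mask(frame, good_mask)
print(looks_missing(good_mask), looks_missing(empty_mask), vis.shape)
```

Counting how many masks `looks_missing` flags across a sequence gives a rough missing-mask rate to compare against the ~50% figure mentioned above.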