Closed mazatov closed 2 years ago
Hi Mazatov, thanks for your interest and your question. I uploaded a new version of the README with further information about the dataset structure; hopefully you'll find an answer to your question there.

As you mentioned, player identities are only valid within an action, and a player might be assigned multiple identities if he has been spotted in multiple actions. This is not an issue during testing, because query-to-gallery matching is only performed for samples from the same action. However, it might be an issue at training time depending on your training procedure and losses; it is up to the participant to find a solution for that.

In the original Torchreid code, a 'camid' field is used at the testing stage for a similar purpose, i.e., to filter out gallery samples w.r.t. the camid field of the corresponding query sample. For the SoccerNet challenge, we use the 'camid' field to carry the 'action_idx' information, and use it in the 'rank.py -> eval_soccernetv3()' function to filter out gallery samples whose 'action_idx' differs from that of the current query.

As you mentioned, all query-to-gallery distances are computed in one big distance matrix 'distmat' in 'engine.py'. Many of those distances are not useful, however, because they are not taken into account in the final performance evaluation. This part should be optimised in future versions; for now we just kept the original Torchreid code for computing the full distance matrix.
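The per-action filtering described above can be sketched roughly as follows. This is not the actual `rank.py` code; the function name and the mask logic are illustrative, assuming only that each query and gallery sample carries an `action_idx` (transported via `camid`) and that ranking is done on a query-by-gallery distance matrix:

```python
import numpy as np

def filter_gallery_by_action(distmat, q_action_idx, g_action_idx):
    """Mask out gallery samples from a different action than each query.

    distmat:      (num_query, num_gallery) distance matrix
    q_action_idx: (num_query,) action index of each query sample
    g_action_idx: (num_gallery,) action index of each gallery sample
    """
    distmat = distmat.copy()
    # Broadcast comparison: True where query and gallery come from different actions
    mismatch = q_action_idx[:, None] != g_action_idx[None, :]
    # Push mismatched gallery samples to the end of every ranking
    distmat[mismatch] = np.inf
    return distmat

# Toy example: 2 queries, 3 gallery samples
dist = np.array([[0.1, 0.2, 0.3],
                 [0.4, 0.5, 0.6]])
q_act = np.array([0, 1])
g_act = np.array([0, 0, 1])
filtered = filter_gallery_by_action(dist, q_act, g_act)
```

With this mask in place, a gallery sample from another action can never outrank a same-action sample, which is why duplicate identities across actions do not affect the final metric.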
Hello, I have a related question about the actions.
Thanks
Do we have a guarantee that the player only exists in one `action_idx`? Going through the datasets, I see that there are lots of actions per game, so my instinct is that there will be lots of repeats between actions in one game, as there are more `person_uid` per game than players on the pitch. Are we supposed to ignore the fact that we might have repeats between actions?

For example, in the training dataset, the first game is `2015-02-21 - 18-00 Chelsea 1 - 1 Burnley`. Looking through all 23 actions of that game, we have 808 images for 429 players! Given that we have a maximum of 26 players per game, a lot of those 808 images correspond to the same player but have a different `player_id`. That seems like a really bad assumption to make. I also see evidence of that during the evaluation of results.

At the same time, during the testing phase we seem to calculate features across the entire dataset and match persons between the entire query and gallery sets, without taking `action_idx` into account. At least it looks that way in the code. Could you clarify how `action_idx` is defined and how it's taken into account in all parts of the pipeline (train, valid, test, challenge)?
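The identity bookkeeping in question (many `person_uid` per game versus at most ~26 physical players) can be audited with a short sketch. The sample list here is hypothetical; in practice the `(action_idx, person_uid)` pairs would come from the dataset annotations:

```python
from collections import defaultdict

# Hypothetical image-crop annotations: (action_idx, person_uid) pairs
samples = [
    (0, "p1"), (0, "p2"), (0, "p1"),
    (1, "p3"), (1, "p4"),   # same physical players may get fresh uids here
    (2, "p5"),
]

# Unique identities per action, and across the whole game
per_action = defaultdict(set)
for action_idx, person_uid in samples:
    per_action[action_idx].add(person_uid)

uids_per_game = len({uid for _, uid in samples})
uids_per_action = {a: len(s) for a, s in per_action.items()}
```

If `uids_per_game` greatly exceeds the number of players on the pitch while each action's count stays small, identities are indeed only consistent within a single action.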