Dear Author,
In the paper, Tables 1, 2, and 6 present the model’s test results on the VisDrone dataset. Regarding these results, I have two questions:
1. The paper mentions, “Following the existing works, we used the validation set for evaluating the performance.” Does this imply that the results listed above are all based on the validation set, which consists of 548 pictures?
2. It is noted in the paper that a resolution of 1.5K pixels was adopted. Could you please clarify what specific resolution this refers to? Additionally, could you specify the input resolutions used for the results in Tables 1, 2, and 6?
I look forward to your response and appreciate your assistance. Best regards!
Hi, thanks for your interest in our work! To answer your questions:
1. Yes, the results are reported on the validation set, which consists of 548 images.
2. 1.5K pixels refers to the average image width in the VisDrone dataset. The config file for each setting contains the input resolution used for training and inference; see, for example, the base Faster R-CNN config at inference. The images are resized to respect the given MIN_SIZE and MAX_SIZE constraints, as sketched below. Please check the Detectron2 code for further details.
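For context, here is a minimal Python sketch of the shortest-edge resizing rule that such MIN_SIZE / MAX_SIZE constraints describe. The function name `resized_shape` and the 800 / 1333 defaults are illustrative assumptions, not the values used for Tables 1, 2, and 6; the actual numbers come from the per-setting config files.

```python
# Illustrative sketch only: the function name and the 800/1333 defaults are
# assumptions, not the paper's actual config values (those are given in the
# per-setting config files mentioned above).

def resized_shape(height, width, min_size=800, max_size=1333):
    """Scale the shorter edge to `min_size`, then cap the longer edge at `max_size`."""
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)

# Example: a 1080 x 1500 image (roughly the ~1.5K-pixel average width of VisDrone)
print(resized_shape(1080, 1500))  # -> (800, 1111) with the defaults above
```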