GigaVision / PANDA-Toolkit

PANDA dataset toolkit for data visualization, splitting, merging, and result evaluation.

Did you use this toolkit for the Panda Paper? #3

Closed MickaelCormier closed 3 years ago

MickaelCormier commented 3 years ago

Hi there, thanks for publishing the PANDA dataset and this toolkit! It's very much appreciated!

I have a few questions regarding this toolkit and your paper.

  1. Did you use the toolkit to calculate the results in the paper? More precisely, regarding Table 3, for the results on PANDA-Image: do you use an IoU of 0.5 or [0.5:0.95] for AP? I ask because in the paper you write IoU = 0.5, but the eval script doesn't have that as its default.

  2. In the paper you write:

    "Sub means subset of different target sizes, whereSmall, Middle, and Large indicate object size being<32×32,32×32−96×96, and>96×96."

    These are the original COCO ranges. However, in the DetEval script your defaults are: parser.add_argument('--areaRng', type=list, help='[...] A = 4 object area ranges for evaluation', default=[0, 200, 400, 1e5])

  3. Did you use NMS for merging the results?

    • In the merge script, it is activated.
    • In the demo script, it isn't.
    • There is no mention of it in the paper.

DarkstartsUp commented 3 years ago

Thank you for your interest in our dataset!

  1. In the paper, we used AP with an IoU of 0.5 for Table 3.

  2. About the area ranges, we are sorry that we didn't make this clear in the paper. For our results in Table 3, for all scenes, we first reshape all images to 8000×4800, then we run detectors on the frames and calculate a COCO-like metric using area thresholds of 0, 32², 96², and 1e5² (a sketch of such an evaluation follows this list). (The results in Table 3 of the paper have nothing to do with the default thresholds in the PANDA toolkit.)

  3. Yes, for our results in Table 3, we used NMS to merge the results.
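For reference, here is a minimal sketch of such a COCO-like evaluation at IoU = 0.5 with those area thresholds, using pycocotools. The file names gt.json and dets.json are placeholders, not files shipped with this toolkit:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('gt.json')               # COCO-format ground truth (placeholder path)
coco_dt = coco_gt.loadRes('dets.json')  # COCO-format detections (placeholder path)

ev = COCOeval(coco_gt, coco_dt, iouType='bbox')

# Area thresholds as described above: 0, 32^2, 96^2, 1e5^2 (squared pixel areas).
# These happen to match the COCO defaults.
ev.params.areaRng = [[0 ** 2, 1e5 ** 2],   # all
                     [0 ** 2, 32 ** 2],    # small
                     [32 ** 2, 96 ** 2],   # medium
                     [96 ** 2, 1e5 ** 2]]  # large
ev.params.areaRngLbl = ['all', 'small', 'medium', 'large']
# Note: params.maxDets may need raising for crowded PANDA scenes.

ev.evaluate()
ev.accumulate()
ev.summarize()
print('AP@IoU=0.5:', ev.stats[1])  # stats[1] is AP at IoU = 0.5
```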

MickaelCormier commented 3 years ago

Hi, thanks for your answers! By "reshape to 8000x4800" do you mean crop or resize?

DarkstartsUp commented 3 years ago

Resize. Thanks!
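(For concreteness, the difference in OpenCV terms would be roughly the following; the file path is a placeholder.)

```python
import cv2

frame = cv2.imread('frame.jpg')            # placeholder path
resized = cv2.resize(frame, (8000, 4800))  # resize: whole scene rescaled to (W, H)
# cropped = frame[:4800, :8000]            # a crop would discard content instead
```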

MickaelCormier commented 3 years ago

I'm sorry, I'm not sure I get it and would like to be certain, so that I can compare our results correctly. I feel I'm missing something.

In your paper, Section 4.1, you write:

"Similarly, for evaluation, we resize the original image into multiple scales and use the sliding window approach to generate proper size blocks for the detector."

So you resize to 8000×4800 and then extract crops/blocks. What size are those blocks for the detector's input? Or do you run inference on the full 8000×4800 image? If so, I'm confused: which GPU type do you use to infer on such a size? And from which resolution would you extract these large blocks?

Also, do you plan to publish the annotations for the image test set or to offer an evaluation server? Otherwise, it will be difficult to benchmark approaches on your dataset.

DarkstartsUp commented 3 years ago

@MickaelCormier Hello, here are some details of the detection baseline experiments:

  1. For all video scenes, we first resized the source frames to (8000, 4800).
  2. To handle the scale difference between the front and the back of a scene, we zoomed the bottom half of the image again to (5332, 1600). (This is the multi-scale processing we used.)
  3. We then ran a sliding window of size (1333, 800) with 10% overlap over the image to generate the detector inputs, and afterwards used NMS to merge the results (a rough sketch of this step follows this list).
  4. Finally, we projected the results on the bottom half of the image back to (8000, 4800) coordinates and calculated a COCO-like metric using area thresholds of 0, 32², 96², and 1e5².
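For other readers, here is a rough sketch of the sliding-window step (step 3). The detector callable and the torchvision-based NMS are assumptions for illustration, not the actual code of this toolkit:

```python
import torch
from torchvision.ops import nms

def _starts(total, win, stride):
    # Window start offsets along one axis; the last window is clamped to
    # the image edge so the whole frame is covered.
    s = list(range(0, max(total - win, 1), stride))
    s.append(max(total - win, 0))
    return sorted(set(s))

def sliding_window_detect(image, detector, win=(1333, 800), overlap=0.1, iou_thr=0.5):
    """Run `detector` on overlapping crops and merge the results with NMS.

    `image` is an (H, W, C) array; `detector(crop)` is assumed to return an
    iterable of (x1, y1, x2, y2, score) tuples in crop-local coordinates.
    """
    win_w, win_h = win
    H, W = image.shape[:2]
    stride_x = max(int(win_w * (1 - overlap)), 1)
    stride_y = max(int(win_h * (1 - overlap)), 1)
    boxes, scores = [], []
    for y0 in _starts(H, win_h, stride_y):
        for x0 in _starts(W, win_w, stride_x):
            crop = image[y0:y0 + win_h, x0:x0 + win_w]
            for x1, y1, x2, y2, s in detector(crop):
                # Shift crop-local boxes back into full-image coordinates.
                boxes.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0])
                scores.append(s)
    if not boxes:
        return torch.empty((0, 4)), torch.empty((0,))
    boxes = torch.tensor(boxes, dtype=torch.float32)
    scores = torch.tensor(scores, dtype=torch.float32)
    keep = nms(boxes, scores, iou_threshold=iou_thr)
    return boxes[keep], scores[keep]
```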