athaddius / STIRMetrics

Metric Evaluation for Models on STIR
MIT License

Questions about the competition metrics #5

Closed: SzuPc closed this issue 2 months ago

SzuPc commented 2 months ago

When I tried the TAP-Vid evaluation method proposed for the competition, I found a problem. When STIRMetrics computes the EPE, it uses a KDTree to pair points. But when STIRLoad is used to obtain the data labels, the number of start points and the number of end points usually do not match: many labels have small pixel-noise points around them, and no per-point label cleanup is performed, so the TAP-Vid code fails when it tries to compute the distances. For clips under 10 s, we found that roughly half of the 400 image pairs have mismatched start/end point counts. How does the official competition handle this situation? When will the competition's TAP-Vid calculation method be published? And what is the final deadline for the competition?
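For reference, here is a minimal toy sketch (my own illustration, not the STIRMetrics code; the point arrays are made up) of the KDTree pairing step and why unequal counts are a problem:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical label centroids: three points in the start frame, but only
# two in the end frame (one was lost to noise/blur in the segmentation).
start_pts = np.array([[10.0, 10.0], [50.0, 50.0], [90.0, 20.0]])
end_pts = np.array([[12.0, 11.0], [51.0, 49.0]])

# Pair each start point with its nearest end point.
tree = cKDTree(end_pts)
dists, idx = tree.query(start_pts)

# With unequal counts there is no bijection: two start points can map to
# the same end point, so a strict one-to-one EPE is ill-defined here.
print(dists, idx)
```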

athaddius commented 2 months ago

Correct, pixel noise from the segmentation thresholding can create noise around labels. This noise should be approximately normally distributed for each label. As a result, a single central label point in the starting image can, due to noise, become multiple label points within the boundary of the true center point in the ending image.

Validation set: for the validation set, on some start/end image pairs (about half of the 400, per your check) the point counts do not match exactly, due to sensor noise, specularity, blur, etc. Because of this, the threshold metric (TAP) provides a lower bound on the error: the error computed against noisy points will always be lower than or equal to the true error.
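As a toy illustration of that lower bound (the values below are made up, not from the dataset): adding a spurious noise point near the prediction can only shrink the nearest-neighbour distance, since the true end point is still in the candidate set.

```python
import numpy as np
from scipy.spatial import cKDTree

pred = np.array([[100.0, 100.0]])              # hypothetical predicted position
true_end = np.array([[108.0, 100.0]])          # true label: 8 px of real error
noise_pt = true_end + np.array([[-6.0, 1.0]])  # spurious thresholding artifact

true_epe = cKDTree(true_end).query(pred)[0][0]                          # 8.0 px
noisy_epe = cKDTree(np.vstack([true_end, noise_pt])).query(pred)[0][0]  # ~2.24 px

# The nearest-neighbour error can never exceed the true EPE, because the
# true end point is still among the candidates.
assert noisy_epe <= true_epe
print(true_epe, noisy_epe)
```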

Test set: for the test set, we have gone through the labels by hand and filtered the sequences to only those where the start and end point counts match.

The deadline should be September 9th, but we will start welcoming submissions before then, particularly to ensure code can run on our verification and evaluation pipeline. Please reach out again if any of this is unclear :)