Closed: keighrim closed this issue 9 months ago
Given the new output format that's being discussed in https://github.com/clamsproject/app-swt-detection/issues/41, the evaluation plan is as follows (a rough code sketch is included below):
1. From the output, collect all `TimePoint`s and `TimeFrame`s.
2. For each `TimeFrame`, go through its `targets` list and compare the `frameType` value of the `TimeFrame` against the `label` value of each target `TimePoint`; collect pairs that are different.
3. Using the `timePoint` value of the `TimePoint` annotations in the collected "disagreeing" pairs, look up the gold label, judge which one is correct, and count scores (1 for a correct one).

Done pretty much as described above, with one difference: trying to mimic the sample rate was impossible, since the app at the moment only accepts milliseconds and the rate used for the annotation was based on some number of frames, I think. So I just used a frame from the annotation that was within at most 32ms.
Because
We want to see whether the stitcher/smoothing added via https://github.com/clamsproject/app-swt-detection/issues/33 is doing well, independently of the accuracy of the image-level classification model.
Done when
A controlled evaluation is done to measure the effectiveness of the stitcher. At a high level, the evaluation should measure the performance difference between the raw image-classification results and the image-classification results re-constructed from `TimeFrame` annotations.
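As a rough illustration of that high-level measurement, assuming the per-frame label sequences have already been aligned against gold (e.g. with the 32ms tolerance above); the names here are made up for the sketch:

```python
def accuracy(predicted, gold):
    """Fraction of frames whose predicted label matches the gold label."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def stitcher_gain(raw_labels, stitched_labels, gold_labels):
    """Performance difference: re-constructed (stitched) labels vs raw classifier labels."""
    return accuracy(stitched_labels, gold_labels) - accuracy(raw_labels, gold_labels)
```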
Additional context
The original idea of having this evaluated was proposed by @owencking in his email on 12/15/2023. Here's an excerpt from it.