jqu-striveworks closed this issue 1 month ago.
Hey Justin - thanks for posting this issue!
If I have one labeled datum and one unlabeled datum, Valor does not assume that the unlabeled datum is missing a label key and does not count it as a true negative.
In this first example, the issue lies with the idea of an "unlabeled datum": since the second GroundTruth doesn't have any labels, it's actually not considered a GroundTruth at all and never makes it into groundtruth_df (this also means that uid1 is never considered to be a separate datum in this evaluation). This is expected behavior at the moment, but it might be wise for us to throw an error if the user tries to pass a GroundTruth without any labels. I'll leave this issue open to discuss this change with the rest of the team in the future.
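One way the proposed error could look is sketched below. Valor's actual GroundTruth class and ingestion path are not shown in this thread, so the Label and GroundTruth dataclasses and the validate_groundtruth helper are hypothetical stand-ins. The point is only to fail loudly instead of silently dropping the datum.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    # Hypothetical stand-in for a label: a key (e.g. "dataset1") and a value (a class).
    key: str
    value: str


@dataclass
class GroundTruth:
    # Hypothetical stand-in for a ground truth: a datum uid plus its labels.
    datum_uid: str
    labels: list[Label] = field(default_factory=list)


def validate_groundtruth(gt: GroundTruth) -> GroundTruth:
    """Reject a ground truth with no labels instead of silently excluding it."""
    if not gt.labels:
        raise ValueError(
            f"GroundTruth for datum '{gt.datum_uid}' has no labels; "
            "it would be excluded from the evaluation."
        )
    return gt
```

With this check, validate_groundtruth(GroundTruth("uid1")) raises a ValueError that names uid1, rather than letting uid1 vanish from the evaluation.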
If it really were a true negative, then it should be provided as an example (but it's not). I asked for 1 example in my detailed PR curve, and it did not return one despite counting a true negative.
Good call-out. This was a bug where we didn't include true negative examples of this kind in the DetailedPRCurve output. This will be fixed in #744.
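The idea behind the fix can be sketched as follows: every confusion outcome, including true negatives, gets its own example bucket, and that bucket is filled by the same loop that increments the counts. This is not Valor's DetailedPRCurve implementation. The confusion_with_examples function, its input shapes, and the single-label classification semantics are assumptions made for illustration.

```python
from collections import defaultdict


def confusion_with_examples(
    groundtruths: dict[str, set[str]],
    predictions: dict[str, dict[str, float]],
    label: str,
    threshold: float,
    n_examples: int = 1,
) -> tuple[dict[str, int], dict[str, list[str]]]:
    """Count TP/FP/FN/TN for one label at one score threshold and keep up to
    n_examples datum uids per outcome, true negatives included (hypothetical)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    examples: dict[str, list[str]] = defaultdict(list)
    for uid, gt_labels in groundtruths.items():
        # A missing prediction is treated as a score of 0.0 (assumption).
        score = predictions.get(uid, {}).get(label, 0.0)
        is_gt = label in gt_labels
        is_pred = score >= threshold
        if is_gt and is_pred:
            outcome = "tp"
        elif is_gt:
            outcome = "fn"
        elif is_pred:
            outcome = "fp"
        else:
            outcome = "tn"
        counts[outcome] += 1
        # Examples are collected in the same branch that counts, so a counted
        # true negative can never be missing from the examples.
        if len(examples[outcome]) < n_examples:
            examples[outcome].append(uid)
    return counts, dict(examples)
```

Because examples are recorded wherever a count is incremented, asking for 1 example can no longer return nothing for an outcome whose count is nonzero.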
Closing this out, as we've decided to remove label keys, which should fix the confusion surrounding this issue.
valor version checks
Reproducible Example
Issue Description
I have two datasets, each with one image. The two datasets use different label keys and different classes. Evaluated separately, each produces its own metrics. Because the datasets are completely disjoint, these metrics should be identical to the ones I get when I evaluate both datasets and their predictions together.
This behavior is wildly inconsistent. If I have one labeled datum and one unlabeled datum, Valor does not assume that the unlabeled datum is missing a label key and does not count it as a true negative.
If I have one labeled datum, one unlabeled datum, and a second prediction for a second label key, Valor does not assume that either datum is missing the second label key and produces no metrics for the second label key.
Yet once I label the unlabeled datum with the second label key, Valor now assumes that datum 0 has a dataset2 label key and datum 1 has a dataset1 label key. If it really were a true negative, then it should be provided as an example (but it's not). I asked for 1 example in my detailed PR curve, and it did not return one despite counting a true negative.
Expected Behavior
Be Consistent. Be Correct.
The TP/FP/TN/FN counts for evaluation on label key dataset 1 should be the same regardless of whether dataset 2 is present or not.
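The expected behavior is an invariant: per-key counts should not change when a disjoint dataset is added. It can be written as a check. The sketch below assumes one particular semantics, in which a datum contributes to a label key's counts only if it has a ground truth under that key. That is a hypothesis chosen for illustration, not necessarily how Valor evaluates. The counts_for_key function and the data shapes are hypothetical.

```python
def counts_for_key(
    groundtruths: dict[str, dict[str, str]],
    predictions: dict[str, dict[str, dict[str, float]]],
    key: str,
    threshold: float = 0.5,
) -> dict[str, int]:
    """Count TP/FP/FN/TN for one label key (hypothetical semantics).

    groundtruths: datum uid -> {label key: label value}
    predictions:  datum uid -> {label key: {label value: score}}
    Assumption: a datum participates only if it has a ground truth under `key`.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for uid, gt in groundtruths.items():
        if key not in gt:
            continue  # no ground truth for this key: skip it, don't count a TN
        scores = predictions.get(uid, {}).get(key, {})
        # Every label value seen in the ground truth or the predictions for this key.
        for value in set(scores) | {gt[key]}:
            is_gt = value == gt[key]
            is_pred = scores.get(value, 0.0) >= threshold
            if is_gt and is_pred:
                counts["tp"] += 1
            elif is_gt:
                counts["fn"] += 1
            elif is_pred:
                counts["fp"] += 1
            else:
                counts["tn"] += 1
    return counts


# Two disjoint single-datum datasets with different label keys (illustrative data).
gts1 = {"uid0": {"dataset1": "cat"}}
preds1 = {"uid0": {"dataset1": {"cat": 0.9, "dog": 0.2}}}
gts2 = {"uid1": {"dataset2": "car"}}
preds2 = {"uid1": {"dataset2": {"car": 0.7, "truck": 0.6}}}

alone = counts_for_key(gts1, preds1, "dataset1")
together = counts_for_key({**gts1, **gts2}, {**preds1, **preds2}, "dataset1")
assert alone == together  # the disjoint dataset leaves "dataset1" counts unchanged
```

Under this semantics a datum that was never labeled with a key is neither a true negative nor a missing label for that key. That is exactly what makes the counts independent of unrelated datasets.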