Striveworks / valor

Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
https://striveworks.github.io/valor/

BUG: `lite` Counts are incorrect for a core test case #757

Closed: ntlind closed this issue 2 months ago

ntlind commented 2 months ago


Reproducible Example

The test below was taken from `test_evaluate_detection_functional_test_with_rasters` (the same test as in #754) and converted to use bounding boxes instead of rasters. It fails on the first assertion.


from valor_lite.detection import DataLoader, Detection, MetricType


def test_counts_ranked_pair_ordering(
    detection_ranked_pair_ordering: Detection,
):

    loader = DataLoader()
    loader.add_data(detections=[detection_ranked_pair_ordering])
    evaluator = loader.finalize()

    metrics = evaluator.evaluate(iou_thresholds=[0.5, 0.75])

    actual_metrics = [m.to_dict() for m in metrics[MetricType.Counts]]
    expected_metrics = [
        {
            "type": "Counts",
            "value": {"tp": 1, "fp": 0, "fn": 0},
            "parameters": {
                "iou_threshold": 0.5,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label1"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 1, "fp": 0, "fn": 0},
            "parameters": {
                "iou_threshold": 0.75,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label1"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 1, "fp": 0, "fn": 0},
            "parameters": {
                "iou_threshold": 0.5,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label2"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 1, "fp": 0, "fn": 0},
            "parameters": {
                "iou_threshold": 0.75,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label2"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 0, "fp": 0, "fn": 1},
            "parameters": {
                "iou_threshold": 0.5,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label3"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 0, "fp": 0, "fn": 1},
            "parameters": {
                "iou_threshold": 0.75,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label3"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 0, "fp": 1, "fn": 0},
            "parameters": {
                "iou_threshold": 0.5,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label4"},
            },
        },
        {
            "type": "Counts",
            "value": {"tp": 0, "fp": 1, "fn": 0},
            "parameters": {
                "iou_threshold": 0.75,
                "score_threshold": 0.5,
                "label": {"key": "class", "value": "label4"},
            },
        },
    ]
    for m in actual_metrics:
        assert m in expected_metrics
    for m in expected_metrics:
        assert m in actual_metrics

Issue Description

Here is `lite`'s output:

[
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 0, 'fn': 1}, 'parameters': {'iou_threshold': 0.5, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label1'}}},
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 0, 'fn': 1}, 'parameters': {'iou_threshold': 0.75, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label1'}}},
    {'type': 'Counts', 'value': {'tp': 1, 'fp': 0, 'fn': 0}, 'parameters': {'iou_threshold': 0.5, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label2'}}},
    {'type': 'Counts', 'value': {'tp': 1, 'fp': 0, 'fn': 0}, 'parameters': {'iou_threshold': 0.75, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label2'}}},
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 1, 'fn': 1}, 'parameters': {'iou_threshold': 0.5, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label3'}}},
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 1, 'fn': 1}, 'parameters': {'iou_threshold': 0.75, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label3'}}},
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 0, 'fn': 0}, 'parameters': {'iou_threshold': 0.5, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label4'}}},
    {'type': 'Counts', 'value': {'tp': 0, 'fp': 0, 'fn': 0}, 'parameters': {'iou_threshold': 0.75, 'score_threshold': 0.5, 'label': {'key': 'class', 'value': 'label4'}}},
]

There are three issues here (all of which may be directly connected to #756):

- `label1` is counted as a false negative (`{'tp': 0, 'fp': 0, 'fn': 1}`) where the expected output has a true positive.
- `label3` picks up a spurious false positive (`{'tp': 0, 'fp': 1, 'fn': 1}`) on top of the expected false negative.
- `label4` returns all zeros (`{'tp': 0, 'fp': 0, 'fn': 0}`) where the expected output has a false positive.

Expected Behavior

This test should pass, ensuring that `lite`'s behavior matches the API's.

czaloom commented 2 months ago

The output is correct for `score_threshold = 0.5`.
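To see why that claim could flip the expected values, here is a minimal, hypothetical sketch of per-label counting at fixed IoU and score thresholds. This is not valor's actual implementation; `count_tp_fp_fn` and its input shape are assumptions made for illustration. The key behavior it demonstrates: a prediction whose score falls below `score_threshold` is discarded entirely, so its matched ground truth becomes a false negative rather than a true positive.

```python
def count_tp_fp_fn(pairs, n_groundtruths, iou_threshold, score_threshold):
    """Count detection outcomes for one label (illustrative only).

    pairs: list of (iou, score) tuples, one per prediction, where iou is
    the IoU with that prediction's best-matching ground truth (0.0 if no
    overlap). n_groundtruths: total ground truths for this label.
    """
    # True positive: prediction survives the score cut AND overlaps enough.
    tp = sum(
        1 for iou, score in pairs
        if iou >= iou_threshold and score >= score_threshold
    )
    # False positive: prediction survives the score cut but fails the IoU test.
    fp = sum(
        1 for iou, score in pairs
        if iou < iou_threshold and score >= score_threshold
    )
    # False negative: every ground truth not covered by a true positive.
    # A prediction below score_threshold is dropped before matching, so
    # its ground truth lands here.
    fn = n_groundtruths - tp
    return {"tp": tp, "fp": fp, "fn": fn}


# A well-localized prediction (IoU 0.9) with score 0.4 is discarded at
# score_threshold=0.5, turning an expected TP into an FN:
print(count_tp_fp_fn([(0.9, 0.4)], 1, 0.5, 0.5))  # {'tp': 0, 'fp': 0, 'fn': 1}
```

Under this reading, whether `label1`'s expected `{'tp': 1, ...}` is right depends on whether its prediction score is at least 0.5; if it is below, `lite`'s `fn: 1` output is the correct one.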