AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License

feature request: save Report.comparisons as JSON #4

Closed PaulLerner closed 2 years ago

PaulLerner commented 2 years ago

Hi,

It’d be nice to be able to save a Report’s comparisons as a JSON file. However, since they are stored with frozenset keys, they are not JSON serializable.

Maybe you could add a method in https://github.com/AmenRa/ranx/blob/master/ranx/frozenset_dict.py to convert the _map to a JSON serializable dict, i.e. with str keys? The str keys could be converted from the frozenset like: ', '.join(frozenset({'foo', 'bar'}))
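
A minimal sketch of that conversion, assuming the underlying mapping behaves like a plain dict keyed by frozensets. The helper name `to_json_safe` and the sample values are made up for illustration and are not part of ranx. Sorting the members before joining keeps each key stable, since frozenset iteration order is not guaranteed:

```python
import json


def to_json_safe(fs_dict, sep=", "):
    """Return a copy of fs_dict whose frozenset keys are joined into str keys."""
    # Sort the members so the same pair always produces the same key.
    return {sep.join(sorted(key)): value for key, value in fs_dict.items()}


# Hypothetical comparisons mapping, keyed by unordered pairs of model names.
comparisons = {
    frozenset({"foo", "bar"}): {"map": 0.03},
    frozenset({"foo", "baz"}): {"map": 0.41},
}

safe = to_json_safe(comparisons)
print(json.dumps(safe, indent=2))
```

Note that a joined key can only be split back reliably if the separator never appears inside a model name.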

AmenRa commented 2 years ago

Hi, an export option for the Report class is already on my to-do list! :)

I will come back with a proposal so that we can discuss it before I implement the functionality.

AmenRa commented 2 years ago

Hey, sorry for the delay.

This is my proposal for the Report.to_dict function (I can add a Report.save_as_json function for convenience too):

{
    # metrics and model_names allow reading the report without
    # inspecting the rest of the JSON to discover which metrics
    # were used and which models were compared
    "metrics": ["metric_1", "metric_2", ...],
    "model_names": ["model_1", "model_2", ...],
    #
    "model_1": {
        "scores": {
            "metric_1": ...,
            "metric_2": ...,
            ...
        },
        "comparisons": {
            "model_2": {
                "metric_1": ...,  # p-value
                "metric_2": ...,  # p-value
                ...
            },
            ...
        },
        "win_tie_loss": {
            "model_2": {
                "W": ...,
                "T": ...,
                "L": ...,
            },
            ...
        },
    },
    ...
}

Let me know what you think. :)
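
To make the proposed layout concrete, here is a rough sketch of how such a dictionary could be consumed. Everything in it is an assumption for illustration: the scores, p-values, and win/tie/loss counts are invented, and `significant_pairs` and its `max_p` threshold are hypothetical helpers, not ranx functions. It also assumes each model lists its comparisons against every other model.

```python
import json

# Made-up report following the proposed layout; the numbers are placeholders.
report = {
    "metrics": ["map", "mrr"],
    "model_names": ["model_1", "model_2"],
    "model_1": {
        "scores": {"map": 0.31, "mrr": 0.45},
        "comparisons": {"model_2": {"map": 0.02, "mrr": 0.30}},
        "win_tie_loss": {"model_2": {"W": 12, "T": 3, "L": 5}},
    },
    "model_2": {
        "scores": {"map": 0.28, "mrr": 0.44},
        "comparisons": {"model_1": {"map": 0.02, "mrr": 0.30}},
        "win_tie_loss": {"model_1": {"W": 5, "T": 3, "L": 12}},
    },
}


def significant_pairs(report, max_p=0.05):
    """List (model, other, metric) triples whose p-value is below max_p."""
    found = []
    for model in report["model_names"]:
        for other, p_values in report[model]["comparisons"].items():
            for metric in report["metrics"]:
                if p_values[metric] < max_p:
                    found.append((model, other, metric))
    return found


# The layout only uses str keys, so it round-trips through JSON unchanged.
restored = json.loads(json.dumps(report))
print(significant_pairs(restored))
```

Because "metrics" and "model_names" sit at the top level, the loop never has to guess which keys are model entries.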

PaulLerner commented 2 years ago

Looks great (and there was not so much delay 😅)!

AmenRa commented 2 years ago

I added Report.to_dict and Report.save, and I updated ranx on PyPI with these new features.

Closing.

PaulLerner commented 2 years ago

I’m getting a "TypeError: Object of type int64 is not JSON serializable", which is probably coming from numba or numpy.

AmenRa commented 2 years ago

Yeah, I know about that issue. I will look into it soon.

As a workaround, you can call report.to_dict() and save the dictionary as JSON yourself, using the exact same code I wrote for the report.save function.

That issue is kinda weird.
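
One possible way to make the manual save tolerant of NumPy scalars is the `default` hook of the standard `json` module. This is only a sketch and not necessarily what report.save does internally. To stay dependency-free it uses a stand-in class, `FakeInt64`, in place of `np.int64` (NumPy scalars also expose `.item()`); `to_builtin` and `report_dict` are likewise hypothetical names.

```python
import json


class FakeInt64:
    """Stand-in for a NumPy scalar such as np.int64, which also has .item()."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def to_builtin(obj):
    """json `default` hook: unwrap objects exposing .item(), else give up."""
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical stand-in for the dictionary returned by report.to_dict().
report_dict = {
    "model_1": {
        "win_tie_loss": {
            "model_2": {"W": FakeInt64(12), "T": FakeInt64(3), "L": FakeInt64(5)},
        },
    },
}

# json.dump(report_dict, file, default=to_builtin) works the same way for files.
text = json.dumps(report_dict, default=to_builtin, indent=4)
print(text)
```

The hook is only called for values the encoder cannot handle on its own, so built-in ints, floats, and strings pass through untouched.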

PaulLerner commented 2 years ago

don’t you need to convert int64 to int in to_dict?

PaulLerner commented 2 years ago

for example in transformers they use:

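import numpy as np
from transformers.utils import is_torch_available

# torch is optional; import it only when it is installed
if is_torch_available():
    import torch


def denumpify_detensorize(metrics):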
    """
    Recursively calls `.item()` on the element of the dictionary passed
    """
    if isinstance(metrics, (list, tuple)):
        return type(metrics)(denumpify_detensorize(m) for m in metrics)
    elif isinstance(metrics, dict):
        return type(metrics)({k: denumpify_detensorize(v) for k, v in metrics.items()})
    elif isinstance(metrics, np.generic):
        return metrics.item()
    elif is_torch_available() and isinstance(metrics, torch.Tensor) and metrics.numel() == 1:
        return metrics.item()
    return metrics

PaulLerner commented 2 years ago

this fixes it but you probably want to deal with it some other way? If not I can open a PR https://github.com/PaulLerner/ranx/commit/7e2218dc9f8dcf330e8dc8127ef6bc505a658081

AmenRa commented 2 years ago

I will look into it soon.

AmenRa commented 2 years ago

Fixed in 0.1.10. Sorry for the inconvenience.

PaulLerner commented 2 years ago

I’m still getting TypeError: Object of type int64 is not JSON serializable

PaulLerner commented 2 years ago

oops, looks like I was on the wrong branch, sorry about that