daviddiazvico / scikit-datasets

Scikit-learn-compatible datasets
MIT License

Score table style #70

Closed vnmabus closed 2 years ago

vnmabus commented 2 years ago

Improve default score table style.

Add functionality to:

Example

We will use the following data (note that all standard deviations are 0, so every difference will be significant):

import numpy as np
means = np.zeros((5, 3))
means[0, 0] = 1
means[1, 1] = 1
means[2, 1] = 1
means[2, 2] = 1
means[3, 1] = 1
means[3, 2] = 2
stds = np.zeros((5, 3))

We create the table of accuracies as:

from skdatasets.utils.scores import scores_table, average_rank, average_mean_score

table = scores_table(
    means,
    stds,
    datasets=range(5),
    estimators=["a","b","c"],
    nobs=1000,
    significancy_level=0.05,
    summary_rows=[
        ("Average rank", average_rank),
        ("Average score", average_mean_score),
    ],
    default_style="html",
)

Here is the HTML result (calling to_html):

[Image: HTML rendering of the score table]

Bold is used for the first position, underline for the second, and an asterisk for significant results (compared with the next in the ranking). Additional columns are in italics.
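To make the ranking behind the first and second positions concrete, here is a minimal sketch of a per-dataset ranking in which tied scores share the mean rank. It is not the library's implementation: the helpers `rank_row` and `column_means` are hypothetical, the tie rule and the "higher is better" convention are assumptions, and plain lists stand in for the `means` array above.

```python
# Illustrative sketch only: rank_row and column_means are hypothetical helpers,
# not part of skdatasets. Assumes higher scores are better and that tied
# scores share the mean of the ranks they span.

def rank_row(row):
    """Return the rank of each score in one row (1 = best)."""
    # Indices sorted from best to worst score.
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        # Extend the group while the next score ties with the current one.
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def column_means(rows):
    """Average each column over all rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Same values as the `means` array in the example (5 datasets x 3 estimators).
means = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 2],
    [0, 0, 0],
]

ranks = [rank_row(row) for row in means]
avg_rank = column_means(ranks)    # one value per estimator a, b, c
avg_score = column_means(means)   # one value per estimator a, b, c
```

Under these assumptions, estimators a, b and c get average ranks of 2.3, 1.8 and 1.9 and average scores of 0.2, 0.6 and 0.6. The values in the real table may differ if the library breaks ties differently.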

Using instead

from skdatasets.utils.scores import scores_table, average_rank, average_mean_score

table = scores_table(
    means,
    stds,
    datasets=range(5),
    estimators=["a","b","c"],
    nobs=1000,
    significancy_level=0.05,
    summary_rows=[
        ("Average rank", average_rank),
        ("Average score", average_mean_score),
    ],
    default_style="latex",
)

we get LaTeX styles, available when calling to_latex:

[Image: LaTeX rendering of the score table]

The styles can be customized further using the Styler methods. Moreover, the HTML output includes CSS classes for rankings and significant results, and the LaTeX output includes custom commands that can be redefined in the final document.