Open dasturge opened 4 years ago
Regarding aggregation of results in order to produce logs and populate TensorBoard (etc.): the output of each metric (and loss) for each element in each batch is saved in a vector. At the end of an epoch we therefore have access to the value of each metric and each loss computed on every element of the dataset.
This information is used by hooks. The logging hook, for example, reports the mean over all elements and assigns one column to each metric in a table. On TensorBoard, the average, standard deviation, histogram and distribution of both losses and metrics are reported, again with each loss and metric handled separately.
During training/validation/testing, each metric and loss is computed on its own and can be weighted with an absolute weight upon instantiation. Losses are used for the backward step one after the other (retain_graph=True), and metrics are simply computed afterwards.
When saving models with a model save hook, only the mean of either all metrics or all losses is taken into account, and the best model is saved.
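To make that behaviour concrete, here is a rough sketch (names are illustrative, this is not Eisen's actual implementation) of the kind of aggregation the hooks perform: per-element values are collected across an epoch and then reduced with a plain mean.

# Rough sketch of the aggregation described above; illustrative only.
import numpy as np

epoch_values = {"dice_metric": [], "cross_entropy_loss": []}

def collect(batch_outputs):
    # batch_outputs maps each metric/loss name to its per-element values for one batch
    for name, values in batch_outputs.items():
        epoch_values[name].extend(values)

collect({"dice_metric": [0.8, 0.7], "cross_entropy_loss": [0.3, 0.5]})
collect({"dice_metric": [0.9], "cross_entropy_loss": [0.2]})

# at the end of the epoch the logging hook reports a single mean per key
summary = {name: float(np.mean(values)) for name, values in epoch_values.items()}
print(summary)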
Does this issue refer to one of the three scenarios, or is there some other scenario you are referring to? Let's discuss more to see if there is something that has not been properly considered in the explanations above.
I'm referring to the metrics passed into the Training and Testing Workflows, in which case the metric is calculated on each batch and the mean is then taken afterwards by the LoggingHook. This ends up supporting only accuracy or losses with mean reduction, and it does not support reporting precision, recall, or other non-mean metrics each epoch.
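As a concrete illustration of why a mean over batches is not enough, here are some made-up counts showing that the mean of per-batch precision values generally differs from the precision computed over the whole epoch:

# Made-up counts: averaging per-batch precision is not the epoch precision.
batch_counts = [(1, 0), (5, 5)]  # (true_positives, false_positives) per batch

per_batch_precision = [tp / (tp + fp) for tp, fp in batch_counts]
mean_of_batches = sum(per_batch_precision) / len(per_batch_precision)  # 0.75

total_tp = sum(tp for tp, _ in batch_counts)
total_fp = sum(fp for _, fp in batch_counts)
epoch_precision = total_tp / (total_tp + total_fp)  # 6 / 11 = 0.545...

print(mean_of_batches, epoch_precision)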
It's clearer now.
I would like to retain the ability to pass metrics as lists of torch.nn.Modules to the workflow.
That said, I think it would also be possible to have a metric and loss aggregation strategy for summary, logging, early stopping, etc. I think it's a great idea.
By default this could be mean aggregation, but other approaches should indeed be offered.
This should probably be done by the EpochDataAggregator object, which is actually bound to be replaced with something better due to its numerous issues. The class you proposed seems good for the job indeed; instances of this class could be part of EpochDataAggregator, for example.
I am more than willing to discuss here a way to re-architect EpochDataAggregator. E.g. we could move it into the base class for the workflow, and then allow additional arguments in the workflow to choose a loss_reduction_module and a corresponding metric reduction module.
These modules would probably simplify the internals of EpochDataAggregator. It would also be great to make this epoch data aggregation activity something a bit more lightweight and optimised in terms of memory etc.
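To sketch what this could look like (the names MeanReduction, metric_reduction and HypotheticalWorkflow are purely illustrative here, not an existing Eisen API), the workflow could accept a reduction module and hand per-batch values to it instead of hard-coding a mean:

# Purely hypothetical sketch of the reduction-module idea discussed above;
# none of these names exist in Eisen today.
import numpy as np

class MeanReduction:
    """Default strategy: collect per-batch values and report their mean."""
    def __init__(self):
        self.values = []

    def reset(self):
        self.values = []

    def update(self, batch_value):
        self.values.append(batch_value)

    def compute(self):
        return float(np.mean(self.values))

class HypotheticalWorkflow:
    def __init__(self, metrics, metric_reduction=None):
        self.metrics = metrics
        # mean aggregation by default, as suggested above; other strategies could be passed in
        self.metric_reduction = metric_reduction or MeanReduction()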
Looking forward to hearing your thoughts.
Well, metrics could still inherit from nn.Module if it's really necessary for them to be modules. The trouble I see is that for things like precision, recall, and variance, you need some stateful values stored in order to do a running computation. For variance it's sufficient to also know the total number N of examples seen so far, but for precision/recall the problem is that you need to keep track of at least the total numbers of true positives and false positives (and false negatives for recall), which in general don't sum to N.
At the bare minimum, you could have metrics look like this:
import numpy as np
import torch.nn as nn

class Metric(nn.Module):
    def reset(self):
        raise NotImplementedError

    def forward(self, *args, **kwargs):
        raise NotImplementedError

class Precision(Metric):
    def __init__(self):
        super().__init__()
        self.value = 0.0
        self.true_positives = 0
        self.false_positives = 0

    def reset(self):
        self.value = 0.0
        self.true_positives = 0
        self.false_positives = 0

    def forward(self, y_pred, y_true):
        # accumulate counts across batches so the metric reflects the whole epoch,
        # rather than a mean over per-batch values
        self.true_positives += np.logical_and(y_pred == 1, y_true == 1).sum()
        self.false_positives += np.logical_and(y_pred == 1, y_true == 0).sum()
        denominator = self.true_positives + self.false_positives
        self.value = self.true_positives / denominator if denominator else 0.0
        return self.value
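For example, such a metric could be driven roughly like this over an epoch (a hypothetical loop reusing the Precision class above, not Eisen's actual workflow code):

# Hypothetical usage of the Precision metric above; the data here is made up.
import numpy as np

precision = Precision()
precision.reset()  # called once at the start of each epoch

for y_pred, y_true in [
    (np.array([1, 0, 1]), np.array([1, 0, 0])),
    (np.array([1, 1, 0]), np.array([1, 1, 0])),
]:
    running_value = precision(y_pred, y_true)  # updates counts, returns running precision

print(precision.value)  # precision over the whole epoch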
So they can keep track of state for a running computation of the metric, while being reset at the end of each epoch.
You could also have a class passed in instead of using a reset method, and just instantiate it anew each epoch.
Another option perhaps is to just bake these things into the EpochDataAggregator and have it store counts of true positives, false positives, false negatives, etc. This isn't flexible, but it certainly gives you easy access to a running precision.
The current behavior of Eisen is to take the mean of metrics over batches regardless of whether this makes sense or not. It also has the minor inaccuracy of giving additional weight to each example in the final batch, e.g. if the batch size is 128 and the last batch only has 10 examples.
I propose a Metric object modeled after PyTorch Ignite, with three functions, naively:
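A minimal sketch of what that could look like, assuming the three functions are reset, update and compute in the spirit of Ignite's Metric API (the names here are illustrative, not an existing Eisen interface):

# Minimal sketch of an Ignite-style metric interface; illustrative only.
class RunningMetric:
    def reset(self):
        """Clear all internal state at the start of an epoch."""
        raise NotImplementedError

    def update(self, y_pred, y_true):
        """Accumulate statistics from one batch."""
        raise NotImplementedError

    def compute(self):
        """Return the metric value over everything seen since reset()."""
        raise NotImplementedError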
Perhaps an EisenMetricWrapper could take existing metrics and run simple reductions on them along these lines, converting an arbitrary function into a Metric class that keeps track of a running average.
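For instance, a wrapper along these lines (EisenMetricWrapper is hypothetical here, not an existing class) could keep a running average weighted by batch size, which also avoids over-weighting the smaller final batch mentioned above:

# Hypothetical wrapper turning an arbitrary per-batch function into a
# running, batch-size-weighted average metric; not an existing Eisen class.
class EisenMetricWrapper:
    def __init__(self, fn):
        self.fn = fn
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, y_pred, y_true):
        batch_size = len(y_true)
        # weight each batch by its size so a small final batch is not over-weighted
        self.total += float(self.fn(y_pred, y_true)) * batch_size
        self.count += batch_size

    def compute(self):
        return self.total / self.count if self.count else 0.0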
Critically, this allows the ever-important metrics such as precision and recall, or even a PR curve, to be computed correctly over whole epochs by Eisen.