UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Error when autologging InformationRetrievalEvaluator results to mlflow #2725

Open kvarekamp opened 2 months ago

kvarekamp commented 2 months ago

The error is pretty self-explanatory:

ERROR mlflow.utils.async_logging.async_logging_queue: Run Id abec744c4f86451c91984386691ad733: Failed to log run data: Exception: Invalid metric name: 'eval_cosine_accuracy@1'. Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( )

Replacing the "@" characters with "_at_" or similar would likely solve the problem.
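To make the proposed rename concrete, here is a minimal sketch in plain Python. The helper names `to_mlflow_metric_name` and `is_allowed` are hypothetical, and the allowed character set is only what the quoted error message lists; the message is cut off and MLflow versions may accept more, so treat the pattern as an assumption rather than MLflow's actual validator.

```python
import re

# Assumption: only the characters named in the error message above are allowed
# (alphanumerics, underscores, dashes, periods, spaces).
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_\-. ]+")


def to_mlflow_metric_name(name: str) -> str:
    """Replace '@' with '_at_' so the name only uses allowed characters."""
    return name.replace("@", "_at_")


def is_allowed(name: str) -> bool:
    """Check a name against the assumed character set."""
    return _ALLOWED_NAME.fullmatch(name) is not None


# A name of the same shape as the one in the error message.
original = "eval_cosine_accuracy@1"
renamed = to_mlflow_metric_name(original)

print(is_allowed(original))  # False
print(renamed)               # eval_cosine_accuracy_at_1
print(is_allowed(renamed))   # True
```

The rename leaves names without "@" untouched, so it could be applied to every logged key without affecting metrics such as the loss.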

tomaarsen commented 2 months ago

Ideally, MLflow would be adapted to allow these logs, but it seems that they're not interested in that:

Additionally, I'd rather not update all of these metrics for everyone, as some people might rely on the current names (e.g. via best_metric or hyperparameter optimization relying on these names).

Perhaps the best solution is to override the MLflowCallback from transformers with a variant that first replaces "@" with "_at_" and then proceeds with the normal on_log behaviour. Kind of like a man-in-the-middle. I'm a bit wary that it'd get somewhat messy, though.

scriptator commented 1 month ago

I followed @tomaarsen's suggestion and implemented this:

from transformers.integrations import MLflowCallback

def make_metric_name_mlflow_compatible(metric_name: str) -> str:
    metric_name = metric_name.replace("@", "_at_")
    return metric_name

class MetricRenamingMlFlowCallback(MLflowCallback):
    """
    A variant of the standard MLflowCallback that replaces special characters in metric names so that MLflow can
    handle them:
        '@' --> '_at_'
    """
    def on_log(self, args, state, control, logs, model=None, **kwargs):
        logs = {make_metric_name_mlflow_compatible(k): v for k, v in logs.items()}
        super().on_log(args, state, control, logs, model, **kwargs)

To get it working, set report_to="none" in SentenceTransformerTrainingArguments and pass callbacks=[MetricRenamingMlFlowCallback()] when instantiating SentenceTransformerTrainer. Setting report_to="none" stops the trainer from adding its reporting integrations automatically, including the stock MLflowCallback, so only the custom callback logs to MLflow.
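As a rough sketch of that wiring, the helper below shows where the two settings go. The function `build_trainer`, its parameter names, and the "output" directory are hypothetical placeholders; the model, dataset, loss, and evaluator are whatever you already use, and `mlflow_callback` would be an instance of MetricRenamingMlFlowCallback from above.

```python
def build_trainer(model, train_dataset, loss, evaluator, mlflow_callback):
    """Wire a custom MLflow callback into a SentenceTransformerTrainer.

    All parameters are placeholders for objects you already have.
    """
    # Imported lazily so that defining this helper does not require the library.
    from sentence_transformers import (
        SentenceTransformerTrainer,
        SentenceTransformerTrainingArguments,
    )

    args = SentenceTransformerTrainingArguments(
        output_dir="output",  # hypothetical output path
        report_to="none",     # do not add the stock reporting callbacks
    )
    return SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        loss=loss,
        evaluator=evaluator,
        callbacks=[mlflow_callback],  # the renaming callback does the MLflow logging
    )
```

Calling `build_trainer(model, train_dataset, loss, evaluator, MetricRenamingMlFlowCallback()).train()` should then log the evaluator metrics under the renamed keys, assuming MLflow itself is configured as it was before.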