UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

BinaryClassificationEvaluator returns np.float32() and np.float64() #3075

Closed chr-werner closed 3 days ago

chr-werner commented 1 week ago

I came across the problem where

output_scores[similarity_fn_name] = {
  "accuracy": acc,
  "accuracy_threshold": acc_threshold,
  "f1": f1,
  "f1_threshold": f1_threshold,
  "precision": precision,
  "recall": recall,
  "ap": ap,
}

returns its values as np.float32 and np.float64, which causes problems during model saving because json/encoder.py treats those types as not JSON serializable.
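The failure can be reproduced outside the evaluator with a minimal sketch. The metric values below are made up for illustration and `try_dump` is a hypothetical helper, not part of sentence-transformers. One caveat worth checking locally: np.float64 subclasses Python's float, so the standard json module typically accepts it, and np.float32 is the type that raises.

```python
import json

import numpy as np

# Made-up metric values standing in for the evaluator's output (assumption).
scores = {
    "accuracy": np.float32(0.5),
    "ap": np.float64(0.25),
}


def try_dump(value):
    """Hypothetical helper: return the JSON string, or the error text on failure."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Serialize each value on its own to see which NumPy type is rejected.
results = {name: try_dump(value) for name, value in scores.items()}
for name, result in results.items():
    print(name, "->", result)
```

Because a single non-serializable value is enough to make json.dump fail for the whole dictionary, one np.float32 metric can break saving even if the other values serialize fine.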

By changing the code snippet to

output_scores[similarity_fn_name] = {
  "accuracy": acc.item(),
  "accuracy_threshold": acc_threshold.item(),
  "f1": f1.item(),
  "f1_threshold": f1_threshold.item(),
  "precision": precision,
  "recall": recall.item(),
  "ap": ap.item(),
}

each element is converted to a standard Python scalar (note: precision already is one), and the values then behave as expected during saving.
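A small sketch of what .item() does, using made-up values rather than real evaluator output. It also shows why precision is left untouched in the snippet above: a plain Python float has no .item() method, so calling it there would raise an AttributeError.

```python
import numpy as np

acc = np.float32(0.75)  # NumPy scalar, as the evaluator reportedly returns
precision = 0.5         # already a plain Python float (assumed, per the note above)

# .item() copies the NumPy scalar into the matching built-in Python type.
converted = acc.item()
print(type(converted))

# Built-in floats do not expose .item(), so precision must be passed through as-is.
print(hasattr(precision, "item"))
```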

tomaarsen commented 1 week ago

Hello!

Well spotted, this is suboptimal. I'd rather have all of these converted to floats as expected. I'm tackling some other PRs right now, but I'll pick this up in a few days if someone else hasn't beat me to it by then. Thanks for reporting!
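One way to convert everything to floats uniformly is to call float() on each value, which works for NumPy scalars and built-in floats alike. The sketch below is only an illustration under that assumption: `to_python_floats` is a hypothetical helper and not necessarily the fix that was merged.

```python
import json

import numpy as np


def to_python_floats(scores):
    """Hypothetical helper: coerce every metric value to a built-in float."""
    return {name: float(value) for name, value in scores.items()}


# Made-up mix of NumPy scalars and a plain float, mirroring the reported output.
raw = {
    "accuracy": np.float32(0.75),
    "precision": 0.5,
    "ap": np.float64(0.25),
}

clean = to_python_floats(raw)
serialized = json.dumps(clean)  # no TypeError once every value is a plain float
print(serialized)
```

Compared with calling .item() per field, this avoids having to know in advance which values are already plain floats.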

JINO-ROHIT commented 1 week ago

I'll be happy to work on this :)