ebu / benchmarkstt

Open Source AI Benchmarking toolkit for benchmarking speech to text services
MIT License

Add WER based on minimal edit distance (Levenshtein distance)? #163

Closed by johann-petrak 6 months ago

johann-petrak commented 2 years ago

It would be very useful to allow the use of Levenshtein edits instead of the difflib edits.

This would make it possible to correctly calculate a "proper" WER, as well as other metrics such as match error rate (MER) or word information lost (WIL), while still using the same edits to show the differences etc.
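To illustrate the request, here is a minimal sketch of how WER, MER and WIL can be derived from a minimal word-level (Levenshtein) alignment. It is not benchmarkstt code: the names `edit_counts` and `metrics` are hypothetical, tokenization is a plain whitespace split, and the formulas are the commonly used definitions based on hit, substitution, deletion and insertion counts.

```python
def edit_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one
    minimal-cost word-level alignment of ref against hyp."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = Levenshtein distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to count each edit type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def metrics(ref_text, hyp_text):
    """Compute WER, MER and WIL; assumes a non-empty reference."""
    ref, hyp = ref_text.split(), hyp_text.split()
    h, s, d, i = edit_counts(ref, hyp)
    n_ref = h + s + d   # number of reference words
    n_hyp = h + s + i   # number of hypothesis words
    errors = s + d + i
    wer = errors / n_ref
    mer = errors / (h + errors)
    # With an empty hypothesis nothing is recovered, so WIL is 1.
    wil = 1.0 if n_hyp == 0 else 1.0 - (h * h) / (n_ref * n_hyp)
    return {"wer": wer, "mer": mer, "wil": wil}


result = metrics("a b c d", "a x c")
print(result)
```

When several alignments share the minimal cost, WER is the same for all of them, but MER and WIL can differ depending on which alignment the backtrace picks.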

aro-max commented 2 years ago

You can use the Levenshtein distance instead of the difflib edits, which are the default. Please see the explanation in the docs: https://benchmarkstt.readthedocs.io/en/latest/tutorial.html#word-error-rate-variants
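For readers wondering why the two approaches can disagree, the following sketch compares the number of edits implied by Python's `difflib.SequenceMatcher` opcodes with the Levenshtein distance for the same word lists. It only uses the standard library and does not reproduce how benchmarkstt counts edits internally; the helper names `difflib_edit_count` and `levenshtein_distance` are made up for this example.

```python
import difflib


def difflib_edit_count(ref, hyp):
    """Count word edits implied by difflib opcodes (not guaranteed minimal)."""
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Pair words up as substitutions; the surplus on either side
            # counts as deletions or insertions.
            total += max(i2 - i1, j2 - j1)
        elif tag == "delete":
            total += i2 - i1
        elif tag == "insert":
            total += j2 - j1
    return total


def levenshtein_distance(ref, hyp):
    """Minimal number of word substitutions, deletions and insertions."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j - 1] + (ref_word != hyp_word),  # match or substitute
                previous[j] + 1,                           # delete from ref
                current[j - 1] + 1,                        # insert into ref
            ))
        previous = current
    return previous[-1]


ref = "the cat sat on the mat".split()
hyp = "the mat sat on a cat".split()
by_difflib = difflib_edit_count(ref, hyp)
by_levenshtein = levenshtein_distance(ref, hyp)
print(by_difflib, by_levenshtein)
```

Because difflib builds its alignment around the longest matching blocks rather than minimizing edits, its count is always at least the Levenshtein distance and can be larger, which is why a WER based on it may come out higher.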

aro-max commented 6 months ago

The doc explains how to use the Levenshtein distance.