apptek / SubER

SubER - Subtitle Edit Rate
Apache License 2.0

Verbatim SubER #3

Closed: sarapapi closed this issue 1 year ago

sarapapi commented 2 years ago

Hi again, this is not a real issue but an "enhancement" request. I am using SubER for a paper and would like to know whether there is a way to get more detailed information about the results. Since the metric is Levenshtein-based, could it report the number of deletions, insertions, etc.? That would be useful for analysis and would give some insight into system behavior. Thank you

patrick-wilken commented 2 years ago

Yes, I had already thought about this, see the last line in the README :) During initial development I actually had a somewhat hacky way of getting the alignment out of the TER algorithm. I would need to write a clean version of that.

A first step would be to just report the number of deletions, insertions and substitutions (probably separately for word and break tokens). The other option would be to give detailed Levenshtein alignment information per sentence, as the original TER tool does. What did you have in mind? Both, I guess? :)

The "problem" is that I use the sacrebleu implementation of TER. It does not provide this information, and I want to avoid altering it too much because I treat it as a reference implementation. But I can try to come up with a compromise. 😉

(Sorry, late reply due to vacation.)
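
For illustration, the "first step" mentioned above (counting deletions, insertions and substitutions separately for word and break tokens) could look roughly like the sketch below. This is not SubER's or sacrebleu's code: it is plain Levenshtein distance with a backtrace and no TER shift operations. The function name `count_edit_ops`, the break-token spellings `<eol>` and `<eob>`, the rule that a word is never substituted by a break, and the convention that an "insertion" is an extra hypothesis token are all assumptions made for this example.

```python
from collections import Counter

# Assumed spelling of the break tokens; adjust to the actual data.
BREAK_TOKENS = {"<eol>", "<eob>"}
INF = float("inf")


def kind(token):
    """Classify a token as 'break' or 'word'."""
    return "break" if token in BREAK_TOKENS else "word"


def sub_cost(hyp_tok, ref_tok):
    """0 for a match, 1 for a same-kind substitution, infinite otherwise.

    Forbidding word <-> break substitutions is an assumption of this sketch;
    such pairs are then handled as one deletion plus one insertion.
    """
    if hyp_tok == ref_tok:
        return 0
    return 1 if kind(hyp_tok) == kind(ref_tok) else INF


def count_edit_ops(hyp, ref):
    """Count edit operations between two token lists, keyed by (op, kind).

    Convention (assumed): 'insertion' = extra token in the hypothesis,
    'deletion' = reference token missing from the hypothesis.
    """
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost(hyp[i - 1], ref[j - 1]),
                dist[i - 1][j] + 1,  # extra hypothesis token
                dist[i][j - 1] + 1,  # missing reference token
            )

    # Walk back from the bottom-right corner along one optimal path.
    counts = Counter()
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dist[i][j] == dist[i - 1][j - 1] + sub_cost(hyp[i - 1], ref[j - 1])):
            if hyp[i - 1] != ref[j - 1]:
                counts[("substitution", kind(ref[j - 1]))] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts[("insertion", kind(hyp[i - 1]))] += 1
            i -= 1
        else:
            counts[("deletion", kind(ref[j - 1]))] += 1
            j -= 1
    return counts


if __name__ == "__main__":
    hyp = ["hello", "world", "<eob>"]
    ref = ["hello", "<eol>", "there", "<eob>"]
    for (op, tok_kind), num in sorted(count_edit_ops(hyp, ref).items()):
        print(f"{op:12s} {tok_kind:5s} {num}")
```

Because there are usually several optimal alignments, the per-operation counts depend on the tie-breaking order in the backtrace (here: match/substitution first, then insertion, then deletion), while their sum is always the edit distance. Counts taken from an actual TER implementation would additionally have to account for shifts.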