ELITR / SLTev

SLTev is a tool for comprehensive evaluation of (simultaneous) spoken language translation.

Support custom tokenization #7

Open obo opened 3 years ago

obo commented 3 years ago

Anyone who would like SLTev to support custom tokenizers (e.g. via --tokenizer=...), please discuss here. Let's add only features people need. Pull requests are also welcome.

mohammad2928 commented 3 years ago

It is a good idea. There are two approaches to supporting different tokenizers. First approach: we can build several tokenizers into SLTev and identify them by number or name. Second approach: we can allow users to supply a file that contains a tokenization function.
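
To make the two approaches concrete, here is a minimal sketch of how both could sit behind a single option value. It is only an illustration under assumptions, not SLTev's actual code: the names `BUILTIN_TOKENIZERS`, `load_tokenizer_from_file`, `resolve_tokenizer`, the built-in names `whitespace` and `char`, and the convention that a user file defines a function called `tokenize` are all hypothetical.

```python
import importlib.util
from typing import Callable, Dict, List

# A tokenizer maps one line of text to a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split on any run of whitespace."""
    return text.split()


def char_tokenizer(text: str) -> List[str]:
    """Treat every non-whitespace character as its own token."""
    return [ch for ch in text if not ch.isspace()]


# Approach 1 (hypothetical registry): built-in tokenizers selected by name.
BUILTIN_TOKENIZERS: Dict[str, Tokenizer] = {
    "whitespace": whitespace_tokenizer,
    "char": char_tokenizer,
}


def load_tokenizer_from_file(path: str, func_name: str = "tokenize") -> Tokenizer:
    """Approach 2: import a user-supplied .py file and return its tokenizer.

    Assumes (hypothetically) that the file defines a callable named
    `func_name` taking a string and returning a list of strings.
    """
    spec = importlib.util.spec_from_file_location("user_tokenizer", path)
    if spec is None or spec.loader is None:
        raise ValueError(f"Cannot load a Python module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    func = getattr(module, func_name, None)
    if not callable(func):
        raise ValueError(f"{path!r} does not define a callable {func_name!r}")
    return func


def resolve_tokenizer(value: str) -> Tokenizer:
    """Interpret an option value as a built-in name or a path to a .py file."""
    if value in BUILTIN_TOKENIZERS:
        return BUILTIN_TOKENIZERS[value]
    if value.endswith(".py"):
        return load_tokenizer_from_file(value)
    raise ValueError(f"Unknown tokenizer {value!r}")
```

With something like this, a built-in would be chosen by name, for example `resolve_tokenizer("char")`, while a path such as `my_tok.py` would load the user's own function. Note that the second approach executes arbitrary user code, so it should only be used with trusted files.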