To better compute ROUGE scores in German, it might be necessary to split compound words, and improve lemmatization/stemming.
For that purpose, there is the GermaNet list of split compounds, which has over 100,000 entries available.
These are available for academic research only, so it might make sense to look for alternatives (ideally ones that also allow commercial use) elsewhere first. In particular, the academic-only restriction probably also prevents us from licensing this project under MIT or Apache.
An alternative approach could be this library: https://github.com/dtuggener/CharSplit
It is licensed under MIT, which is better for us, but we would still have to check how well it works.
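
To illustrate why compound splitting matters for ROUGE, here is a minimal sketch of a unigram-overlap (ROUGE-1 style) F1 score with an optional splitting step. Everything in it is an assumption made for illustration: `TOY_LEXICON`, `split_compound`, `tokenize` and `rouge1_f1` are hypothetical names, the lexicon is a tiny hand-written stand-in, and the greedy splitter is not how GermaNet or CharSplit work. In practice `split_compound` would be swapped for a real splitter.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only. A real setup would rely on
# a proper compound splitter or a licensed list of split compounds instead.
TOY_LEXICON = {"auto", "bahn", "haus", "tür"}


def split_compound(word, lexicon=TOY_LEXICON):
    """Greedily split a word into lexicon parts; return [word] if no full split exists."""
    w = word.lower()
    if w in lexicon:
        return [w]
    # Try the longest possible head first, then split the remainder recursively.
    for i in range(len(w) - 1, 0, -1):
        head, tail = w[:i], w[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon)
            if all(part in lexicon for part in rest):
                return [head] + rest
    return [w]


def tokenize(text, split=False):
    """Lowercase, whitespace-tokenize, and optionally split compounds."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(split_compound(word) if split else [word])
    return tokens


def rouge1_f1(reference, candidate, split=False):
    """Unigram-overlap F1 between a reference and a candidate text."""
    ref = Counter(tokenize(reference, split))
    cand = Counter(tokenize(candidate, split))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "die Haustür"
candidate = "die Tür am Haus"
print(rouge1_f1(reference, candidate))              # without compound splitting
print(rouge1_f1(reference, candidate, split=True))  # with compound splitting
```

In this toy case the score rises once "Haustür" is split into "haus" and "tür", because both parts then match tokens in the candidate. Whether a real splitter gives a similar benefit on actual summaries is exactly what we would still have to check.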