Open ng-4r opened 1 year ago
Hi there,
Sorry for the delay.
If I remember correctly, KenLM is only used to clean the Common Crawl data. You can probably clean the data with other heuristics instead, or skip cleaning altogether (at the risk of worse performance).
Another option is to use the ChatGPT API, which is very good at text simplification in multiple languages.
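To make the "other heuristics" option a bit more concrete, here is a minimal sketch of a rule-based sentence filter that could stand in for the language-model cleaning step. It is not what MUSS does internally; the function names (`looks_clean`, `clean_corpus`) and all thresholds are assumptions chosen for illustration and would need tuning for Italian or any other language.

```python
import re

def looks_clean(sentence, min_words=3, max_words=50, min_alpha_ratio=0.7):
    """Return True if the sentence passes a few simple quality heuristics.

    All thresholds are illustrative assumptions, not values taken from MUSS.
    """
    if not sentence:
        return False
    words = sentence.split()
    # Drop fragments and overly long lines.
    if not (min_words <= len(words) <= max_words):
        return False
    # Require that most characters are letters or whitespace.
    alpha = sum(ch.isalpha() or ch.isspace() for ch in sentence)
    if alpha / len(sentence) < min_alpha_ratio:
        return False
    # Reject lines that contain URLs or markup remnants.
    if re.search(r"https?://|www\.|[<>{}]", sentence):
        return False
    return True

def clean_corpus(sentences):
    """Keep sentences that pass looks_clean, dropping case-insensitive duplicates."""
    seen = set()
    kept = []
    for s in sentences:
        s = s.strip()
        key = s.lower()
        if s and key not in seen and looks_clean(s):
            seen.add(key)
            kept.append(s)
    return kept
```

A filter like this is much cruder than a perplexity-based one, so it is worth spot-checking a sample of what gets kept and what gets dropped before mining paraphrases from the result.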
Hi!
Thank you very much for your reply. So I can replace that part with other methods.
I'm aware of GPT's capabilities, but I'm studying this topic and want to compare different models, including GPT with zero-/few-shot learning.
Hi!
I see that it is possible to use MUSS with other languages:
But what if the target language is not listed in the KenLM repository? I would like to try this system on Italian.