facebookresearch / muss

Code and models used in "MUSS Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".
Other
97 stars 39 forks source link

Train/adapt to other languages #43

Open ng-4r opened 1 year ago

ng-4r commented 1 year ago

Hi!

I see that it is possible to use MUSS with other languages:

If you are going to add a new language to this project, in folder resources/models/language_models/wikipedia donwload the files of the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia. These language models are used to filter high quality sentences in the paraphrase mining phase.

But what if the target language is not listed in the kenlm repository? I would like to try this system on Italian

louismartin commented 1 year ago

Hi there,

Sorry for the delay.

Kenlm is only used to clean the common crawl data if I remember correctly. You can probably find other ways to clean the data using other heuristics, or not clean it at all (but get potentially worse performance).

Another solution is also to use the ChatGPT API which is very good at text simplification in multiple languages.

ng-4r commented 1 year ago

Hi!

thank you very much for your reply. So I can replace that part with other methods.

I know GPT capabilities, but I'm studying this topic and I want to make a comparison of different models, including GPT with zero-/few-shot learning