facebookresearch / cc_net

Tools to download and cleanup Common Crawl data
MIT License

Model finding #21

Open sashavor opened 3 years ago

sashavor commented 3 years ago

When specifying a language model in the config (using lm_languages:=en), the process throws an error: OSError: Not found: "data/lm_sp/e.sp.model": No such file or directory Error #2

The code works fine when no lm_languages are specified.

I think the issue is the following line, since it only considers a single character for the model name: https://github.com/facebookresearch/cc_net/blob/bda555bd1cf1ee2e0b925363e62a61cd46c8b60d/cc_net/mine.py#L393
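
For illustration, here is a minimal sketch of one way a path like e.sp.model could come about in Python. It is not the cc_net code: model_paths and normalize_languages are hypothetical helpers, and the assumption is that the language setting arrives as a plain string ("en") where a list of language codes is expected. Iterating over a string yields its characters one at a time, so the first model name built is "e" instead of "en".

```python
from pathlib import Path
from typing import List, Sequence, Union


def model_paths(lm_dir: Path, languages: Union[str, Sequence[str]]) -> List[Path]:
    # Hypothetical helper: builds one SentencePiece model path per language.
    # If `languages` is a plain string, this loop iterates over its characters.
    return [lm_dir / f"{lang}.sp.model" for lang in languages]


def normalize_languages(languages: Union[str, Sequence[str]]) -> List[str]:
    # Hypothetical fix: wrap a lone string in a list before iterating over it.
    if isinstance(languages, str):
        return [languages]
    return list(languages)


lm_dir = Path("data/lm_sp")

# Passing the string directly splits "en" into "e" and "n".
buggy = model_paths(lm_dir, "en")
print([p.as_posix() for p in buggy])

# Normalizing first keeps the language code intact.
fixed = model_paths(lm_dir, normalize_languages("en"))
print([p.as_posix() for p in fixed])
```

If this is what happens at the linked line, then either passing the languages as a list or normalizing the value before building the paths should avoid the error.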