@nyadla-sys in #12 you pointed me to a multilang bin file. However, both it and the gen file have 50257 words. It looks like the vocabulary is supposed to have 51864 (en) or 51865 (multilang) tokens, so I'm clearly doing something wrong, but I'm not sure what. Can you explain what I'm missing?