bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License
1.18k stars 101 forks

Number issue in bpemb #47

aimanmutasem closed 4 years ago

aimanmutasem commented 4 years ago

Dear @bheinzerling

I ran into a new problem with BPEmb: it converts every digit in a number to zero, for example (1234) becomes (0000), and this lowers the final BLEU score.

How can I overcome this challenge?

Regards,

bheinzerling commented 4 years ago

This is intended behaviour and is explained here:

https://github.com/bheinzerling/bpemb/issues/20
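
For readers who only need the original numbers back in their final output, here is a minimal sketch of one possible workaround. It does not use BPEmb itself: `mask_digits` only mimics the digit-to-zero mapping described in this issue with a plain regex, and `restore_digits` is a hypothetical helper, not part of the library. It assumes that numbers appear in the same order and with the same number of digits in the source text and in the masked output, which will not always hold for translation.

```python
import re


def mask_digits(text: str) -> str:
    """Replace every digit with '0', mimicking the behaviour reported in this issue."""
    return re.sub(r"\d", "0", text)


def restore_digits(masked_output: str, source: str) -> str:
    """Hypothetical helper: copy numbers from `source` back into runs of zeros.

    Numbers are consumed from `source` left to right, one per run of zeros.
    A run is left unchanged when no number remains or the lengths differ.
    """
    numbers = iter(re.findall(r"\d+", source))

    def fill(match: re.Match) -> str:
        original = next(numbers, None)
        if original is None or len(original) != len(match.group()):
            return match.group()
        return original

    return re.sub(r"0+", fill, masked_output)


if __name__ == "__main__":
    source = "call (1234) before 56 pm"
    masked = mask_digits(source)
    print(masked)
    print(restore_digits(masked, source))
```

Whether such post-processing helps the BLEU score depends on how reliably the numbers line up between input and output, so it is worth checking on a held-out sample first.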