bheinzerling / bpemb

Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
https://nlp.h-its.org/bpemb
MIT License

How to decode encoded byte-pair sentences? #57

Closed by chayan-dhaddha 2 years ago

chayan-dhaddha commented 3 years ago

I am working with Hindi and encoding works fine, but I cannot decode the encoded sentence. When I run the line below, the output is the same as the input I fed into it:

bpemb_hi.decode([" ▁होकर ▁अपने ▁मन ▁में ▁काम ▁किया ▁."])
output: ▁होकर ▁अपने ▁मन ▁में ▁काम ▁किया ▁.

Please tell me what I am doing wrong. Thanks in advance.

bheinzerling commented 3 years ago

The decode method expects a list of byte-pair symbols, one symbol per list element, rather than a single space-joined string, so this should work:

bpemb_hi.decode(["▁होकर", "▁अपने", "▁मन", "▁में", "▁काम", "▁किया", "▁."])
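If the encoded sentence is only available as one space-joined string, as in the question, it can be split back into separate symbols before decoding. The following is a minimal sketch of that step. The helper names `split_pieces` and `join_pieces` are hypothetical and not part of bpemb, and `join_pieces` is only a rough pure-Python approximation of how word-boundary markers are usually turned back into spaces, not the library's actual implementation.

```python
# Hypothetical helpers for illustration; neither is part of the bpemb API.

WORD_BOUNDARY = "\u2581"  # the "▁" marker that prefixes word-initial symbols


def split_pieces(encoded: str) -> list[str]:
    """Split a space-joined encoded sentence into individual symbols."""
    return encoded.split()


def join_pieces(pieces: list[str]) -> str:
    """Approximate decoding: concatenate symbols, then turn "▁" into spaces.

    Assumption: this mirrors the usual word-boundary convention only;
    use the model's own decode method for real work.
    """
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


encoded = " ▁होकर ▁अपने ▁मन ▁में ▁काम ▁किया ▁."
pieces = split_pieces(encoded)
print(pieces)
print(join_pieces(pieces))

# With the model object from this thread, the list can be passed directly:
# bpemb_hi.decode(pieces)
```

Calling `split()` without arguments also discards the leading space in the original string, so no empty symbol ends up in the list.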