The BPE-split data contains unk for whole words (e.g. "operating"), but also for subwords. Moreover, the dicts for 4614 and 30387 in model_input are not the correct ones. Worked better for known and unknown words
TODO:
[x] Check the dict copying and generation process
[x] Fix subwords so that they do not result in unk
[x] Maybe move splitting to a separate process instead of having fairseq do it (apply_bpe.py with the vocabulary option)
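
One way to check the first two items is to list every token in the BPE-split data that is missing from the dict, since those are the tokens that would be mapped to unk. The sketch below is only an illustration: it assumes the dict is a plain-text file with one "<symbol> <count>" entry per line and that the split data is whitespace-tokenized. The function names load_dict_symbols and count_oov_tokens are hypothetical and not part of fairseq or apply_bpe.py.

```python
from collections import Counter
from typing import Iterable, Set


def load_dict_symbols(dict_lines: Iterable[str]) -> Set[str]:
    """Collect the symbols from dict lines.

    Assumption: each non-empty line starts with the symbol, followed by
    whitespace and a count, e.g. "oper@@ 12".
    """
    symbols = set()
    for line in dict_lines:
        fields = line.split()
        if fields:  # skip blank lines
            symbols.add(fields[0])
    return symbols


def count_oov_tokens(bpe_lines: Iterable[str], symbols: Set[str]) -> Counter:
    """Count tokens in the BPE-split text that are not in the dict."""
    missing = Counter()
    for line in bpe_lines:
        for token in line.split():
            if token not in symbols:
                missing[token] += 1
    return missing


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the real dicts.
    dict_lines = ["the 100", "oper@@ 12", "ating 9"]
    bpe_lines = ["the oper@@ ating system", "oper@@ ations"]
    symbols = load_dict_symbols(dict_lines)
    for token, count in count_oov_tokens(bpe_lines, symbols).most_common():
        print(token, count)
```

If subword pieces (tokens ending in "@@" or following one) show up in the result, the dict and the BPE codes probably do not belong together, which would match the suspicion about the dicts in model_input.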