I want to use bpemb in my project. To process sentences of different lengths in one batch, I have to append a padding token <pad> to the shorter sentences. But I can't find a way to do that, because bpemb will tear down the `
The only way I can think of is to force <pad> in through a fairly complex process, which is a bit painful. Is there a more flexible method?
By the way, I found that the BPE token with id 0 rarely occurs in the encoded ids. Can I use it as the padding token?
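For reference, here is a rough sketch of the kind of padding I have in mind, done on the id lists after encoding instead of on the raw text. It does not call bpemb at all: the id lists, `vocab_size`, and `pad_id` are made-up values for illustration, and `pad_batch` is my own hypothetical helper, not part of the library. I picked an index one past the vocabulary as the pad id on the assumption that id 0 is still a real token, so reusing it could mix up padding with that token; the embedding matrix would then need one extra row for this index.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    batch: Sequence[Sequence[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every id list to the length of the longest one.

    Returns the padded ids and a mask (1 = real token, 0 = padding).
    """
    max_len = max((len(ids) for ids in batch), default=0)
    padded, mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded.append(list(ids) + [pad_id] * n_pad)
        mask.append([1] * len(ids) + [0] * n_pad)
    return padded, mask


# Hypothetical values: in practice the id lists would come from the
# tokenizer, and vocab_size would match the model that was loaded.
vocab_size = 10000
pad_id = vocab_size  # assumption: one extra index reserved for padding
batch = [[12, 7, 431], [88], [5, 19]]

padded, mask = pad_batch(batch, pad_id)
```

The mask is there so that padded positions can be ignored later (for example in attention or when averaging embeddings), whichever id ends up being used for padding.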
Thanks for giving this easy-to-use tool.