diamondgloves closed 2 years ago
Thanks for flagging. You're right, the truncation logic is incorrect. As for a fix: inserting bos and eos at the first and last positions will also be problematic when the batch contains shorter sequences, which may end up looking like <bos> [seq] <eos> <pad> <pad> <pad> <eos>. In fact the best fix would be to add a max_seq_len argument to BatchConverter initialization and truncate in __call__.
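A rough sketch of that idea, for illustration only. This is a self-contained toy class, not the actual ESM BatchConverter: the name ToyBatchConverter, the token ids, and the choice that max_seq_len counts residues only (excluding bos/eos) are all assumptions. The point is to truncate each raw sequence first, then add bos/eos per sequence, and pad last, so eos always directly follows the final residue.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical special-token ids for this toy example (not ESM's real vocabulary).
PAD, BOS, EOS = 0, 1, 2
TOKEN_IDS = {aa: i + 3 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


class ToyBatchConverter:
    """Simplified stand-in for a batch converter; not the ESM implementation."""

    def __init__(self, max_seq_len: Optional[int] = None):
        # Assumption: max_seq_len counts residues only, excluding bos/eos.
        self.max_seq_len = max_seq_len

    def __call__(
        self, batch: Sequence[Tuple[str, str]]
    ) -> Tuple[List[str], List[str], List[List[int]]]:
        labels = [label for label, _ in batch]
        seqs = [seq for _, seq in batch]
        if self.max_seq_len is not None:
            # Truncate the raw sequences before any special tokens are added.
            seqs = [seq[: self.max_seq_len] for seq in seqs]
        longest = max((len(seq) for seq in seqs), default=0)
        tokens = []
        for seq in seqs:
            # bos/eos wrap each sequence individually, not the padded row.
            row = [BOS] + [TOKEN_IDS[aa] for aa in seq] + [EOS]
            # Padding goes after eos, up to the longest (truncated) sequence.
            row += [PAD] * (longest + 2 - len(row))
            tokens.append(row)
        return labels, seqs, tokens
```

With max_seq_len=4, a batch of "ACDEFG" and "AC" is truncated to "ACDE" and "AC", and the shorter row ends in bos, residues, eos, pad, pad rather than having a stray eos in the last column.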
Do you still need help on this issue?
Yes, sure. We'd welcome an improvement to this script.
@tomsercu can I do it as my first contribution? Also, could you please explain how to add a max_seq_len argument to BatchConverter initialization and truncate in __call__?
Discussed in https://github.com/facebookresearch/esm/discussions/156