sersh88 opened this issue 4 days ago
What is the issue exactly?
If `tgt_prefix`, for example, is empty, the transform still adds an empty token to each tgt sequence, and that token is unknown to the model. So the whole dataset is trained with an unknown token prepended to tgt, like `['', 'some', 'other', 'tokens']`. The same happens with the suffix transform.
I see, can you PR?
In the prefix transform I see the code that prepends the prefix:
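Roughly this (a minimal sketch, paraphrased as a free function and assuming this refers to OpenNMT-py's `PrefixTransform._prepend`; the exact snippet may differ):

```python
def _prepend(example, prefix):
    """Prepend each side's prefix string to that side's token list."""
    for side, side_prefix in prefix.items():
        # "".split(" ") returns [''], so an empty prefix still injects
        # one empty-string token at the front of the sequence.
        example[side] = side_prefix.split(" ") + example[side]
    return example
```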
Looks like it prepends the prefix even if it's empty. I think it should be something like:
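A sketch under the same assumptions, guarding against an empty prefix so it's skipped rather than split into `['']`:

```python
def _prepend(example, prefix):
    """Prepend each side's prefix string, skipping empty prefixes."""
    for side, side_prefix in prefix.items():
        if len(side_prefix) > 0:
            example[side] = side_prefix.split(" ") + example[side]
    return example
```

With the guard, an empty `tgt_prefix` leaves tgt untouched instead of training every example with a leading unknown token.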