Open francoishernandez opened 3 weeks ago
This PR is an attempt at making special tokens easier to configure, and at handling a few model-specific cases.

Two main changes:
- `{bos,eos,unk,pad}_token` fields in `BaseVocabConfig`, which are then stored in a `"specials"` key in the vocab object;
- `optional_eos` in `PredictConfig`, to handle cases where we might need several EOS tokens, e.g. Llama3 with `<|end_of_text|>` and `<|eot_id|>`.

Some open questions / TODOs:
- `default_specials` field: we should probably deprecate it in favor of the more flexible new fields;
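
To make the two changes above more concrete, here is a minimal sketch of how the new fields could fit together. It is not the code from this PR: the dataclasses only borrow the names `BaseVocabConfig` and `PredictConfig`, the default token strings are placeholders, and `build_specials` and `stop_tokens` are hypothetical helpers. Treating `<|end_of_text|>` as the main EOS and `<|eot_id|>` as the optional one is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BaseVocabConfig:
    # Stand-in for the real config class; default values are illustrative only.
    bos_token: str = "<s>"
    eos_token: str = "</s>"
    unk_token: str = "<unk>"
    pad_token: str = "<blank>"


@dataclass
class PredictConfig:
    # Extra tokens that should also end generation, on top of eos_token.
    optional_eos: List[str] = field(default_factory=list)


def build_specials(config: BaseVocabConfig) -> Dict[str, str]:
    """Hypothetical helper: gather the configured special tokens in one mapping."""
    return {
        "bos_token": config.bos_token,
        "eos_token": config.eos_token,
        "unk_token": config.unk_token,
        "pad_token": config.pad_token,
    }


def stop_tokens(vocab: dict, predict_config: PredictConfig) -> Set[str]:
    """Hypothetical helper: every token that should stop decoding."""
    return {vocab["specials"]["eos_token"], *predict_config.optional_eos}


# Llama3-style setup with two end-of-sequence tokens (assumed split, see above).
vocab_config = BaseVocabConfig(eos_token="<|end_of_text|>")
predict_config = PredictConfig(optional_eos=["<|eot_id|>"])

# The special tokens live under a "specials" key in the vocab object.
vocab = {"specials": build_specials(vocab_config)}
stops = stop_tokens(vocab, predict_config)
```

With this layout, decoding would only need to check membership in `stops`, so a model with a single EOS simply leaves `optional_eos` empty.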