Status: closed by petergoldstein 1 year ago.
This PR adds the following methods to Tokenizer:
and the following methods to Encoding:
The Python Tokenizer and Encoding bindings were used as a reference.
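For context on that reference: in the Python bindings, Tokenizer.encode accepts an input sequence, an optional second sequence (pair), an is_pretokenized flag, and add_special_tokens. The sketch below shows what the pair and is_pretokenized options mean for the resulting encoding. It is a hypothetical toy and not the tokenizers library. toy_encode, ToyEncoding, and the tiny vocabulary are illustrative names, and the BERT-style [CLS]/[SEP] layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Union

# Hypothetical toy vocabulary; a real tokenizer would load a trained model.
VOCAB: Dict[str, int] = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
    "hello": 3, "world": 4, "how": 5, "are": 6, "you": 7,
}

Input = Union[str, Sequence[str]]


@dataclass
class ToyEncoding:
    # Hypothetical stand-in for an Encoding: parallel lists of ids, tokens, type ids.
    ids: List[int] = field(default_factory=list)
    tokens: List[str] = field(default_factory=list)
    type_ids: List[int] = field(default_factory=list)


def _words(seq: Input, is_pretokenized: bool) -> List[str]:
    # Pretokenized input is already a list of words; raw text is split on whitespace.
    return list(seq) if is_pretokenized else seq.split()


def toy_encode(sequence: Input, pair: Optional[Input] = None,
               is_pretokenized: bool = False,
               add_special_tokens: bool = True) -> ToyEncoding:
    enc = ToyEncoding()

    def push(word: str, type_id: int) -> None:
        # Unknown words map to [UNK]; type_id marks which sequence a token came from.
        token = word if word in VOCAB else "[UNK]"
        enc.tokens.append(token)
        enc.ids.append(VOCAB[token])
        enc.type_ids.append(type_id)

    # Assumed BERT-style layout: [CLS] A [SEP] B [SEP], with type id 1 for the pair.
    if add_special_tokens:
        push("[CLS]", 0)
    for w in _words(sequence, is_pretokenized):
        push(w, 0)
    if add_special_tokens:
        push("[SEP]", 0)
    if pair is not None:
        for w in _words(pair, is_pretokenized):
            push(w, 1)
        if add_special_tokens:
            push("[SEP]", 1)
    return enc


if __name__ == "__main__":
    enc = toy_encode("hello world", pair="how are you")
    print(enc.tokens)
    print(enc.type_ids)
```

Running it prints the tokens of both sequences in one encoding. The type_ids are 0 for the first sequence and 1 for the pair. Passing ["hello", "world"] with is_pretokenized=True skips the whitespace split, because the caller has already done it.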
There are some additional updates that I'd like to make as a follow-up. Most notably, updating the encode method to better match the complete signature here, including support for the pair argument and pretokenized input.
But this batch seemed pretty straightforward on its own and of reasonable benefit.
Awesome, thanks again @petergoldstein! The follow-up changes sound good as well.