anthropics / anthropic-tokenizer-typescript

MIT License

fix: normalisation improvements #2

Closed RobertCraigie closed 1 year ago

RobertCraigie commented 1 year ago

Previously, we did not normalise the input text before encoding, which resulted in discrepancies for certain special characters, e.g. ™. These cases should now match the Python SDK.
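To illustrate the kind of discrepancy described above, here is a minimal sketch of normalising text before it reaches an encoder. It assumes the NFKC normalisation form, which this description does not state, and `normalizeForEncoding` is a hypothetical helper name, not part of this library's API.

```typescript
// Hypothetical helper: normalise text before handing it to a tokenizer.
// Assumption: NFKC is the normalisation form; the PR text does not name one.
function normalizeForEncoding(text: string): string {
  // String.prototype.normalize is a standard ECMAScript method.
  return text.normalize("NFKC");
}

const raw = "Claude™";
const normalized = normalizeForEncoding(raw);

// Under NFKC, U+2122 (TRADE MARK SIGN) decomposes to the two letters "TM",
// so the normalised string differs from the raw input.
console.log(normalized);          // "ClaudeTM"
console.log(raw === normalized);  // false
```

If one implementation encodes `raw` and another encodes `normalized`, the two see different character sequences and may produce different tokens, which is the sort of mismatch normalising up front is meant to avoid.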

This PR also: