benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
MIT License
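As a quick orientation to the library this PR targets, here is a minimal sketch of splitting text from Python. The `semantic-text-splitter` package name and the constructor-based capacity are assumptions based on the published bindings, not something stated in this PR:

```python
# Minimal sketch of the Python bindings (package name and constructor
# signature assumed from the published `semantic-text-splitter` release;
# older versions passed the capacity to `chunks` instead).
from semantic_text_splitter import TextSplitter

# Split by character count, producing chunks of at most 1000 characters.
splitter = TextSplitter(1000)
chunks = splitter.chunks("Some long document text ...")
print(len(chunks))
```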

Bump the minor group with 6 updates #317

Closed · dependabot[bot] closed this pull request 1 month ago

dependabot[bot] commented 1 month ago

Bumps the minor group with 6 updates:

| Package | From | To |
| --- | --- | --- |
| tokenizers | 0.19.1 | 0.20.0 |
| clap | 4.5.13 | 4.5.14 |
| clap_builder | 4.5.13 | 4.5.14 |
| filetime | 0.2.23 | 0.2.24 |
| redox_syscall | 0.4.1 | 0.5.2 |
| ureq | 2.10.0 | 2.10.1 |

Updates tokenizers from 0.19.1 to 0.20.0

Release notes

Sourced from tokenizers's releases.

Release v0.20.0: faster encode, better python support

Release v0.20.0

This release is focused on performance and user experience.

Performance:

First off, we did a bit of benchmarking and found some room for improvement. With a few minor changes (mostly #1587), here is what we get on Llama3 running on a g6 instance on AWS (benchmark: https://github.com/huggingface/tokenizers/blob/main/bindings/python/benches/test_tiktoken.py): [benchmark results image]
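For a rough sense of what such a benchmark measures, here is a hypothetical timing sketch, not the linked test_tiktoken.py script and using a smaller public model, that times the batch-encode path with the tokenizers Python API:

```python
# Hypothetical sketch for timing batch encoding with the tokenizers
# Python API; the model choice and corpus are placeholders, not the
# benchmark linked above.
import time
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
texts = ["The quick brown fox jumps over the lazy dog."] * 10_000

start = time.perf_counter()
tokenizer.encode_batch(texts)
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:,.0f} sequences/sec")
```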

Python API

We shipped better deserialization errors in general, and support for __str__ and __repr__ for all the objects. This makes debugging a lot easier; see this:

>>> from tokenizers import Tokenizer
>>> tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
>>> print(tokenizer)
Tokenizer(version="1.0", truncation=None, padding=None, added_tokens=[{"id":0, "content":"[PAD]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":100, "content":"[UNK]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":101, "content":"[CLS]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":102, "content":"[SEP]", "single_word":False, "lstrip":False, "rstrip":False, ...}, {"id":103, "content":"[MASK]", "single_word":False, "lstrip":False, "rstrip":False, ...}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0)], pair=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id="[SEP]", type_id=1)], special_tokens={"[CLS]":SpecialToken(id="[CLS]", ids=[101], tokens=["[CLS]"]), "[SEP]":SpecialToken(id="[SEP]", ids=[102], tokens=["[SEP]"])}), decoder=WordPiece(prefix="##", cleanup=True), model=WordPiece(unk_token="[UNK]", continuing_subword_prefix="##", max_input_chars_per_word=100, vocab={"[PAD]":0, "[unused0]":1, "[unused1]":2, "[unused2]":3, "[unused3]":4, ...}))

>>> tokenizer
Tokenizer(version="1.0", truncation=None, padding=None, added_tokens=[{"id":0, "content":"[PAD]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":100, "content":"[UNK]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":101, "content":"[CLS]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":102, "content":"[SEP]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}, {"id":103, "content":"[MASK]", "single_word":False, "lstrip":False, "rstrip":False, "normalized":False, "special":True}], normalizer=BertNormalizer(clean_text=True, handle_chinese_chars=True, strip_accents=None, lowercase=True), pre_tokenizer=BertPreTokenizer(), post_processor=TemplateProcessing(single=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0)], pair=[SpecialToken(id="[CLS]", type_id=0), Sequence(id=A, type_id=0), SpecialToken(id="[SEP]", type_id=0), Sequence(id=B, type_id=1), SpecialToken(id="[SEP]", type_id=1)], special_tokens={"[CLS]":SpecialToken(id="[CLS]", ids=[101], tokens=["[CLS]"]), "[SEP]":SpecialToken(id="[SEP]", ids=[102], tokens=["[SEP]"])}), decoder=WordPiece(prefix="##", cleanup=True), model=WordPiece(unk_token="[UNK]", continuing_subword_prefix="##", max_input_chars_per_word=100, vocab={"[PAD]":0, "[unused0]":1, "[unused1]":2, ...}))

The pre_tokenizers.Sequence and normalizers.Sequence are also more accessible now: you can index into them and modify individual components in place:

from tokenizers import normalizers

# Build a pipeline of normalizers from a list.
norm = normalizers.Sequence([normalizers.Strip(), normalizers.BertNormalizer()])
norm[0]                    # index into the sequence to access the Strip normalizer
norm[1].lowercase = False  # modify the BertNormalizer in place
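Presumably the same index access applies to pre_tokenizers.Sequence, as the release notes suggest; a small sketch under that assumption:

```python
from tokenizers import pre_tokenizers

# Assumes pre_tokenizers.Sequence gained the same index access that the
# release notes show for normalizers.Sequence.
pre = pre_tokenizers.Sequence([pre_tokenizers.Whitespace(), pre_tokenizers.Digits()])
print(pre[0])  # the Whitespace pre-tokenizer
print(pre[1])  # the Digits pre-tokenizer
```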

What's Changed

... (truncated)

Commits
  • a5adaac version 0.20.0
  • a8def07 Merge branch 'fix_release' of github.com:huggingface/tokenizers into branch_v...
  • fe50673 Fix CI
  • b253835 push cargo
  • fc3bb76 update dependencies
  • bfd9cde Perf improvement 16% by removing offsets. (#1587)
  • bd27fa5 add deserialize for pre tokenizers (#1603)
  • 56c9c70 Tests + Deserialization improvement for normalizers. (#1604)
  • 49dafd7 Fix strip python type (#1602)
  • bded212 Support None to reset pre_tokenizers and normalizers, and index sequences (...
  • Additional commits viewable in compare view


Updates clap from 4.5.13 to 4.5.14

Release notes

Sourced from clap's releases.

v4.5.14

[4.5.14] - 2024-08-08

Features

  • (unstable-ext) Added Arg::add for attaching arbitrary state, like completion hints, to Arg without Arg knowing about it
Changelog

Sourced from clap's changelog.

[4.5.14] - 2024-08-08

Features

  • (unstable-ext) Added Arg::add for attaching arbitrary state, like completion hints, to Arg without Arg knowing about it
Commits


Updates clap_builder from 4.5.13 to 4.5.14

Release notes

Sourced from clap_builder's releases.

v4.5.14

[4.5.14] - 2024-08-08

Features

  • (unstable-ext) Added Arg::add for attaching arbitrary state, like completion hints, to Arg without Arg knowing about it
Changelog

Sourced from clap_builder's changelog.

[4.5.14] - 2024-08-08

Features

  • (unstable-ext) Added Arg::add for attaching arbitrary state, like completion hints, to Arg without Arg knowing about it
Commits
  • d222ae4 chore: Release
  • a8abcb4 docs: Update changelog
  • 2690e1b Merge pull request #5621 from shannmu/dynamic_valuehint
  • 7fd7b3e feat(clap_complete): Support to complete custom value of argument
  • fc6aaca Merge pull request #5638 from epage/cargo
  • 631e54b docs(cookbook): Style cargo plugin
  • 6fb49d0 Merge pull request #5636 from gibfahn/styles_const
  • 6f215ee refactor(styles): make styles example use a const
  • bbb2e6f test: Add test case for completing custom value of argument
  • 999071c fix: Change visible to hidden
  • Additional commits viewable in compare view


Updates filetime from 0.2.23 to 0.2.24

Commits


Updates redox_syscall from 0.4.1 to 0.5.2

Updates ureq from 2.10.0 to 2.10.1

Changelog

Sourced from ureq's changelog.

2.10.1

  • The default ureq Rustls TLS config was updated to avoid a panic for applications that activate the default Rustls aws-lc-rs feature without setting a process-wide crypto provider. ureq will now use *ring* in this circumstance instead of panicking.
Commits


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
  • `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
  • `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
  • `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency
  • `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions
codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.50%. Comparing base (6206cc3) to head (7630e5d). Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #317      +/-   ##
==========================================
- Coverage   99.60%   99.50%   -0.10%
==========================================
  Files          11       11
  Lines        2037     2037
==========================================
- Hits         2029     2027       -2
- Misses          8       10       +2
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.