-
### System Info
TypeScript 5.5.4
transformers.js 3.0.2
Node.js v20.17.0
### Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] De…
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
From Twitter - adding new tokens to Qwen doesn't work?
```python
# Add special tokens to the tokenizer
num_added_tokens = tokenizer.add_special_tokens({"additional_special_tokens": special_tokens})
…
-
## Purpose
The script `tokviz/visualization.py` should support **visualizing custom and local tokenizers**. Start with the Hugging Face Transformers class `PreTrainedTokenizerFast` for simplicity. T…
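
As a rough illustration of the request, here is a minimal sketch of coloring a token list in the terminal. The helper name `colorize_tokens` and the color palette are hypothetical and are not taken from `tokviz`. Loading the tokenizer appears only in a comment and assumes a local `tokenizer.json` file.

```python
from itertools import cycle

# ANSI background colour codes (an arbitrary palette chosen for this sketch),
# cycled so that neighbouring tokens get different colours.
BACKGROUNDS = (41, 42, 44, 45, 46)
RESET = "\x1b[0m"


def colorize_tokens(tokens):
    """Wrap each token in an ANSI background colour and join them into one string."""
    parts = []
    for token, code in zip(tokens, cycle(BACKGROUNDS)):
        parts.append(f"\x1b[{code}m{token}{RESET}")
    return "".join(parts)


# Hypothetical usage with a local tokenizer file (requires `transformers`;
# the path below is a placeholder):
#   from transformers import PreTrainedTokenizerFast
#   tok = PreTrainedTokenizerFast(tokenizer_file="path/to/tokenizer.json")
#   print(colorize_tokens(tok.tokenize("Hello world")))

if __name__ == "__main__":
    # Stand-in token list so the sketch runs without any tokenizer installed.
    print(colorize_tokens(["Hello", " world", "!"]))
```

Keeping the rendering step independent of how the tokens were produced would let the same function serve hub-hosted, custom, and local tokenizers alike.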
-
When following the README instructions on Ubuntu 20.04 on Windows 11 (WSL2), the `make` command fails:
```bash
[ 7%] Built target ggml
[ 8%] Generating release/libtokenizers_c.a
no such file o…
-
Port the CLIP tokenizer, which uses byte-level BPE. This tokenizer enables scenarios like Stable Diffusion.
May be dependent on https://github.com/dotnet/machinelearning/issues/6992.
Reference:
h…
-
My environment is Windows 11 23H2 with Anaconda 24.5.0.
I ran the following commands, following the steps in the README:
```
git clone https://github.com/Ucas-HaoranWei/GOT-OCR2.0.git
cd GOT-OCR2.0/GOT-OCR2.0-master/
conda create -n got python=3.10 -…
-
Hello. I think there are some problems with `NormalizedString` (tokenizers 0.15.2).
In the following example, `append()` works as expected.
```
from tokenizers import NormalizedString
s = Norm…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Environment
```markdown
- Milvus version:master
- Deployment mode(standalone or cluster):
- MQ type(rocksmq,…
-
### Project URL
https://pypi.org/project/tokenizers/
### Does this project already exist?
- [X] Yes
### New limit
100 GB (to start)
### Update issue title
- [X] I have updated the title.
### W…