-
Hello,
First, I would like to thank you for open-sourcing such a well-designed, high-quality code base.
I am reading the source code, and I have a question about this part (integrations.transformers.py…
-
## Summary
The current implementation of `Tokenizer` is slower than html5ever. I believe that it can become as fast as html5ever.
## The current benchmark results
From #59
Running on the M…
kkebo updated
3 weeks ago
-
I was trying to use this module, but I am getting this error:
`AttributeError: 'Tokenizer' object has no attribute 'tokenizer'`
![image](https://github.com/jumon/whisper-punctuator/assets/737480…
-
### System Info
- `transformers` version: 4.35.2
- Platform: Linux-5.4.0-163-generic-x86_64-with-glibc2.10
- Python version: 3.8.18
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.…
-
Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, …
-
Excellent work!
It seems that NLTokenizer splits the text but does not treat the sign character as a separate token.
Example: for `emt-lib-path`, the output is `emt`, `-lib`, `-path`,
but the expected output is `emt`, `-`, `lib`, `-`, `path`.
Is there any method to c…
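
For reference, the expected split in the example above can be illustrated with a small sketch. It uses Python's standard `re` module rather than NLTokenizer, and the helper name `split_keep_signs` is hypothetical; it only shows the desired behavior, not how NLTokenizer could be configured to produce it.

```python
import re

def split_keep_signs(text):
    # \w+ matches a run of word characters (letters, digits, underscore);
    # [^\w\s] matches a single non-word, non-space character such as "-",
    # so each sign is returned as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = split_keep_signs("emt-lib-path")
print(tokens)  # ['emt', '-', 'lib', '-', 'path']
```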
-
Hi, I'm trying to export my tokenizer and followed this short guide:
[Guide](https://onnxruntime.ai/docs/extensions/)
Now, using:
tokenizer = AutoTokenizer.from_pretrained(onnx_path, use_fast=Fa…
-
Thank you for your repo!
Would it be possible to add a tokenizer for this model?
https://huggingface.co/microsoft/mdeberta-v3-base
Thanks in advance :)
-
In datasets/utils.py there is a bug related to qwen2: **QWen2Tokenizer** should be changed to **Qwen2Tokenizer**.
`def get_bos_eos_token_ids(tokenizer):
if tokenizer.__class__.__name__ in [
'QWenTokenizer', 'QWen2Tok…
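
To illustrate why the capitalization matters: a comparison against `__class__.__name__` is case-sensitive, so a misspelled entry never matches. The sketch below is a minimal illustration, assuming a hypothetical stand-in class and a hypothetical helper `name_matches`; it is not the project's actual function or the real tokenizer.

```python
class Qwen2Tokenizer:
    """Hypothetical stand-in; only the class name matters for this check."""

def name_matches(tokenizer, names):
    # String membership is case-sensitive: "QWen2Tokenizer" != "Qwen2Tokenizer".
    return tokenizer.__class__.__name__ in names

tok = Qwen2Tokenizer()
wrong = name_matches(tok, ["QWenTokenizer", "QWen2Tokenizer"])  # misspelled entry
right = name_matches(tok, ["QWenTokenizer", "Qwen2Tokenizer"])  # corrected entry
print(wrong, right)  # False True
```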
-
### What happened?
Despite [phi-2](https://huggingface.co/microsoft/phi-2) being listed [here](https://github.com/ggerganov/llama.cpp/blob/2e32f874e675f7bc5307cb7b4470ddbe090bab8f/README.md?plain=1#L…