-
### Question
This is probably a known issue, since I'm aware that this project lags a bit behind the rapid changes in the Python transformers library, but I wanted to document a specific com…
-
Email addresses in content, such as my.name@example.org, are split by the StandardTokenizerFactory into "my.name" and "example.org" because, according to http://unicode.org/reports/tr29/#Word_Boundaries, cer…
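
To make the reported split concrete, here is a minimal Python sketch. It is not Lucene's implementation and not the full UAX #29 algorithm; it only approximates the one rule that matters here, assuming that a "." or "'" between letters or digits does not break a word while "@" always does. The names `WORD` and `rough_uax29_words` are made up for this illustration.

```python
import re

# Rough approximation (an assumption, not the real UAX #29 rules):
# a word is a run of ASCII letters/digits, optionally continued by
# "." or "'" when letters/digits follow on the other side.
WORD = re.compile(r"[A-Za-z0-9]+(?:[.'][A-Za-z0-9]+)*")


def rough_uax29_words(text):
    """Return the word-like tokens found in text, in order."""
    return WORD.findall(text)


# "@" is not a joining character, so the address falls apart at that point.
tokens = rough_uax29_words("Contact my.name@example.org today")
print(tokens)
```

Under these assumptions the local part and the domain each survive as one token, but nothing joins them across the "@", which matches the behaviour described above.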
-
I just upgraded to the new version and am seeing this error in the log files:
[7/15/2024, 8:03:34 AM] [homebridge-midea-platform] [Mini Split] Does not supports the protocol MessageQuerySubtype, ignor…
-
**Github username:** --
**Twitter username:** --
**Submission hash (on-chain):** 0xac22b432f475565c9ca28e80fa970c0b30d912cf334aec73ee9b521fe07c2129
**Severity:** medium
**Description:**
## Details
T…
-
Hi authors,
I am trying to recreate the temporal dataset you used in your paper. I noticed that in 'create_temporal_dataset.ipynb', in your preprocessing folder, you used **'master.csv'** fil…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing ans…
-
### Feature request
1. [run_ner.py in examples](https://github.com/huggingface/transformers/blob/main/examples/pytorch/token-classification/run_ner.py) requires data in the form of pre-tokenized words, like…
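
For context, here is a rough sketch of what "pre-tokenized" input means in practice: raw text with character-level entity spans has to be turned into parallel lists of words and tags before it can be fed to such a script. The helper `to_word_level`, the whitespace splitting, the BIO tagging, and the `tokens` / `ner_tags` field names are assumptions made for this illustration, not something the request specifies.

```python
import re


def to_word_level(text, spans):
    """Convert character-level spans into word-level BIO tags.

    spans: list of (start, end, label) character offsets, end exclusive.
    Words are split on whitespace only (an assumption for this sketch).
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # The word belongs to the first span it overlaps.
            if match.start() < end and match.end() > start:
                prefix = "B-" if match.start() <= start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return {"tokens": tokens, "ner_tags": tags}


example = to_word_level(
    "Hugging Face is based in New York",
    [(0, 12, "ORG"), (25, 33, "LOC")],
)
print(example)
```

Whitespace splitting keeps the sketch short; text with punctuation attached to entities would need a more careful word splitter.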
-
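Hello, I looked through the code and replaced its tokenizer with the Chinese jieba tokenizer, but the generated results score very low. How should I modify the code? What needs to be changed? Could you give me some guidance?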
-
I am trying to train a custom tokenizer. My use case is related to assembly code, so I want merges to be possible across full instructions (potentially multiple "words"). To do this, I am replacing al…
-
### System Info
Node.js v22.9.0. `"@xenova/transformers": "2.17.2"`
### Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop…