-
Hi,
i have issues with German Umlaute so i guess it depends on
const { WordTokenizer } = require('natural/lib/natural/tokenizers/regexp_tokenizer');
see classifier response
`{
"text": "wha…
sajov updated
6 years ago
-
D:\asdasd\AI\GPT-SoVITS-Server-main\GPT-SoVITS-Server-main>python server.py
DirectML可用,将使用DirectML进行推理加速。
设备名称: NVIDIA GeForce GTX 1650
Traceback (most recent call last):
File "D:\asdasd\AI\GPT-…
-
`it's` gets tokenized into three tokens, `it`, `'`, `s`
that should be fixed
same with `'m` `'t` etc
-
Message='BaiChuanTokenizer' object has no attribute 'sp_model'
Source=C:\Users\Administrator\.cache\huggingface\modules\transformers_modules\Sunsimiao\tokenization_baichuan.py
StackTrace:
…
-
Hi there!
How to get ' tokenizer.model' in 'tools/prepare_c4_megatron_llama3.py'?
-
我的代码来自 https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese
from transformers import PegasusForConditionalGeneration
from tokenizers_pegasus import PegasusTokenizer
model = Peg…
-
Hi! Is it a mistake? There should be 17 instead of 5 in the end.
![Снимок экрана 2022-07-08 в 17 41 45](https://user-images.githubusercontent.com/33065236/178014793-4c788364-c338-43e1-abb7-d33c7c4e5c…
-
1. Navigate to the "Tokenize Verse" feature in the GUI.
2. Select a verse and initiate the tokenization process.
3. Review the results.
Expected Behavior: The verse should be successfully tokenized…
-
Moved from https://github.com/balanced/balanced-php/issues/84
balanced.js is the preferred method of tokenization. Consider warning marketplaces who use the API direct tokenization method.
-
## Expected Behavior
Compound words (e.g. pick-me-up, hand-me-down, know-it-all, etc.) should be tokenized as single tokens.
## Actual Behavior
hyphens are treated as separators, and the componen…