-
I've exported `openai/clip-vit-base-patch32` from HuggingFace into a single-op ONNX model that uses `CLIPTokenizer`. When comparing the behaviour to the original HF tokenizer, I'm seeing an issue with…
-
Your implementation for masking Chinese words is as follows:
```
for index in index_set:
covered_indexes.add(index)
masked_token = None
# 80% of the time, replace with [MASK]
if rng.random() < 0.8:
mas…
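The truncated loop above appears to follow the standard BERT masking recipe (80% replace with `[MASK]`, 10% replace with a random vocabulary token, 10% keep the original). A minimal self-contained sketch of that per-position rule, for reference; the function name and toy vocabulary are mine, not from your code:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_token(original_token, vocab_words, rng):
    """Standard BERT masking for one selected position:
    80% -> [MASK], 10% -> random vocab token, 10% -> unchanged."""
    r = rng.random()
    if r < 0.8:
        return MASK_TOKEN       # 80% of the time, replace with [MASK]
    elif r < 0.9:
        return rng.choice(vocab_words)  # 10%: random replacement
    else:
        return original_token   # 10%: keep the original token

rng = random.Random(12345)
vocab = ["我", "爱", "北京", "天安门"]
masked = [mask_token(t, vocab, rng) for t in ["我", "爱", "北京"]]
```

Each output token is either `[MASK]`, a vocabulary token, or the original, so the distribution can be checked empirically over many draws.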
-
### Software Environment
```Markdown
Windows 10 x64, Anaconda Spyder 5.2.2
- paddlepaddle-gpu: 2.3.2
- paddlenlp: 2.0.1
```
### Duplicate Issues
- [x] I have searched the existing issues
### Bug Description
```Markdown
As soon as I enter debug mode, it throws a Ty…
-
Today I fixed the `python -m bitsandbytes` problem, and it was immediately followed by a new error:
PS F:\新建文件夹> python .\Llama2-Chinese\examples\chat_gradio.py --model_name_or_path .\Llama2-Chinese-7b-Chat\
bin C:\Users\46045\AppData\Local\Programs\Pyt…
-
I'm working on a site for people to log their Chinese progress (videos watched, articles read, etc.). Rather than create my own plugin, I thought it would make more sense to build on top of yours (which I …
-
Traceback (most recent call last):
File "E:\yan\chong\daimaDemo\Flat-Lattice-Transformer-master\Flat-Lattice-Transformer-master\V0\flat_main.py", line 290, in
datasets, vocabs, embeddings = e…
-
I modified three places:
1. model\chinese-bert_chinese_wwm_pytorch\config.json: changed the value of vocab_size to 30522
2. code\sqlnet\model\sqlbert.py, around line 141: added three lines: sel_col_mask = sel_col_mask - 254; where_col_mask = where_col_m…
-
Hi, I have some questions about pre-training:
1. I want to train my own model from scratch and produce the `vocab.txt` from characters. There are some low-frequency words; should low-frequenc…
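On question 1, a common practice is to set a minimum frequency threshold when building `vocab.txt` and let anything below it fall back to `[UNK]` at tokenization time. A hedged sketch of that cutoff, assuming a character-level vocabulary; the function name, special-token list, and threshold are illustrative, not from any particular codebase:

```python
from collections import Counter

def build_char_vocab(corpus_lines, min_freq=5,
                     specials=("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")):
    """Count characters across the corpus and keep only those seen at
    least min_freq times; rarer characters map to [UNK] later."""
    counts = Counter(ch for line in corpus_lines for ch in line.strip())
    kept = [ch for ch, c in counts.most_common() if c >= min_freq]
    return list(specials) + kept

corpus = ["今天天气很好", "今天我很好"] * 3
vocab = build_char_vocab(corpus, min_freq=3)
```

Raising `min_freq` shrinks the vocabulary (and the embedding matrix) at the cost of mapping more rare characters to `[UNK]`, so the threshold is usually tuned against corpus coverage.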
-
The Baidu Netdisk link for the archived pretrained model you shared in
[emo_is_all_you_need](https://github.com/growvv/emo_is_all_you_need)
(hfl_chinese_roberta_wwm_ext, with a modified vocab.txt) has expired. Could you please upload it again? I have been studying your code recently; thank you very much!
-
Hi there!
I need to remove specific tokens (certain Chinese tokens) from the Qwen2Tokenizer, and I am not quite sure how to do so. I have tried various methods, shown below, but to no avail.
## …
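I can't see the attempts past the truncation, but one general pitfall when deleting tokens from any tokenizer: naively removing vocabulary entries leaves holes in the id space, while the model's embedding matrix expects dense, contiguous ids. A toy sketch of the remapping that removal requires, assuming a simple `vocab.json`-style token-to-id mapping (this is not the actual Qwen2 vocabulary format, which is byte-level BPE with merge rules, and the matching embedding rows would also need to be re-gathered):

```python
def prune_vocab(token_to_id, tokens_to_remove):
    """Drop unwanted tokens and reassign contiguous ids.
    Deleting entries without remapping leaves gaps in the id space,
    which breaks embedding-matrix indexing."""
    kept = [t for t, _ in sorted(token_to_id.items(), key=lambda kv: kv[1])
            if t not in tokens_to_remove]
    return {t: i for i, t in enumerate(kept)}

vocab = {"hello": 0, "世界": 1, "world": 2, "你好": 3}
pruned = prune_vocab(vocab, {"世界", "你好"})
# ids stay contiguous: {"hello": 0, "world": 1}
```

For a BPE tokenizer, merge rules that produce a removed token would also have to be pruned, which is why shrinking a trained tokenizer in place is rarely supported directly.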