-
File "/qiuwkai27/cx/baby-llama2-chinese/sft.py", line 274, in
tokenizer=ChatGLMTokenizer(vocab_file='./chatglm_tokenizer/tokenizer.model')
File "/qiuwkai27/cx/baby-llama2-chinese/chatglm_to…
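A missing or mis-resolved `vocab_file` path is a common cause of failures when constructing `ChatGLMTokenizer`. A minimal sketch of a pre-flight check, assuming the relative path `./chatglm_tokenizer/tokenizer.model` from the traceback (the helper name `resolve_vocab_file` is illustrative, not part of the repo):

```python
import os

def resolve_vocab_file(path: str) -> str:
    """Return an absolute path to the SentencePiece model, failing early
    with a clear message if the file does not exist. Relative paths break
    when sft.py is launched from a different working directory."""
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"tokenizer model not found: {abs_path}")
    return abs_path

# Illustrative usage before constructing the tokenizer:
# tokenizer = ChatGLMTokenizer(
#     vocab_file=resolve_vocab_file('./chatglm_tokenizer/tokenizer.model'))
```

Running the check before instantiating the tokenizer turns an opaque traceback inside the tokenizer class into an explicit file-not-found error.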
-
In the [README](https://github.com/fastnlp/CPT/blob/master/pretrain/README.md) of pre-training, it mentions that the `dataset`, `vocab` and `roberta_zh` have to be prepared before training.
Is ther…
-
My code comes from https://huggingface.co/IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese
from transformers import PegasusForConditionalGeneration
from tokenizers_pegasus import PegasusTokenizer
model = Peg…
-
I tried the DUIE Chinese dataset and the code runs fine with the chinese-bert-wwm-ext pretrained model and the Chinese vocab. But when I switch to the bert-base-cased model and an English vocab to train on an English dataset, I get the error below, and I don't know why. I have already changed the number of relations. Is there something else in the code that needs modifying, or does it only work with the original setup?
![error screenshot](https://github.com/user-at…
-
I am writing to ask for your help with a problem I am having with the tokenizer. I have been trying to solve it for a while now, but I have been unsuccessful.
However, I am having trouble with: Trac…
-
While catching-up with this summer's discussion, I've looked at https://deploy-preview-638--linked-art.netlify.app/model/vocab/recommended/#languages and am realizing that we have visible bias issues …
-
My project requires RoBERTa. Can I simply put the model and vocab for chinese-roberta-wwm-ext under the bert_pretain directory and train directly?
-
#### Problem description
A gensim model was trained under Python 2.7 with a **chinese** dataset.
However, now we are using Python 3.6, and we get some broken strings in .vocab.keys(), as the title says.
…
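Vocab keys saved under Python 2 are often UTF-8 byte strings that get decoded as Latin-1 when the model is loaded under Python 3, which produces mojibake for Chinese text. A minimal sketch of a repair, assuming that decoding path (the function name `repair_mojibake` is illustrative, not a gensim API):

```python
def repair_mojibake(key: str) -> str:
    """Re-encode a Latin-1-decoded string and decode it as UTF-8,
    recovering Chinese text that was mangled during a Py2 -> Py3 load.
    Keys that are already valid text are returned unchanged."""
    try:
        return key.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return key  # not mojibake; leave as-is

# Simulate a broken key: UTF-8 bytes misread as Latin-1.
broken = '中文'.encode('utf-8').decode('latin-1')
print(repair_mojibake(broken))  # -> 中文
```

Whether this applies depends on how the old model was pickled; if the keys were corrupted before saving, re-training under Python 3 is the safer route.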
-
May I ask whether there are any experiments on a Chinese dataset? I used pinyin as the phonemes when training on a Mandarin Chinese dataset, but everything I synthesize is pure noise.
-
Converting the Chinese CLIP model fails.
https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16/tree/main