-
paddlehub installs successfully, but `import paddlehub` reports that sentencepiece is missing; the dependency has to be installed manually.
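For anyone hitting the same thing, here is a minimal sketch for checking which modules are actually missing before installing them by hand. The helper name `missing_modules` is hypothetical, and the sketch assumes the pip package names match the module names, which is not always the case.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the report above; adjust for your own setup.
    missing = missing_modules(["paddlehub", "sentencepiece"])
    if missing:
        print("Missing:", ", ".join(missing))
        # Assumption: the pip package name equals the module name.
        print("Try: python -m pip install " + " ".join(missing))
    else:
        print("All modules found.")
```

Run it with the same interpreter that fails on `import paddlehub`, since a package installed into a different environment will still show up as missing.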
-
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\langchain\ragtest.py", line 108, in <module>
embeddings = HuggingFaceEmbeddings(mode…
-
Hello,
I'm looking into whether I can use this library to later build an R wrapper around it, as this setup seems to be the only software providing functionality similar to UDPipe 2.…
-
Hi guys,
I trained a new sentencepiece model from scratch on my pretraining dataset, but I still get unk tokens. Do you know why? I remember it working smoothly last summer!
Specifically:…
ghost updated 3 years ago
-
Hi, I am trying to train this model, but I run into an issue when it runs the Vicuna model:
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data()…
-
Could you please explain whether it is possible to initialize the sentencepiece algorithm with a pre-defined vocabulary? If not, that seems like it would be a really useful option.
-
### Describe the bug
I tried to add input_ids to a dataset with map(), and I passed return_tensors='pt', so why do I get back values of type List?
![image](https://github.com/user-attachment…
-
### System Info
I'm getting the following error when installing the dependencies for GLM-4V-9B. What could be the reason?
ERROR: pip's dependency resolver does not currently take into accou…
-
## 🚀 Feature
**Motivation**
The current method of training a sentencepiece model requires a file to be passed. It would be nice if this was not required.
**Pitch**
Like other data-relate…
-
Using sentencepiece 0.1.99 on Python 3.11.10, an out-of-range input can cause a crash, depending on which other valid inputs are part of the batch:
```
>>> tkn.load(str(Path("gemma2-9b") / "tokenizer.mod…