-
Hello,
I have been trying to train a tokenizer following the code you provided, but I am encountering an out-of-memory issue. I'm working with a multi-species dataset that's several tens of GBs in …
-
Dear Sir, hello
Can NuMAD calculate the 6×6 stiffness matrix of a wind turbine blade section?
-
I used the official Docker image and downloaded the weight file from Meta. The md5sum test proved that the file was fine, but it still failed to run, which left me confused,I confirm that CUDA can be …
-
Hi again,
I ran this command in Section 2.2 and I get the error in the picture below. Could you please advise?
```
python preprocess.py \
--split dev \
--file_name exeds…
-
Facebook has recently released LASER - https://github.com/facebookresearch/LASER, a language-Agnostic sentence representations, that internally makes use of a relatively new C++ BPE implementation - h…
-
Hello, I am trying to run a zero-shot VQA script but it cannot find the BPE directory, even when I explicitly set it to GPT2.
Here is my inference script:
`export MASTER_PORT=1091
user_dir=../…
-
```python
import torch
from mingpt.bpe import BPETokenizer
tokenizer = BPETokenizer()
print(tokenizer("")) # tensor([[ 27, 91, 437, 1659, 5239, 91, 29]])
print(tokenizer.decode(torch.te…
-
If not redirect it to my log file, it's all OK.
```
root@5b325f584bab:/data/project/# python -u ./src/tokenizing.py --vocab_size 20000
Processing: 2%|███▏ …
-
- [x] Compare (done in #1003) (but to be discussed)
- [ ] Hash (to be discussed)
-
**Is your feature request related to a problem? Please describe.**
I need to calculate the number of tokens, but TokenizerGpt3 has errors in calculations for models of GPT-3.5 and above.
Tokenizer…