-
I followed your BERT pretraining. However, after one week of training, the loss is still around 7.3. I use 8 GPUs with a batch size of 14 per GPU. Everything else is left at the defaults.
INFO - 08/01/19 14:20:20 - 2:02:04 -…
-
Hi, what do I need to change in the code if I want to parallelize the computation across 8 GPUs?
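For context, this is the kind of change I have in mind: a minimal sketch using standard PyTorch `DistributedDataParallel` (not this repo's code; the model, dataset, and hyperparameters are placeholders):
```python
# Minimal DistributedDataParallel sketch: one process per GPU, launched with torchrun.
# Assumption: a plain PyTorch training loop; model/data/hyperparameters are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")             # torchrun sets RANK/WORLD_SIZE
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)                # shards the data across processes
    loader = DataLoader(dataset, batch_size=14, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                         # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                              # gradients are all-reduced here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
Launched with `torchrun --nproc_per_node=8 train_ddp.py`, this runs 8 processes, one per GPU, each seeing a different shard of the data.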
-
The `pretrain.py` script lists the Alpaca dataset and all the other finetuning datasets, but I don't think they are actually supported for pretraining.
E.g.,
```bash
python litgpt/pretrain.py \
--data lit…
-
Could you perhaps provide the pretrained weights (model.pkl)?
If not, how long does pretraining approximately take?
-
Hi, thank you for your work.
When running the wiki103 GPT-2 baseline and the corresponding InRank pretraining experiments on 4×A10 GPUs,
the final evaluation perplexity (PPL) is a bit higher. Does the running m…
-
We can:
1. enable continued pretraining by not requiring model initialization to load an HF checkpoint
2. try to force the GPU to partition the computation correctly somehow?
3. not use scanlay…
-
## 🐛 Bug
### To Reproduce
Steps to reproduce the behavior (**always include the command you ran**):
1. Run `fairseq-preprocess` with a `--srcdict` that has `#fairseq:overwrite`. For examp…
-
AnyGPT is trained only with the next-token-prediction task.
Taking text-to-image as an example, is the training input a sequence of speech tokens, text tokens, image tokens, and music tokens?
I want to know the input…
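For reference, this is my current understanding of what a single next-token-prediction training example might look like. It is only a sketch, assuming the modality tokens share one vocabulary and are simply concatenated; the token IDs, special tokens, and "model" are placeholders, not confirmed AnyGPT code:
```python
# Sketch: next-token prediction over a concatenated multimodal token sequence.
# Assumption: text and image tokens live in one shared vocabulary and are joined
# with hypothetical boundary tokens; none of this is confirmed AnyGPT code.
import torch
import torch.nn.functional as F

VOCAB_SIZE = 50000
BOS, SOI, EOI = 0, 1, 2        # hypothetical special tokens (begin, start-of-image, end-of-image)

text_tokens  = torch.tensor([101, 7592, 2088])       # placeholder text token IDs
image_tokens = torch.tensor([30001, 30517, 31024])   # placeholder image-codebook token IDs

# For a text-to-image example, the input is one flat sequence: text first, then image.
sequence = torch.cat([torch.tensor([BOS]), text_tokens,
                      torch.tensor([SOI]), image_tokens, torch.tensor([EOI])])

inputs, targets = sequence[:-1], sequence[1:]         # standard shifted NTP targets

# Placeholder "model": random logits standing in for a decoder-only transformer.
logits = torch.randn(inputs.shape[0], VOCAB_SIZE, requires_grad=True)
loss = F.cross_entropy(logits, targets)               # one loss over all modalities
loss.backward()
print(loss.item())
```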
-
How should I understand $`E_{Sam}`$ and the corresponding $`E^{T-I}_{Sam}`$ in the paper? Are they constructed like the learnable positional embedding $`E_{Pos}`$ in the transformer, etc. …
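To make the question concrete, this is the kind of learnable embedding I have in mind for $`E_{Pos}`$ (a generic ViT-style sketch, not code from the paper; the class name and shapes are placeholders). I am asking whether $`E_{Sam}`$ and $`E^{T-I}_{Sam}`$ are defined the same way:
```python
# Generic sketch of a learnable (positional-style) embedding; not from the paper.
import torch
import torch.nn as nn

class TokensWithLearnableEmbedding(nn.Module):
    def __init__(self, num_tokens: int = 197, dim: int = 768):
        super().__init__()
        # E_Pos-style parameter: one learnable vector per token position,
        # initialized randomly and updated by backprop like any other weight.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) token embeddings; the learnable table is added.
        return x + self.pos_embed

# Usage: out = TokensWithLearnableEmbedding()(torch.randn(2, 197, 768))
```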
-
When I compile the model, I get the following error. Any idea how to fix this?
```
Traceback (most recent call last):
File "/home/user/anaconda3/envs/hydra/lib/python3.10/site-packages/IPython/…