-
Hi! When I pretrain BERT, I get the following error every time at the validation step of the first epoch (as shown in the first picture). I want to train for `3000` epochs; when I set `--val_freq 3001`, the…
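For context, a minimal sketch of the validation gating I assume the training loop uses (the loop structure and names are my guesses, not the repo's code); with `--val_freq 3001` the gate would simply never fire over 3000 epochs:

```python
# Hypothetical sketch of epoch-based validation gating.
def train_one_epoch(epoch: int) -> None:
    pass  # stand-in for the real training step

def validate(epoch: int) -> None:
    pass  # stand-in for the validation step that currently fails

num_epochs = 3000
val_freq = 3001  # assumed to map to the --val_freq flag

for epoch in range(1, num_epochs + 1):
    train_one_epoch(epoch)
    if epoch % val_freq == 0:  # never true when val_freq > num_epochs
        validate(epoch)
```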
-
I would like to request a new feature in the code: the ability to resume training from a checkpoint.
Currently, the code can save a checkpoint of the model's state at any point during training. How…
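To illustrate the feature, a minimal sketch of resuming from a saved checkpoint, assuming a standard PyTorch setup; the model, optimizer, and file name here are hypothetical placeholders, not objects from this repo:

```python
import os
import torch
import torch.nn as nn

# Hypothetical model/optimizer, just for illustration.
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

ckpt_path = "checkpoint.pt"  # assumed checkpoint file name
start_epoch = 0

# Resume: restore model weights, optimizer state, and the epoch counter.
if os.path.exists(ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1

for epoch in range(start_epoch, 10):
    # ... training step would go here ...
    # Save everything needed to resume later.
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        ckpt_path,
    )
```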
-
We are still waiting for the pretraining script; we don't want to download the datasets (that's the whole point of pretraining).
We need a demo to test with our own custom images/data.
Thank you
-
How can I solve the problem below?
########## Train Log ###############
mot/crowdhuman_dla34| train: [1][4240/19370]|Tot: 0:28:23 |ETA: 0:00:00 |loss 20.8643 |hm_loss 1.6064 |wh_loss 4.8034 |off_l…
-
### Question
Has anyone carried out pretraining with Mixtral 8×7B? When I run the pretraining script, a problem occurs, as shown in the figure below. I just added a llava_mixtral.py to the ll…
-
We have seen TF2 ALBERT pretraining crash intermittently in about 1 out of ~3 runs using the latest Horovod, training on 8 nodes; the crash happens around 3,000 steps.
Error message:
```
Loss: 6.436, MLM…
```
-
Hi,
I'm doing exactly what you wrote in the README file.
Now, when I run:
`bash run/vqa_finetune.bash 0 vqa_lxr955_tiny --tiny`
I get the following error:
> Load 632117 data from split(s) train,…
-
## In a nutshell
A study that investigates which training task works best for obtaining multilingual sentence representations. The base is a language model, and three tasks are proposed: predicting the next word as usual (Causal LM), predicting the words at dropped positions (Masked LM), and, when translation data is available, performing Masked LM over concatenated sentence pairs (Translation LM). CLM
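As an illustration of the Translation LM (TLM) setup described above, a small sketch of how a masked example could be built from a parallel sentence pair; the whitespace tokenization and mask rate are simplified assumptions, not the paper's exact recipe:

```python
import random

def build_tlm_example(src_sentence, tgt_sentence, mask_rate=0.15, mask_token="[MASK]"):
    """Concatenate a translation pair and mask tokens in both halves,
    so the model can use the other language as extra context."""
    tokens = src_sentence.split() + ["[SEP]"] + tgt_sentence.split()
    inputs, labels = [], []
    for tok in tokens:
        if tok != "[SEP]" and random.random() < mask_rate:
            inputs.append(mask_token)
            labels.append(tok)        # prediction target: the original token
        else:
            inputs.append(tok)
            labels.append(None)       # not a prediction target
    return inputs, labels

# Example usage with an English/French pair.
inp, lab = build_tlm_example("the cat sits on the mat", "le chat est assis sur le tapis")
print(inp)
print(lab)
```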
-
The model checkpoints seem to be hard-coded as BertForMaskedLM and cannot be loaded into anything but the CoCondensor class.
Adding the following attributes in the initialization can bypass the exceptio…
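The attributes themselves are cut off above, so the following is only a hedged guess at the kind of change meant, assuming the class is a Hugging Face `BertPreTrainedModel` subclass; the attribute names and key patterns are illustrative, not the author's actual fix:

```python
# Hypothetical illustration only -- the actual attributes were truncated above.
# If the exception stems from key mismatches when a BertForMaskedLM-formatted
# checkpoint is loaded, Hugging Face's class-level ignore lists are one common
# way to get past it (names and patterns below are guesses):
from transformers import BertModel, BertPreTrainedModel

class CoCondensor(BertPreTrainedModel):
    # Skip complaints about keys that exist only in the MLM checkpoint
    # (e.g. the `cls` prediction head) or only in this class (e.g. `c_head`).
    _keys_to_ignore_on_load_unexpected = [r"cls.*"]
    _keys_to_ignore_on_load_missing = [r"c_head.*"]

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.init_weights()
```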
-
While trying out INT8 mixed-precision pretraining (#748) with torchtitan, I came across an issue: if the model is FSDP-sharded, `quantize_()` won't work. The fix would be adding extra logic to …
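For reference, a minimal sketch of the ordering I have in mind, using torchao's `quantize_()` and FSDP2's `fully_shard`; the toy model, the `int8_weight_only` config (a stand-in for the INT8 mixed-precision training recipe from #748), and the quantize-before-shard workaround are my assumptions, not a confirmed fix:

```python
# Repro sketch under my assumptions, not torchtitan's actual code.
# quantize_() swaps Linear weights in place, so it needs to see plain
# parameters; once FSDP has sharded them into DTensors the swap no longer
# applies, which is the failure described above.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024)).bfloat16()

# 1) Quantize first, while parameters are still regular tensors.
quantize_(model, int8_weight_only())

# 2) Only afterwards apply FSDP sharding (needs an initialized process group).
if torch.distributed.is_initialized():
    from torch.distributed._composable.fsdp import fully_shard
    fully_shard(model)
```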