-
I attached my training loss below. The data we are using follows LLM360's paper, except that we use less StarCoder data.
For each training epoch our data contains: ArXiv 30B, Book 57B, C4 197.67B, RefinedWeb 6…
-
I am trying to run the pretraining scripts and am encountering the following error while loading the datasets from disk.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU core…
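Since the error surfaces while loading datasets from disk, one common culprit is a wrong, relative, or empty dataset path that only fails deep inside the loader. A minimal stdlib sanity check one could run first (the path and file names here are placeholders, not from the actual scripts):

```python
from pathlib import Path

def check_dataset_dir(path):
    """Sanity-check a dataset directory before handing it to the loader.

    Hugging Face's save_to_disk, for example, writes a state.json plus
    .arrow shards; a mistyped or empty path otherwise fails only deep
    inside dataset loading with a confusing traceback.
    """
    p = Path(path).expanduser().resolve()
    if not p.is_dir():
        raise FileNotFoundError(f"dataset dir not found: {p}")
    files = sorted(f.name for f in p.iterdir())
    if not files:
        raise ValueError(f"dataset dir is empty: {p}")
    return files

# Hypothetical usage:
# check_dataset_dir("~/data/pretrain/arxiv")
```

If the path checks out, the next thing to compare is the library version that wrote the dataset versus the one reading it, since on-disk formats can change between releases.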
-
I was wondering what sequence length was used during pretraining for the 1.2B and 2.4B models?
-
I was following [DATA.md](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/DATA.md) to download the pretraining dataset.
However, I cannot find `webvid_10m_train.json`, `cc12m_train.json…
-
Congrats on the great work! I have some questions about the model pretraining.
I found that although this work emphasizes the importance of how R-SMILES can boost the retrosynthesis prediction performanc…
-
**Environment**
composer = 0.23.3, composer = 0.17.2
GPU Stack: 8 x A100 80GB
CUDA: 12.1
**To reproduce**
Steps to reproduce the behavior:
1. Run MosaicBERT using https://github.com/Skyli…
-
The pretraining example with
```
litgpt pretrain \
--model_name pythia-14m \
--config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/pretrain/debug.yaml
```
is do…
-
A colleague of mine commented that I should pretrain the models to yield more robust models and
better accuracy. Now, how can I do that, or what are the possible avenues here?
My models are …
-
After I trained, I put the .tar at vit_load_path. But I get a missing key error when I want to segment other data (like this: Missing key(s) in state_dict: "image_encoder.pos_embed", "image_encoder.patc…
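Errors like `Missing key(s) in state_dict` usually mean the checkpoint's parameter names don't match the names the model expects, often because a wrapper prefix (such as `image_encoder.` or `module.` from DataParallel) is present on one side but not the other. A stdlib sketch of the key comparison that PyTorch's `load_state_dict(strict=True)` performs, with illustrative key names rather than the repo's actual ones:

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report the keys that load_state_dict(strict=True) would flag."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))      # in model, not in ckpt
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))   # in ckpt, not in model
    return missing, unexpected

def strip_prefix(checkpoint, prefix):
    """Common fix: drop a wrapper prefix (e.g. 'module.') from every key."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in checkpoint.items()}

# Hypothetical example mirroring the reported error: the model expects an
# 'image_encoder.' prefix that the saved checkpoint does not carry.
model_keys = ["image_encoder.pos_embed", "image_encoder.patch_embed.proj.weight"]
ckpt = {"pos_embed": 0, "patch_embed.proj.weight": 0}
missing, unexpected = diff_state_dict_keys(model_keys, list(ckpt))
# Re-keying the checkpoint to match the model's naming resolves the mismatch:
fixed = {f"image_encoder.{k}": v for k, v in ckpt.items()}
```

Passing `strict=False` silences the error but leaves the mismatched weights randomly initialized, so re-keying the checkpoint is usually the safer fix.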
-
Do you have a pretrained model? I want to save time on training.
Also, how many training hours did it take with epoch=100?