-
Hi, thank you for sharing this repository.
It seems that the data generation module is missing, so I am unable to generate a "train_data.txt" file for pretraining.
Could you please provide the data gener…
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/OpenAccess-AI-Collective/axolotl/discussions/categories…
-
In the log file, you can see `do_pretraining : False`.
Can someone explain how to use this option in the configuration file?
Thanks!
-
Interesting project, but I have some concerns about the language.
It is well known that there are fewer Chinese tokens in the training data of Llama, and each Chinese character is tokenized into several tokens w…
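To illustrate the concern, here is a minimal sketch (assuming the Hugging Face `transformers` package; the tokenizer path is a placeholder, not a real checkpoint) showing how Chinese text can expand into far more tokens than English text:

```python
# Minimal sketch: compare token counts for English vs. Chinese text.
# "path/to/llama-tokenizer" is a hypothetical placeholder path.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/llama-tokenizer")

for text in ["hello world", "你好世界"]:
    ids = tokenizer.encode(text, add_special_tokens=False)
    print(text, len(ids), tokenizer.convert_ids_to_tokens(ids))

# Llama's byte-fallback BPE often splits a single Chinese character into
# several byte-level tokens, so Chinese sequences consume far more of the
# context window than English text of similar meaning.
```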
-
According to the paper "Greedy Layer-Wise Training of Deep Networks" (2006), each layer of the autoencoder should be trained greedily in a purely unsupervised way.
To put it simply (a sketch follows after this list),
- trained one la…
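As a rough illustration of the paper's procedure, here is a minimal PyTorch sketch (layer sizes and epoch count are arbitrary assumptions, not taken from the paper): each layer is trained to reconstruct the output of the already-trained, frozen layers below it, one layer at a time.

```python
import torch
import torch.nn as nn

dims = [784, 256, 64]          # input dim, then hidden dims (assumed)
data = torch.randn(512, 784)   # toy unlabeled data

frozen = []                    # encoder layers trained so far
for i in range(len(dims) - 1):
    enc = nn.Linear(dims[i], dims[i + 1])
    dec = nn.Linear(dims[i + 1], dims[i])
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):       # train only the current layer, unsupervised
        with torch.no_grad():  # inputs come from the frozen lower layers
            x = data
            for f in frozen:
                x = torch.tanh(f(x))
        recon = dec(torch.tanh(enc(x)))
        loss = nn.functional.mse_loss(recon, x)
        opt.zero_grad()
        loss.backward()
        opt.step()
    frozen.append(enc)         # freeze this layer and move to the next
```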
-
When I pretrained a 3-layer BERT model using GluonNLP 0.10 on a p3dn.24xlarge instance with 32 GB of GPU memory, I received `CUDA: Check failed: e == cudaSuccess: misaligned address`. With batch size 128 in …
-
Trying to run training for the BERT-large topology, unpadded. We set up an nvidia-docker container to run the training workload. However, we run into an error on the unpadded run. Here's an excerpt from th…
-
I fine-tuned albert_tiny on my own data with run_pretraining.py, then trained a model on my downstream task. F1 improved by 4%, but prediction time is three times what it was before. Is some parameter set incorrectly, or is there another cause? Experts, please advise!
-
**Is your feature request related to a problem? Please describe.**
When the model config has `rampup_batch_size`, we get model loading errors when `global_batch_size` is not set accordingly…
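For context, here is a minimal sketch of the kind of consistency check involved, assuming Megatron-style semantics where `rampup_batch_size = [start, increment, ramp_samples]` (the exact rule in the library may differ):

```python
# Hypothetical check: global_batch_size must be reachable from the rampup
# start batch size in whole increments, otherwise loading/resuming fails.
def check_rampup(global_batch_size, rampup_batch_size):
    start, increment, _ramp_samples = rampup_batch_size
    diff = global_batch_size - start
    assert diff >= 0, "global_batch_size must be >= the rampup start"
    assert diff % increment == 0, (
        "global_batch_size must equal start + k * increment for integer k"
    )

check_rampup(global_batch_size=256, rampup_batch_size=[32, 32, 1_000_000])  # ok
# check_rampup(256, [32, 48, 1_000_000])  # fails: 224 is not a multiple of 48
```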
-
Hello,
I was wondering if it is straightforward to bring older models such as GPT-2 to lit-gpt.
If so, what files/configs do I need to change?
Thank you!