Closed · krisbianprabowo closed this issue 1 year ago
Hi @krisbianprabowo, thank you for your interest in our work. To clarify, we pre-trained the model using a TPU pod. If you want to run with a batch size of 256 and you don't have enough memory, you can use gradient accumulation to accumulate gradients across smaller micro-batches, as sketched below. You can also check Section 4.2 of our IndoNLU paper for more details on the pre-training setup.
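As a rough illustration of gradient accumulation (a minimal sketch, not our exact training code; `model`, `optimizer`, and `dataloader` are assumed to be a standard PyTorch masked-LM model, optimizer, and DataLoader whose batches include labels):

```python
# Minimal sketch of gradient accumulation in PyTorch.
# `model`, `optimizer`, and `dataloader` are placeholders for your own
# MLM model, optimizer, and DataLoader yielding dicts of tensors with labels.

def train_one_epoch(model, optimizer, dataloader, accum_steps=8):
    """Accumulate gradients over `accum_steps` micro-batches before each update,
    e.g. 8 micro-batches of 32 give an effective batch size of 256."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        loss = model(**batch).loss / accum_steps  # scale so gradients average correctly
        loss.backward()                           # gradients add up across micro-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()                      # one update per effective batch
            optimizer.zero_grad()
```

If you use the HuggingFace `Trainer`, you can get the same effect by setting `gradient_accumulation_steps` in `TrainingArguments` instead of writing the loop yourself.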
Nevertheless, I would not suggest running the pre-training from scratch on a single GPU, because it will take quite some time (probably a week or two for a single run). I would instead suggest running DAPT or TAPT, which perform a second phase of pre-training on top of an existing pre-trained LM; you can check the references listed below.
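If it helps, here is a very rough sketch of what such a second-phase (DAPT/TAPT-style) run could look like with the HuggingFace `Trainer`. This is only an illustration, not our exact setup: the checkpoint name, corpus file, and hyperparameters are placeholders you would replace with your own.

```python
# Sketch of second-phase MLM pre-training (DAPT/TAPT-style) with HuggingFace Transformers.
# Checkpoint name, corpus path, and hyperparameters below are assumptions/placeholders.

from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

checkpoint = "indobenchmark/indobert-base-p1"          # assumed pre-trained LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical in-domain/task corpus, one document per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="indobert-dapt",
    per_device_train_batch_size=32,        # small micro-batch that fits in GPU memory
    gradient_accumulation_steps=8,         # 32 x 8 = 256 effective batch size
    num_train_epochs=1,
    fp16=True,                             # mixed precision to save VRAM
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```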
Thank you so much for your detailed explanation @SamuelCahyawijaya, really appreciate it!
I'm sorry for re-opening this issue. Just to make it clear: you are using cloud computing (a TPU pod) for pre-training from scratch, with the kind of configuration I showed below. Am I right?
Hi, I'm actually using one of your models for text similarity and it works great!
I'm wondering: if I wanted to pre-train the model from scratch using the Indo4B dataset, given its huge size (~24 GB), how much RAM and VRAM would be needed to train it with the same batch size stated in your paper, i.e., a batch size of 256 for IndoBERT-base? Are 16 GB of VRAM and 32 GB of RAM enough?
Thank you for such amazing work!