Hi @amritasaha1812, I was wondering if you could help me with some training details.
1) For how many epochs should we train the model?
2) How much time does each epoch take? It's been about 12 hours since I started training, and I'm still waiting for it to complete the second epoch. I am using two Nvidia 1080 Ti GPUs. I am observing that GPU usage is low most of the time while CPU usage is at 100%. Is there a way to use multiple CPUs to increase the training speed?
Thanks,
Since this was trained long ago, I don't remember exactly. We trained for approximately 3 weeks, and for us one epoch took more than 24 hours.
You seem to have access to better hardware, and since your GPU usage is low (as you say), you could increase the batch size (within the constraints of GPU memory). CPU memory is mainly used for storing the Wikidata dicts and should not grow as the batch size increases. The model training takes place on the GPU, so I wouldn't expect a speedup from using more CPUs; see the sketch below for one way to find a larger workable batch size.
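For reference, here is a minimal sketch of one way to probe for the largest batch size your GPUs can hold. This is generic PyTorch-style code, not this repo's actual training script; the function name `find_max_batch_size` and the dense random input are assumptions for illustration, so you would substitute a real batch from your own data pipeline:

```python
# Minimal sketch, assuming a PyTorch model that takes a dense float tensor;
# not this repo's training code. Doubles the batch size until a
# forward/backward pass runs out of GPU memory, then reports the last
# size that fit.
import torch

def find_max_batch_size(model, input_shape, device="cuda", start=32, limit=8192):
    model = model.to(device)
    batch_size = start
    largest_ok = 0
    while batch_size <= limit:
        try:
            x = torch.randn(batch_size, *input_shape, device=device)
            model(x).sum().backward()      # backward pass dominates memory use
            model.zero_grad(set_to_none=True)
            largest_ok = batch_size
            batch_size *= 2
        except RuntimeError as e:          # PyTorch raises RuntimeError on OOM
            if "out of memory" in str(e):
                torch.cuda.empty_cache()
                break
            raise
    return largest_ok
```

In practice you would set the training batch size somewhat below the value this returns, to leave headroom for variable-length inputs.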