Closed ParadoxTwo closed 4 years ago
I'm facing the same problem. I don't have an Nvidia graphics card, but I do have an Intel GPU, and I'm getting the same issue!
Hi! How long does the normal Huggingface BERT model take to run for you? I'm not sure whether the problem is my code or my device.
I don't think it is a code problem. I am using pytorch_pretrained_bert's model_new/tokenization/optimization modules and run_classifier_new.py. My feeling is that the GPU is not being accessed while the program runs, even though everything is set up. Also, looking at the output, I see that some weights of BertForPreTraining are not initialized.
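A quick way to confirm whether PyTorch can see the GPU at all is a check like the sketch below (assuming PyTorch, as used by the repo's scripts; `pick_device` is just an illustrative helper name):

```python
import torch

def pick_device() -> torch.device:
    """Return a CUDA device if PyTorch can actually see a GPU, else fall back to CPU."""
    if torch.cuda.is_available():
        # A GPU is visible to PyTorch; report which one.
        print(f"Using GPU: {torch.cuda.get_device_name(0)}")
        return torchch.device("cuda") if False else torch.device("cuda")
    # No CUDA device visible -- training will silently run on the CPU,
    # which would explain extremely slow epochs.
    print("CUDA not available -- running on CPU")
    return torch.device("cpu")

device = pick_device()
```

If this prints the CPU fallback even though `nvidia-smi` shows the card, the PyTorch build or CUDA driver setup is the likely culprit rather than the training code.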
*From the pretrained model
Perhaps. I haven't found any problems on our machines, and I also tried different sets of GPU cards. On a GTX 1080 Ti it takes about 5 minutes; on a Tesla K80 it takes about 12 minutes.
During pretraining, the epoch was killed. I assumed it was the RAM consumption, so I increased the swap space. I tried again, but now it seems to take a really long time, even after connecting to the GPU. The result is shown below:
CUDA_VISIBLE_DEVICES=0 python3 finetune_on_pregenerated_sstphrase.py \
2020-09-11 21:05:47,259: Weights of BertForPreTraining not initialized from pretrained model: ['bert.pooler_phrase.dense.weight', 'bert.pooler_phrase.dense.bias', 'bert.trans.weight', 'bert.trans.bias', 'bert.trans_2.weight', 'bert.trans_2.bias', 'bert.trans_3.weight', 'bert.trans_3.bias', 'bert.bahdanau_attention.linear_encoder.weight', 'bert.bahdanau_attention.linear_encoder.bias', 'bert.bahdanau_attention.linear_decoder.weight', 'bert.bahdanau_attention.linear_decoder.bias', 'bert.bahdanau_attention.linear_in.0.weight', 'bert.bahdanau_attention.linear_in.0.bias', 'bert.bahdanau_attention.linear_in.3.weight', 'bert.bahdanau_attention.linear_in.3.bias', 'bert.bahdanau_attention_3.linear_encoder.weight', 'bert.bahdanau_attention_3.linear_encoder.bias', 'bert.bahdanau_attention_3.linear_decoder.weight', 'bert.bahdanau_attention_3.linear_decoder.bias', 'bert.bahdanau_attention_3.linear_in.0.weight', 'bert.bahdanau_attention_3.linear_in.0.bias', 'bert.bahdanau_attention_3.linear_in.3.weight', 'bert.bahdanau_attention_3.linear_in.3.bias', 'bert.linear_out.0.weight', 'bert.linear_out.0.bias', 'bert.linear_out.3.weight', 'bert.linear_out.3.bias', 'bert.linear_out_3.0.weight', 'bert.linear_out_3.0.bias', 'bert.linear_out_3.3.weight', 'bert.linear_out_3.3.bias', 'bert.LayerNorm.weight', 'bert.LayerNorm.bias', 'cls.phrase_predictions.transform.dense.weight', 'cls.phrase_predictions.transform.dense.bias', 'cls.phrase_predictions.transform.LayerNorm.weight', 'cls.phrase_predictions.transform.LayerNorm.bias', 'cls.phrase_predictions.decoder.weight', 'cls.phrase_predictions.decoder.bias']
2020-09-11 21:05:47,267: Running training
2020-09-11 21:05:47,267: Num examples = 25137
2020-09-11 21:05:47,267: Batch size = 32
2020-09-11 21:05:47,268: Num steps = 785
2020-09-11 21:05:47,539: Loading training examples for epoch 0
Training examples: 100%|██████████████████| 8379/8379 [01:00<00:00, 139.13it/s]
2020-09-11 21:06:56,683: Loading complete!
Epoch 0: 0%| | 1/262 [18:56<82:19:17, 1135.47s/it, Loss: 4.97729]
The estimated time for each epoch is roughly 80+ hours. I am using a GTX 1050 Ti with the Nvidia 440 driver, 8 GB of RAM, and 16 GB of swap. The nvidia-smi output is listed below:
nvidia-smi
Fri Sep 11 19:21:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    148MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       992      G   /usr/lib/xorg/Xorg                            64MiB |
|    0      1446      G   /usr/bin/gnome-shell                          83MiB |
+-----------------------------------------------------------------------------+
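The 0% GPU-Util and the absence of any Python process in the nvidia-smi table both suggest the model and batches may never be moved onto the GPU. A minimal sketch of the pattern to check for (assuming PyTorch; `nn.Linear` stands in for the real BertForPreTraining model here):

```python
import torch
import torch.nn as nn

# Stand-in for the real model; substitute your BertForPreTraining instance.
model = nn.Linear(768, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Every parameter should now live on the chosen device. If any parameter
# stays on "cpu" while device is "cuda", training silently runs on the CPU.
print(all(p.device.type == device.type for p in model.parameters()))  # prints True

# Inside the training loop, each batch must be moved to the device as well:
batch = torch.randn(32, 768)  # stand-in for a real input batch
batch = batch.to(device)
output = model(batch)
```

If the training script builds the model or the batches without these `.to(device)` calls, the GPU sits idle exactly as the 0% utilization shows.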
Could you let me know how to improve the timing? In a previous issue, you mentioned you were able to achieve a few minutes per epoch.