WadeYin9712 / SentiBERT

Code for SentiBERT: A Transferable Transformer-Based Architecture for Compositional Sentiment Semantics (ACL'2020).

The pretraining for each generated epoch is taking a LONG time! #4

Closed ParadoxTwo closed 4 years ago

ParadoxTwo commented 4 years ago

During pretraining, the epoch process was killed. I assumed it was due to RAM consumption, so I tried increasing the swap memory. I ran it again, but it seems to be taking a really long time, even after connecting to the GPU. The output is shown below:

CUDA_VISIBLE_DEVICES=0 python3 finetune_on_pregenerated_sstphrase.py \
    --pregenerated_data /home/oem/Documents/SentiBERT/training_sstphrase \
    --bert_model bert-base-uncased \
    --do_lower_case \
    --output_dir /home/oem/Documents/SentiBERT/results/sstphrase_pretrain \
    --epochs 3

2020-09-11 21:05:33,356: device: cpu n_gpu: 0, distributed training: False, 16-bits training: False
2020-09-11 21:05:34,533: loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/oem/.cache/torch/pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-09-11 21:05:37,707: loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at /home/oem/.cache/torch/pytorch_pretrained_bert/aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
2020-09-11 21:05:37,707: loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json from cache at /home/oem/.cache/torch/pytorch_pretrained_bert/4dad0251492946e18ac39290fcfe91b89d370fee250efe9521476438fe8ca185.7156163d5fdc189c3016baca0775ffce230789d7fa2a42ef516483e4ca884517
2020-09-11 21:05:37,708: Model config {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

2020-09-11 21:05:47,259: Weights of BertForPreTraining not initialized from pretrained model: ['bert.pooler_phrase.dense.weight', 'bert.pooler_phrase.dense.bias', 'bert.trans.weight', 'bert.trans.bias', 'bert.trans_2.weight', 'bert.trans_2.bias', 'bert.trans_3.weight', 'bert.trans_3.bias', 'bert.bahdanau_attention.linear_encoder.weight', 'bert.bahdanau_attention.linear_encoder.bias', 'bert.bahdanau_attention.linear_decoder.weight', 'bert.bahdanau_attention.linear_decoder.bias', 'bert.bahdanau_attention.linear_in.0.weight', 'bert.bahdanau_attention.linear_in.0.bias', 'bert.bahdanau_attention.linear_in.3.weight', 'bert.bahdanau_attention.linear_in.3.bias', 'bert.bahdanau_attention_3.linear_encoder.weight', 'bert.bahdanau_attention_3.linear_encoder.bias', 'bert.bahdanau_attention_3.linear_decoder.weight', 'bert.bahdanau_attention_3.linear_decoder.bias', 'bert.bahdanau_attention_3.linear_in.0.weight', 'bert.bahdanau_attention_3.linear_in.0.bias', 'bert.bahdanau_attention_3.linear_in.3.weight', 'bert.bahdanau_attention_3.linear_in.3.bias', 'bert.linear_out.0.weight', 'bert.linear_out.0.bias', 'bert.linear_out.3.weight', 'bert.linear_out.3.bias', 'bert.linear_out_3.0.weight', 'bert.linear_out_3.0.bias', 'bert.linear_out_3.3.weight', 'bert.linear_out_3.3.bias', 'bert.LayerNorm.weight', 'bert.LayerNorm.bias', 'cls.phrase_predictions.transform.dense.weight', 'cls.phrase_predictions.transform.dense.bias', 'cls.phrase_predictions.transform.LayerNorm.weight', 'cls.phrase_predictions.transform.LayerNorm.bias', 'cls.phrase_predictions.decoder.weight', 'cls.phrase_predictions.decoder.bias']
2020-09-11 21:05:47,267: Running training
2020-09-11 21:05:47,267: Num examples = 25137
2020-09-11 21:05:47,267: Batch size = 32
2020-09-11 21:05:47,268: Num steps = 785
2020-09-11 21:05:47,539: Loading training examples for epoch 0
Training examples: 100%|██████████████████| 8379/8379 [01:00<00:00, 139.13it/s]
2020-09-11 21:06:56,683: Loading complete!
Epoch 0:   0%|          | 1/262 [18:56<82:19:17, 1135.47s/it, Loss: 4.97729]

The estimated time for each epoch is roughly 80+ hours. I am using a GTX 1050 Ti with the NVIDIA 440 driver, 8 GB of RAM, and 16 GB of swap memory. The nvidia-smi output is listed below:

nvidia-smi
Fri Sep 11 19:21:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P0    N/A /  N/A |    148MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       992      G   /usr/lib/xorg/Xorg                            64MiB |
|    0      1446      G   /usr/bin/gnome-shell                          83MiB |
+-----------------------------------------------------------------------------+
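Notably, the training log above reports device: cpu and n_gpu: 0, and nvidia-smi shows 0% GPU utilization with only Xorg and gnome-shell using the card. A quick sanity check along these lines (a minimal sketch, assuming a standard PyTorch install; quick_gpu_check.py is just a hypothetical name) should confirm whether PyTorch can see the GPU at all:

# quick_gpu_check.py -- hypothetical helper, not part of SentiBERT.
# Checks whether the installed PyTorch build can see the GTX 1050 Ti,
# since the training log above reports "device: cpu n_gpu: 0".
import torch

print("PyTorch version:   ", torch.__version__)
print("Built with CUDA:   ", torch.version.cuda)        # None => CPU-only wheel
print("CUDA available:    ", torch.cuda.is_available())
print("Visible GPU count: ", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0 name:     ", torch.cuda.get_device_name(0))

If torch.version.cuda prints None, the installed wheel is CPU-only and the script will fall back to the CPU no matter what CUDA_VISIBLE_DEVICES is set to.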

Could you let me know how to improve the timing? In a previous issue, you mentioned you were able to achieve only minutes per epoch.

nutsformers commented 4 years ago

I'm facing the same problem. I don't have an NVIDIA graphics card, only an Intel GPU, and I am getting the same issue!

WadeYin9712 commented 4 years ago

Hi! How long does it take you to run the normal Huggingface BERT model? I'm not sure whether it's a problem with my code or with your device.

ParadoxTwo commented 4 years ago

I don't think it is a code problem. I am using pytorch_pretrained_bert.model_new/tokenization/optimization and run_classifier_new.py. I feel that the GPU is not being accessed while running the program, even though everything is set up. Although, looking at the output, I see that the weights of BertForPreTraining are not initialized.
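As far as I understand, device selection in scripts derived from pytorch_pretrained_bert's pretraining example usually follows the pattern below (a rough sketch, not the exact SentiBERT code; the Linear layer is just a stand-in for BertForPreTraining), which would explain a silent fallback to the CPU:

# Sketch of the usual device-selection pattern in pytorch_pretrained_bert-style scripts.
import torch

# If torch.cuda.is_available() returns False (e.g. a CPU-only PyTorch wheel, or a
# driver/toolkit mismatch), the script quietly falls back to the CPU and logs
# "device: cpu n_gpu: 0" -- which matches the output above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()

model = torch.nn.Linear(10, 2)            # stand-in for BertForPreTraining
model.to(device)                          # the model must be moved to the GPU...
batch = torch.randn(32, 10).to(device)    # ...and so must every input batch
print(f"device: {device} n_gpu: {n_gpu}")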

ParadoxTwo commented 4 years ago

*From the pretrained model

WadeYin9712 commented 4 years ago

Perhaps. I haven't found any problems on our machines, and I also tried different GPU cards. With a GTX 1080 Ti, an epoch takes about 5 minutes; with a Tesla K80, about 12 minutes.