ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

GPU is not used #64

Closed — viktor02 closed this issue 3 years ago

viktor02 commented 3 years ago

When running the script, there is a warning that device=cpu:

```
(D:\data\Other\GPT-3\gpt3env) > python pretrain_transformers.py --output_dir=../models/essays --model_type=gpt2 --model_name_or_path=../models/gpt2-large --do_train --train_data_file=train.txt --do_eval --eval_data_file=valid.txt --per_gpu_train_batch_size 1 --gradient_accumulation_steps 1 --num_train_epochs 5 --block_size 512 --overwrite_output_dir
2021-06-11 17:01:51.656591: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
06/11/2021 17:02:06 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
```

And the training speed is ~58.42 s/it.

Setup: CUDA 11.3.1, Windows 10 x64, Nvidia GTX 1080, model ruGPT3Large. What could be wrong?
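A quick way to see what the script will pick up is to check the installed PyTorch build inside the same virtualenv. A minimal sketch (the commented version strings are only examples):

```python
# Run inside the gpt3env virtualenv.
import torch

print(torch.__version__)          # e.g. "1.7.1+cpu" for a CPU-only wheel
print(torch.version.cuda)         # None when the wheel was built without CUDA
print(torch.cuda.is_available())  # False -> pretrain_transformers.py falls back to device=cpu
```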

Full log:

```
(D:\data\Other\GPT-3\gpt3env) D:\data\Other\GPT-3\ru-gpts>python pretrain_transformers.py --output_dir=../models/essays --model_type=gpt2 --model_name_or_path=../models/gpt2-large --do_train --train_data_file=train.txt --do_eval --eval_data_file=valid.txt --per_gpu_train_batch_size 1 --gradient_accumulation_steps 1 --num_train_epochs 5 --block_size 512 --overwrite_output_dir
2021-06-11 17:26:32.123774: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
06/11/2021 17:26:47 - WARNING - __main__ - Process rank: -1, device: cpu, n_gpu: 0, distributed training: False, 16-bits training: False
06/11/2021 17:26:47 - INFO - transformers.configuration_utils - loading configuration file ../models/gpt2-large\config.json
06/11/2021 17:26:47 - INFO - transformers.configuration_utils - Model config GPT2Config { "_num_labels": 2, "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bad_words_ids": null, "bos_token_id": 50256, "decoder_start_token_id": null, "do_sample": false, "early_stopping": false, "embd_pdrop": 0.1, "eos_token_id": 50256, "finetuning_task": null, "gradient_checkpointing": false, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_epsilon": 1e-05, "length_penalty": 1.0, "max_length": 20, "min_length": 0, "model_type": "gpt2", "n_ctx": 2048, "n_embd": 1536, "n_head": 16, "n_inner": null, "n_layer": 24, "n_positions": 2048, "no_repeat_ngram_size": 0, "num_beams": 1, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pad_token_id": null, "prefix": null, "pruned_heads": {}, "repetition_penalty": 1.0, "resid_pdrop": 0.1, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": null, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "torchscript": false, "use_bfloat16": false, "vocab_size": 50257 }
06/11/2021 17:26:47 - INFO - transformers.configuration_utils - loading configuration file ../models/gpt2-large\config.json
06/11/2021 17:26:47 - INFO - transformers.configuration_utils - Model config GPT2Config { "_num_labels": 2, "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bad_words_ids": null, "bos_token_id": 50256, "decoder_start_token_id": null, "do_sample": false, "early_stopping": false, "embd_pdrop": 0.1, "eos_token_id": 50256, "finetuning_task": null, "gradient_checkpointing": false, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_epsilon": 1e-05, "length_penalty": 1.0, "max_length": 20, "min_length": 0, "model_type": "gpt2", "n_ctx": 2048, "n_embd": 1536, "n_head": 16, "n_inner": null, "n_layer": 24, "n_positions": 2048, "no_repeat_ngram_size": 0, "num_beams": 1, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pad_token_id": null, "prefix": null, "pruned_heads": {}, "repetition_penalty": 1.0, "resid_pdrop": 0.1, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": null, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "torchscript": false, "use_bfloat16": false, "vocab_size": 50257 }
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - Model name '../models/gpt2-large' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). Assuming '../models/gpt2-large' is a path, a model identifier, or url to a directory containing tokenizer files.
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - Didn't find file ../models/gpt2-large\added_tokens.json. We won't load it.
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - Didn't find file ../models/gpt2-large\special_tokens_map.json. We won't load it.
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - Didn't find file ../models/gpt2-large\tokenizer_config.json. We won't load it.
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - loading file ../models/gpt2-large\vocab.json
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - loading file ../models/gpt2-large\merges.txt
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - loading file None
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - loading file None
06/11/2021 17:26:47 - INFO - transformers.tokenization_utils - loading file None
06/11/2021 17:26:47 - INFO - transformers.modeling_utils - loading weights file ../models/gpt2-large\pytorch_model.bin
06/11/2021 17:28:25 - INFO - transformers.modeling_utils - Weights from pretrained model not used in GPT2LMHeadModel: ['transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.12.attn.masked_bias', 'transformer.h.13.attn.masked_bias', 'transformer.h.14.attn.masked_bias', 'transformer.h.15.attn.masked_bias', 'transformer.h.16.attn.masked_bias', 'transformer.h.17.attn.masked_bias', 'transformer.h.18.attn.masked_bias', 'transformer.h.19.attn.masked_bias', 'transformer.h.20.attn.masked_bias', 'transformer.h.21.attn.masked_bias', 'transformer.h.22.attn.masked_bias', 'transformer.h.23.attn.masked_bias']
06/11/2021 17:28:25 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, block_size=512, cache_dir=None, config_name=None, device=device(type='cpu'), do_eval=True, do_train=True, eval_all_checkpoints=False, eval_data_file='valid.txt', evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, line_by_line=False, local_rank=-1, logging_steps=500, max_grad_norm=1.0, max_steps=-1, mlm=False, mlm_probability=0.15, model_name_or_path='../models/gpt2-large', model_type='gpt2', n_gpu=0, no_cuda=False, num_train_epochs=5.0, output_dir='../models/essays', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=1, save_steps=500, save_total_limit=None, seed=42, server_ip='', server_port='', should_continue=False, tokenizer_name=None, train_data_file='train.txt', warmup_steps=0, weight_decay=0.01)
06/11/2021 17:28:25 - INFO - __main__ - Loading features from cached file gpt2_cached_lm_512_train.txt
06/11/2021 17:28:26 - INFO - __main__ - ***** Running training *****
06/11/2021 17:28:26 - INFO - __main__ - Num examples = 16917
06/11/2021 17:28:26 - INFO - __main__ - Num Epochs = 5
06/11/2021 17:28:26 - INFO - __main__ - Instantaneous batch size per GPU = 1
06/11/2021 17:28:26 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
06/11/2021 17:28:26 - INFO - __main__ - Gradient Accumulation steps = 1
06/11/2021 17:28:26 - INFO - __main__ - Total optimization steps = 84585
06/11/2021 17:28:26 - INFO - __main__ - Starting fine-tuning.
Epoch: 0%| | 0/5 [00:00 main()
  File "pretrain_transformers.py", line 731, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "pretrain_transformers.py", line 320, in train
    outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\transformers\modeling_gpt2.py", line 599, in forward
    inputs_embeds=inputs_embeds,
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\transformers\modeling_gpt2.py", line 484, in forward
    hidden_states, layer_past=layer_past, attention_mask=attention_mask, head_mask=head_mask[i]
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\transformers\modeling_gpt2.py", line 231, in forward
    m = self.mlp(self.ln_2(x))
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\transformers\modeling_gpt2.py", line 211, in forward
    h2 = self.c_proj(h)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\data\Other\GPT-3\gpt3env\lib\site-packages\transformers\modeling_utils.py", line 1591, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
KeyboardInterrupt
```
viktor02 commented 3 years ago

It turned out that torch had been installed without CUDA support. Installing the CUDA-enabled build:

```
# CUDA 11.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
```
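After reinstalling, the same kind of check should report a CUDA-enabled build; a short sketch (the expected values are illustrative):

```python
import torch

# Should now be a +cu110 build that can see the GTX 1080.
print(torch.__version__)              # expected: 1.7.1+cu110
print(torch.version.cuda)             # expected: 11.0
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.get_device_name(0))  # e.g. "GeForce GTX 1080"
```

Note that the +cu110 wheel bundles its own CUDA runtime, so it works even though the system toolkit is 11.3.1; only a sufficiently recent NVIDIA driver is required.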