google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

clearify #948

Open · rhl2k opened this issue 4 years ago

rhl2k commented 4 years ago

12/04/2019 11:00:03 - WARNING - main - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
12/04/2019 11:00:04 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-config.json from cache at /root/.cache/torch/transformers/6dfaed860471b03ab5b9acb6153bea82b6632fb9bbe514d3fff050fe1319ee6d.4c88e2dec8f8b017f319f6db2b157fee632c0860d9422e4851bd0d6999f9ce38
12/04/2019 11:00:04 - INFO - transformers.configuration_utils - Model config { "attention_probs_dropout_prob": 0.1, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "is_decoder": false, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "num_attention_heads": 16, "num_hidden_layers": 24, "num_labels": 2, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pruned_heads": {}, "torchscript": false, "type_vocab_size": 2, "use_bfloat16": false, "vocab_size": 30522 }
12/04/2019 11:00:04 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /root/.cache/torch/transformers/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
12/04/2019 11:00:05 - INFO - transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-pytorch_model.bin from cache at /root/.cache/torch/transformers/54da47087cc86ce75324e4dc9bbb5f66c6e83a7c6bd23baea8b489acc8d09aa4.4d5343a4b979c4beeaadef17a0453d1bb183dd9b084f58b84c7cc781df343ae6
12/04/2019 11:00:19 - INFO - transformers.modeling_utils - Weights of BertForQuestionAnswering not initialized from pretrained model: ['qa_outputs.weight', 'qa_outputs.bias']
12/04/2019 11:00:19 - INFO - transformers.modeling_utils - Weights from pretrained model not used in BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
12/04/2019 11:00:22 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', device=device(type='cuda'), do_eval=True, do_lower_case=True, do_train=True, doc_stride=128, eval_all_checkpoints=False, evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=-1, logging_steps=50, max_answer_length=30, max_grad_norm=1.0, max_query_length=64, max_seq_length=384, max_steps=-1, model_name_or_path='bert-large-uncased', model_type='bert', n_best_size=20, n_gpu=1, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=1.0, output_dir='model_check_points11_a/', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=2, predict_file='dev_10.json', save_steps=20000, seed=42, server_ip='', server_port='', tokenizer_name='', train_file='2-dec_train.json', verbose_logging=False, version_2_with_negative=True, warmup_steps=0, weight_decay=0.0)
12/04/2019 11:00:22 - INFO - main - Loading features from cached file cached_train_bert-large-uncased_384
12/04/2019 11:00:44 - INFO - main - Running training
12/04/2019 11:00:44 - INFO - main - Num examples = 88641
12/04/2019 11:00:44 - INFO - main - Num Epochs = 1
12/04/2019 11:00:44 - INFO - main - Instantaneous batch size per GPU = 2
12/04/2019 11:00:44 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 2
12/04/2019 11:00:44 - INFO - main - Gradient Accumulation steps = 1
12/04/2019 11:00:44 - INFO - main - Total optimization steps = 44321

This is with max_seq_length 384. My question is: what is Total optimization steps = 44321, and what is Num examples = 88641?
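For context, here is a minimal sketch of how these two values are typically derived, assuming the script follows the standard transformers run_squad.py example of that era (variable names below are illustrative, not taken from the script). "Num examples" is the number of training *features* produced by the sliding-window conversion (controlled by doc_stride) and loaded from the cached file, not the number of raw questions; the total optimization steps follow from that count, the batch size, gradient accumulation, and the number of epochs.

```python
import math

# Values taken from the log above; the arithmetic is a sketch of the
# t_total computation used by the example script (assumption).
num_features = 88641              # "Num examples": features after doc_stride splitting
per_gpu_train_batch_size = 2
n_gpu = 1
gradient_accumulation_steps = 1
num_train_epochs = 1

train_batch_size = per_gpu_train_batch_size * n_gpu
steps_per_epoch = math.ceil(num_features / train_batch_size)   # one optimizer step per batch
t_total = steps_per_epoch // gradient_accumulation_steps * num_train_epochs

print(t_total)  # 44321, matching "Total optimization steps = 44321"
```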

If I decrease the sequence length to 256, I get:

12/04/2019 05:56:09 - WARNING - main - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
12/04/2019 05:56:09 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-config.json from cache at /root/.cache/torch/transformers/6dfaed860471b03ab5b9acb6153bea82b6632fb9bbe514d3fff050fe1319ee6d.4c88e2dec8f8b017f319f6db2b157fee632c0860d9422e4851bd0d6999f9ce38
12/04/2019 05:56:09 - INFO - transformers.configuration_utils - Model config { "attention_probs_dropout_prob": 0.1, "finetuning_task": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "initializer_range": 0.02, "intermediate_size": 4096, "is_decoder": false, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "num_attention_heads": 16, "num_hidden_layers": 24, "num_labels": 2, "output_attentions": false, "output_hidden_states": false, "output_past": true, "pruned_heads": {}, "torchscript": false, "type_vocab_size": 2, "use_bfloat16": false, "vocab_size": 30522 }
12/04/2019 05:56:10 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /root/.cache/torch/transformers/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
12/04/2019 05:56:11 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-pytorch_model.bin not found in cache or force_download set to True, downloading to /tmp/tmpqdrrbc4v
100% 1344997306/1344997306 [01:36<00:00, 13866079.69B/s]
12/04/2019 05:57:49 - INFO - transformers.file_utils - copying /tmp/tmpqdrrbc4v to cache at /root/.cache/torch/transformers/54da47087cc86ce75324e4dc9bbb5f66c6e83a7c6bd23baea8b489acc8d09aa4.4d5343a4b979c4beeaadef17a0453d1bb183dd9b084f58b84c7cc781df343ae6
12/04/2019 05:57:54 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/54da47087cc86ce75324e4dc9bbb5f66c6e83a7c6bd23baea8b489acc8d09aa4.4d5343a4b979c4beeaadef17a0453d1bb183dd9b084f58b84c7cc781df343ae6
12/04/2019 05:57:54 - INFO - transformers.file_utils - removing temp file /tmp/tmpqdrrbc4v
12/04/2019 05:57:54 - INFO - transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-pytorch_model.bin from cache at /root/.cache/torch/transformers/54da47087cc86ce75324e4dc9bbb5f66c6e83a7c6bd23baea8b489acc8d09aa4.4d5343a4b979c4beeaadef17a0453d1bb183dd9b084f58b84c7cc781df343ae6
12/04/2019 05:58:05 - INFO - transformers.modeling_utils - Weights of BertForQuestionAnswering not initialized from pretrained model: ['qa_outputs.weight', 'qa_outputs.bias']
12/04/2019 05:58:05 - INFO - transformers.modeling_utils - Weights from pretrained model not used in BertForQuestionAnswering: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
12/04/2019 05:58:10 - INFO - main - Training/evaluation parameters Namespace(adam_epsilon=1e-08, cache_dir='', config_name='', device=device(type='cuda'), do_eval=True, do_lower_case=True, do_train=True, doc_stride=128, eval_all_checkpoints=False, evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=-1, logging_steps=50, max_answer_length=30, max_grad_norm=1.0, max_query_length=64, max_seq_length=256, max_steps=-1, model_name_or_path='bert-large-uncased', model_type='bert', n_best_size=20, n_gpu=1, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=4.0, output_dir='model_check_points_r1/', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=8, per_gpu_train_batch_size=2, predict_file='dev_10.json', save_steps=1000, seed=42, server_ip='', server_port='', tokenizer_name='', train_file='2-dec_train.json', verbose_logging=False, version_2_with_negative=True, warmup_steps=0, weight_decay=0.0)
12/04/2019 05:58:10 - INFO - main - Loading features from cached file cached_train_bert-large-uncased_256
12/04/2019 05:58:11 - INFO - main - Running training
12/04/2019 05:58:11 - INFO - main - Num examples = 26
12/04/2019 05:58:11 - INFO - main - Num Epochs = 4
12/04/2019 05:58:11 - INFO - main - Instantaneous batch size per GPU = 2
12/04/2019 05:58:11 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 2
12/04/2019 05:58:11 - INFO - main - Gradient Accumulation steps = 1
12/04/2019 05:58:11 - INFO - main - Total optimization steps = 52

**This is with max_seq_length 256: what is Total optimization steps = 52, and what is Num examples = 26?**

How do these values vary, and what exactly do these two numbers represent?
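As a hedged sanity check, the same arithmetic sketched above reproduces both logs; note that the two runs also differ in epochs (1 vs. 4) per the logged Namespace, not only in sequence length:

```python
import math

def total_optimization_steps(num_features, batch_size, grad_accum_steps, epochs):
    # Sketch of the t_total arithmetic (assumption: mirrors the example script).
    steps_per_epoch = math.ceil(num_features / batch_size)  # one optimizer step per batch
    return steps_per_epoch // grad_accum_steps * epochs

# Run with max_seq_length 384: 88641 cached features, batch size 2, 1 epoch.
print(total_optimization_steps(88641, 2, 1, 1))   # 44321
# Run with max_seq_length 256: 26 cached features, batch size 2, 4 epochs.
print(total_optimization_steps(26, 2, 1, 4))      # 52
```

The drop from 88641 to 26 "examples" comes from the cached feature file that was loaded (cached_train_bert-large-uncased_256 evidently holds only 26 features); if a stale or unintended cache is being reused, the --overwrite_cache flag listed in the Namespace above would force the features to be rebuilt from the training file.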

rhl2k commented 4 years ago

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-large-uncased \
  --do_train \
  --do_eval \
  --version_2_with_negative \
  --do_lower_case \
  --train_file 2-dec_train.json \
  --predict_file dev_10.json \
  --per_gpu_train_batch_size 2 \
  --learning_rate 3e-5 \
  --num_train_epochs 1.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --save_steps 20000 \
  --output_dir model_check_points11_a

After running the command above I get the following output:

12/04/2019 11:00:22 - INFO - main - Loading features from cached file cached_train_bert-large-uncased_384
12/04/2019 11:00:44 - INFO - main - Running training
12/04/2019 11:00:44 - INFO - main - Num examples = 88641
12/04/2019 11:00:44 - INFO - main - Num Epochs = 1
12/04/2019 11:00:44 - INFO - main - Instantaneous batch size per GPU = 2
12/04/2019 11:00:44 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 2
12/04/2019 11:00:44 - INFO - main - Gradient Accumulation steps = 1
12/04/2019 11:00:44 - INFO - main - Total optimization steps = 44321

Does anyone have any idea what "Total optimization steps" is, and also "Num examples"?