google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

How to choose num_train_steps in run_pretraining.py? #1081

Open · max-yue opened 4 years ago

max-yue commented 4 years ago
python create_pretraining_data.py \
  --input_file=./chinese_sample_text.txt \
  --output_file=/tmp/tf_examples.tfrecord \
  --vocab_file=bert_checkpoint/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=256 \
  --max_predictions_per_seq=38 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5 \
  --do_whole_word_mask
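
create_pretraining_data.py logs how many instances it writes (note that with dupe_factor=5 each input example is duplicated five times with different masks, so the logged count already reflects that). If the log line is gone, you can recount directly from the TFRecord; a minimal sketch, assuming TensorFlow 2.x eager execution (under TF 1.x, which the repo targets, tf.python_io.tf_record_iterator would do the same job):

import tensorflow as tf

# Count the pre-training instances in the TFRecord produced above.
# This count is the N referred to in the question below.
N = sum(1 for _ in tf.data.TFRecordDataset("/tmp/tf_examples.tfrecord"))
print("total instances N =", N)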

python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=bert_checkpoint/bert_config.json \
  --init_checkpoint=bert_checkpoint/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=256 \
  --max_predictions_per_seq=38 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

Suppose the create_pretraining_data.py script wrote N total instances, e.g. N = 100000. How should I choose num_train_steps in run_pretraining.py given those N instances?
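
For reference, run_pretraining.py has no epochs flag: its training input pipeline repeats the data indefinitely, so --num_train_steps alone bounds training. A common back-of-the-envelope conversion from instances to steps (a sketch only; the epoch target of 3 is an arbitrary assumption, not a recommendation):

# Heuristic: num_train_steps = epochs * N / train_batch_size.
# "epochs" is a hypothetical target, not a run_pretraining.py flag.
N = 100_000            # instances written by create_pretraining_data.py
train_batch_size = 32  # matches the command above
epochs = 3             # assumed target; tune to your corpus

steps_per_epoch = N // train_batch_size      # 3125
num_train_steps = epochs * steps_per_epoch   # 9375
num_warmup_steps = num_train_steps // 10     # common ~10% warmup heuristic
print(num_train_steps, num_warmup_steps)     # 9375 937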

parmarsuraj99 commented 4 years ago

Do you mean epochs?

Maybe --iterations_per_loop could help.
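
(Worth noting: --iterations_per_loop in run_pretraining.py appears to be a TPU performance knob, controlling how many steps run per TPUEstimator host call. It does not change the total training length, which is still bounded by --num_train_steps, so the epochs-to-steps arithmetic sketched above is what actually decides the value.)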