awasthiabhijeet / PIE

Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)

Can it use multiple GPUs for training? #8

Open wangwang110 opened 4 years ago

Serenade-J commented 4 years ago

Yes.
My approach uses tf.contrib.distribute from tensorflow-gpu 1.13. I ran into some problems with this method and spent several days before finally training PIE on multiple GPUs successfully, so feel free to use other methods if you find them more convenient. Below is part of my code in word_edit_model.py; copying it as-is may introduce bugs, because it is not the complete set of changes I made to PIE.

import tensorflow as tf

from tensorflow.python.estimator.run_config import RunConfig
from tensorflow.python.estimator.estimator import Estimator
from tensorflow.contrib.distribute import AllReduceCrossDeviceOps
# ...
# Mirror the model across FLAGS.n_gpus GPUs and synchronize gradients
# with NCCL all-reduce.
dist_strategy = tf.contrib.distribute.MirroredStrategy(
    num_gpus=FLAGS.n_gpus,
    cross_device_ops=AllReduceCrossDeviceOps('nccl', num_packs=FLAGS.n_gpus),
    # cross_device_ops=AllReduceCrossDeviceOps('hierarchical_copy'),
)
session_config = tf.ConfigProto(
    inter_op_parallelism_threads=0,
    intra_op_parallelism_threads=0,
    allow_soft_placement=True,
    gpu_options=tf.GPUOptions(allow_growth=True))

# Hand the distribution strategy to the Estimator through RunConfig so
# both training and evaluation run on all GPUs.
run_config = RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    model_dir=FLAGS.output_dir,
    session_config=session_config,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    keep_checkpoint_max=15,
)
Serenade-J commented 4 years ago

All the documents I referred to (for training PIE on multiple GPUs) can be found online.

binhetech commented 4 years ago

> All the documents I referred to (for training PIE on multiple GPUs) can be found online.

Hi, could you share all the code you changed in word_edit_model.py? Thanks.