google-research / language

Shared repository for open-sourced projects from the Google AI Language team.
https://ai.google/research/teams/language/
Apache License 2.0
1.62k stars · 345 forks

about orqa model running #81

Open paulrich1234 opened 4 years ago

paulrich1234 commented 4 years ago

Hi, I have changed the following command to run on a GPU device.

Training on TPU:

MODEL_DIR=gs:///
TFHUB_CACHE_DIR=gs:///
TPU_NAME=
TFHUB_CACHE_DIR=$TFHUB_CACHE_DIR \
python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --tpu_name=$TPU_NAME \
  --use_tpu=True

to this:

python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --use_tpu=False

The batch size is about 256, but it seems too slow: 100 steps take 8409.323 sec. I have 4 Tesla V100 GPUs, and I don't know whether my configuration is right for this run.

Thank you
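A first step in diagnosing this kind of slowness is confirming that the job is on the GPUs at all rather than falling back to CPU. This is a diagnostic sketch, assuming a standard CUDA/cuDNN install and a TF 1.x environment like the one ORQA targets; the device names printed will vary by machine:

```shell
# Ask TensorFlow which local devices it can see; with cuDNN missing,
# only CPU devices will appear here.
python -c "from tensorflow.python.client import device_lib; \
  print([d.name for d in device_lib.list_local_devices()])"

# While a training step is running, watch per-GPU utilization and memory.
# If only GPU 0 shows load, the job is not distributed across devices.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5
```

If `list_local_devices()` reports four `/device:GPU:N` entries but `nvidia-smi` shows work on only one of them, the problem is the distribution strategy rather than the driver setup.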

kentonl commented 4 years ago

We haven't tested this code on a multi-GPU setup. Are you sure it's using all available GPUs?

paulrich1234 commented 4 years ago

> We haven't tested this code on a multi-GPU setup. Are you sure it's using all available GPUs?

Hi kentonl:
I have checked it; it was running on CPUs because I hadn't installed cuDNN. I have since tested on GPUs, but it only runs on one GPU (I have 4 GPUs on one machine). I don't know how to configure it to use all GPUs.
Thank you
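Since the maintainers say multi-GPU is untested here, any fix is speculative. One common approach for Estimator-based TF 1.x code like this experiment is to pass a `tf.distribute.MirroredStrategy` into the `RunConfig`; this is a sketch under that assumption, and `model_fn` plus the `model_dir` path are placeholders for ORQA's actual experiment wiring (the code as released may use TPU-specific config that would need to be swapped out):

```python
import tensorflow as tf

# MirroredStrategy replicates the model onto every visible local GPU
# and all-reduces gradients across them each step.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

run_config = tf.estimator.RunConfig(
    model_dir="/path/to/model_dir",   # placeholder path
    save_checkpoints_steps=1000,
    train_distribute=strategy,        # replaces any TPU-specific config
)

# estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# With a global batch size of 256 and 4 replicas, each V100 would
# process 64 examples per step.
```

Whether this interacts cleanly with the ICT experiment's input pipeline and loss (which was written for a large TPU batch) is an open question; the global batch of 256 is already far below the 4096 used on TPU, so some slowdown relative to the paper's setup is expected even with all four GPUs busy.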