google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0

Support for pre-training mT5 #660

Closed sumanthd17 closed 3 years ago

sumanthd17 commented 3 years ago

I used the instructions mentioned here for pre-training mT5.

But it throws this error:

--eval_gin_param=mesh_eval_dataset_fn.num_eval_examples = 10000: command not found

Upon comparing the mT5 code with the t5 repository, I noticed that t5 has changed a lot since the release of mT5.

Could you please update the README.md or point me to a resource that covers pre-training with mT5?

Thanks in Advance

craffel commented 3 years ago

You are missing a backslash \ in the line before --eval_gin_param=mesh_eval_dataset_fn.num_eval_examples = 10000 in your command, so your shell thinks you are trying to execute a separate command.
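A minimal sketch of how one might check for this before running a long command, assuming the command is saved as text with one flag per line. The helper name `find_missing_continuations` is hypothetical and not part of t5 or mT5. It only reports flag lines whose preceding line does not end in a backslash.

```python
def find_missing_continuations(command_text):
    """Return 1-based line numbers of flag lines ("--...") whose preceding
    non-empty line does not end in a backslash.

    Hypothetical helper for illustration only. The raw previous line is
    checked, so a backslash followed by trailing spaces is also reported,
    since the shell does not treat that as a line continuation.
    """
    problems = []
    previous = None  # raw text of the previous non-empty line
    for number, line in enumerate(command_text.splitlines(), start=1):
        if not line.strip():
            continue
        is_flag = line.strip().startswith("--")
        if previous is not None and is_flag and not previous.endswith("\\"):
            problems.append(number)
        previous = line
    return problems


# Shortened example command: line 2 lacks its trailing backslash,
# so the shell would try to run line 3 as a separate command.
broken = (
    'python -m t5.models.mesh_transformer_main \\\n'
    '  --eval_mode="perplexity_eval"\n'
    '  --eval_gin_param="mesh_eval_dataset_fn.num_eval_examples = 10000" \\\n'
    '  --module_import="multilingual_t5.tasks"\n'
)
print(find_missing_continuations(broken))
```

In this shortened example the helper flags line 3, the `--eval_gin_param` line, which matches the flag named in the "command not found" error.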

sumanthd17 commented 3 years ago

This is the invocation I used:

python -m t5.models.mesh_transformer_main \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="models/t5.1.1.base.gin" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 256}" \
  --gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
  --gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.rsqrt_no_ramp_down" \
  --gin_param="run.train_steps = 1000000" \
  --gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-8'" \
  --eval_mode="perplexity_eval" \
  --eval_gin_param="mesh_eval_dataset_fn.num_eval_examples = 10000" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds" \
  --module_import="multilingual_t5.tasks"

sumanthd17 commented 3 years ago

@craffel @adarob Could you share a requirements file listing the library versions needed to run mT5? There are a lot of library version mismatches, and they are causing errors.

Thanks in Advance