Closed sumanthd17 closed 3 years ago
You are missing a backslash (\) at the end of the line before --eval_gin_param=mesh_eval_dataset_fn.num_eval_examples = 10000 in your command, so your shell treats that line as a separate command and tries to execute it.
This is the invocation I used:
python -m t5.models.mesh_transformer_main \
--tpu="${TPU}" \
--gcp_project="${PROJECT}" \
--tpu_zone="${ZONE}" \
--model_dir="${MODEL_DIR}" \
--gin_file="models/t5.1.1.base.gin" \
--gin_param="MIXTURE_NAME = '${TASK}'" \
--gin_param="utils.run.sequence_length = {'inputs': 1024, 'targets': 256}" \
--gin_param="utils.run.batch_size = ('tokens_per_batch', 1048576)" \
--gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.rsqrt_no_ramp_down" \
--gin_param="run.train_steps = 1000000" \
--gin_param="utils.tpu_mesh_shape.model_parallelism = 1" \
--gin_param="utils.tpu_mesh_shape.tpu_topology = 'v3-8'" \
--eval_mode="perplexity_eval" \
--eval_gin_param="mesh_eval_dataset_fn.num_eval_examples = 10000" \
--t5_tfds_data_dir="${BUCKET}/t5-tfds" \
--module_import="multilingual_t5.tasks"
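The effect of a missing trailing backslash can be reproduced with a minimal sketch. Here echo stands in for the real python command, and --flag=value is a purely illustrative placeholder, not an actual t5 option. The exact wording of the error depends on which shell is installed as sh.

```shell
# With a trailing backslash, the shell joins both lines into one command.
joined=$(echo first \
  --flag=value)
printf 'joined: %s\n' "$joined"

# Without the backslash, the second line is run as its own command.
# --flag=value is a placeholder; no program with that name exists,
# so the shell reports that it cannot find it.
# "|| true" keeps the script going despite the non-zero exit status.
split=$(sh -c 'echo first
--flag=value' 2>&1) || true
printf 'split:\n%s\n' "$split"
```

In the second case the first line still runs on its own, which is why the original command started and only failed once the shell reached the --eval_gin_param line.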
@craffel @adarob, can you share a requirements file listing the library versions required to run mT5? There are a lot of library mismatches that throw errors.
Thanks in advance.
I used the instructions mentioned here for pre-training mT5, but the command throws this error:
--eval_gin_param=mesh_eval_dataset_fn.num_eval_examples = 10000: command not found
Upon comparing the mT5 code with the t5 repository, I noticed that t5 has changed a lot since the release of mT5.
Can you please update the README.md or point me to a resource that supports pre-training with mT5?
Thanks in advance.