Finetuning the v2 version of the base model

Hi,

I am trying to finetune the v2 version of the base model, downloaded from here.

This is the command I am using:

t5_mesh_transformer  \
  --model_dir="/tmp/model_out" \
  --gin_param="utils.run.mesh_devices = ['gpu:0','gpu:1']" \
  --gin_param="utils.run.train_dataset_fn = @t5.models.mesh_transformer.tsv_dataset_fn" \
  --gin_param="utils.run.mesh_shape = 'model:1,batch:2'" \
  --gin_param="tsv_dataset_fn.filename = 'train.tsv'" \
  --gin_file="operative_config.gin" \
  --gin_param="run.train_steps = 1260900"

I am setting the initial checkpoint in the operative_config.gin file as follows:

init_checkpoint = '/path/to/downloaded/v2/base/model.ckpt-1250900'

I want to train for 10000 more steps beyond the checkpoint (1250900 + 10000), hence --gin_param="run.train_steps = 1260900"
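
For reference, the train_steps value is the step count embedded in the checkpoint name plus the additional training I want. Below is a minimal sketch of that arithmetic. The helper name final_train_steps is hypothetical and is not part of t5 or mesh_tensorflow. It assumes the checkpoint path ends in "model.ckpt-<step>".

```python
import re


def final_train_steps(checkpoint_path: str, extra_steps: int) -> int:
    """Return the checkpoint's step count plus extra_steps.

    Hypothetical helper: assumes the path ends in 'model.ckpt-<step>'.
    """
    match = re.search(r"model\.ckpt-(\d+)$", checkpoint_path)
    if match is None:
        raise ValueError(f"no step number found in {checkpoint_path!r}")
    return int(match.group(1)) + extra_steps


# The checkpoint is at step 1250900; 10000 more steps are requested.
total = final_train_steps("/path/to/downloaded/v2/base/model.ckpt-1250900", 10000)
print(total)  # 1260900
```

Note that this counts optimizer steps. How many passes over train.tsv that corresponds to depends on the batch size and the number of examples in the file.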

Given the task, is this the right setup?

I am seeing poor performance: at most 8% accuracy after training.

I am attaching the train.tsv and the operative_config.gin files.

Archive.zip
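
In case the data format is the problem, here is a rough sketch of a sanity check on the TSV. It assumes each line should hold exactly one input and one target separated by a single tab, which is my understanding of what tsv_dataset_fn expects and is not something I have confirmed. The function find_bad_rows and the sample rows are hypothetical.

```python
def find_bad_rows(lines):
    """Return (line_number, problem) pairs for malformed rows.

    Assumption: each row must be 'input<TAB>target' with both fields non-empty.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append((line_no, f"expected 2 fields, got {len(fields)}"))
        elif not all(field.strip() for field in fields):
            problems.append((line_no, "empty field"))
    return problems


# Made-up sample rows, only to illustrate the check.
sample = ["question one\tanswer one\n", "row with no tab\n", "\tanswer only\n"]
for line_no, problem in find_bad_rows(sample):
    print(f"line {line_no}: {problem}")
```

To run it on the real file, pass an open file handle instead of the sample list, for example `find_bad_rows(open("train.tsv", encoding="utf-8"))`.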

Any help is appreciated. Thanks
