allenai / unified-io-2

Apache License 2.0

refexp.gin for finetune #7

Closed siruizhang30 closed 10 months ago

siruizhang30 commented 10 months ago

Great work,

    python3 t5x/train.py \
      --gin_file=t5x/examples/unified_io/t5_1_1/large.gin \
      --gin_file=t5x/examples/unified_io/t5_1_1/finetune/refexp.gin \
      --gin.INITIAL_CHECKPOINT_PATH=\"/path/to/checkpoint\" \
      --gin.MODEL_DIR=\"path/to/output_dir\" \
      --gin.BATCH_SIZE=8

In the README's training section you mention that refexp.gin is the fine-tuning configuration. But refexp.gin sets infer_eval_dataset_cfg, which looks like it is meant for inference, not training or fine-tuning:

train_script.train:
  eval_period = 2500
  stats_period = 500
  partitioner = @partitioning.PjitPartitioner()
  use_wandb = True
  concurrent_metrics = False
  infer_eval_dataset_cfg = @train_infer/t5x_utils.DatasetConfig()

Could you please check this? And could you provide a single .gin file for training? Thanks a lot.

chrisc36 commented 10 months ago

To clarify, the infer_eval_dataset_cfg parameter in that file configures the evaluation that runs every few thousand steps during fine-tuning. So it is used for inference, but as part of the training process.

If you want a single gin file, you can just concatenate large.gin and refexp.gin, but we have generally kept them separate so the launch script is more modular.
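Since gin files are plain text, concatenating them is a one-liner. A minimal sketch of what that could look like, assuming you run it from the repo root (the placeholder file contents below are illustrative stand-ins, not the real configs):

```shell
# Illustrative placeholders standing in for the repo's
# t5x/examples/unified_io/t5_1_1/large.gin and
# t5x/examples/unified_io/t5_1_1/finetune/refexp.gin.
printf 'include "t5x/examples/unified_io/t5_1_1/base.gin"\n' > large.gin
printf 'train_script.train.eval_period = 2500\n' > refexp.gin

# Model config first, task config second: gin applies bindings in order,
# so later bindings in refexp.gin override earlier ones from large.gin.
cat large.gin refexp.gin > combined.gin

# combined.gin can then be passed as a single --gin_file argument.
cat combined.gin
```

Ordering matters here for the same reason the two --gin_file flags are ordered in the launch command: the task config is expected to override the model defaults.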