Finetuning script/config for T5 model

Hi @cherry979988,
Thanks for sharing the implementation for your benchmark. I was able to run the BART direct-finetuning on GLUE-SST2 and get 0.86 accuracy.
I then switched the model to T5, following the model definitions from nanoT5, but I have not been able to finetune it: the training loss stays very high and the accuracy is zero. Is there any BART-specific pre-processing that I need to modify for this to work with T5? If you could share a corresponding T5 finetuning script, that would be great. Any help would be greatly appreciated. The log from my T5 run is below.
10/30/2023 20:51:31 - INFO - __main__ - Namespace(train_file='data/glue-sst2/glue-sst2_16_100_train.tsv', dev_file='data/glue-sst2/glue-sst2_16_100_dev.tsv', test_file='data/glue-sst2/glue-sst2_1=1000, wait_step=10000000000, quiet=False, eval_period=100, prefix='', debug=False, seed=42)
10/30/2023 20:51:31 - INFO - __main__ - models/google/t5-v1_1-small/singletask-glue-sst2-no-hp
10/30/2023 20:51:31 - INFO - __main__ - Using 1 gpus
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.86k/1.86k [00:00<00:00, 1.30MB/s]
Downloading (…)ve/main/spiece.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 9.57MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.79k/1.79k [00:00<00:00, 2.67MB/s]
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 537/537 [00:00<00:00, 765kB/s]d as explained in https://github.com/huggingface/transformers/pull/24565
10/30/2023 20:51:31 - INFO - __main__ - Loading pre-tokenized data from data/glue-sst2/glue-sst2_16_100_train-T5Tokenized.json
10/30/2023 20:51:31 - INFO - __main__ - Loaded 32 examples from train data
10/30/2023 20:51:31 - INFO - __main__ - Loading pre-tokenized data from data/glue-sst2/glue-sst2_16_100_dev-T5Tokenized.json
10/30/2023 20:51:31 - INFO - __main__ - Loaded 32 examples from dev data
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 308M/308M [00:01<00:00, 291MB/s]
Downloading (…)neration_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 147/147 [00:00<00:00, 110kB/s]
  warnings.warn(
10/30/2023 20:51:37 - INFO - __main__ - Starting training!
Epoch 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 18.43it/s]
Epoch 1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.57it/s]
Epoch 2: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.47it/s]
Epoch 3: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.45it/s]
Epoch 6:  19%|████████████████████████████▏                                                                                                                         | 3/16 [00:00<00:00, 28.07it/s]
10/30/2023 20:51:43 - INFO - __main__ - Step 100 Train loss 9728.05 ACC 0.0 on epoch=6
10/30/2023 20:51:43 - INFO - __main__ - Not saving model with best ACC: -1.0 -> 0.0 on epoch=6, global_step=100
Epoch 6: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.98it/s]
Epoch 7: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.46it/s]
Epoch 8: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.35it/s]
Epoch 11: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.43it/s]
Epoch 12:  38%|███████████████████████████████████████████████████████▉                                                                                             | 6/16 [00:00<00:00, 28.35it/s]
10/30/2023 20:51:49 - INFO - __main__ - Step 200 Train loss 9156.54 ACC 0.0 on epoch=12
Epoch 12: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.57it/s]
Epoch 13: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.35it/s]
Epoch 14: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.33it/s]
Epoch 15: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.21it/s]
Epoch 18:  56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                 | 9/16 [00:00<00:00, 28.19it/s]
10/30/2023 20:51:55 - INFO - __main__ - Step 300 Train loss 8761.43 ACC 0.0 on epoch=18
Epoch 18: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.55it/s]
Epoch 19: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.30it/s]
Epoch 20: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.19it/s]
Epoch 21: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.14it/s]
Epoch 24:  94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊         | 15/16 [00:00<00:00, 28.36it/s]
10/30/2023 20:52:01 - INFO - __main__ - Step 400 Train loss 8735.57 ACC 0.0 on epoch=24
Epoch 24: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.55it/s]
Epoch 25: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.17it/s]
Epoch 26: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.37it/s]
Epoch 27: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.30it/s]
Epoch 30: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.32it/s]
Epoch 31:  19%|███████████████████████████▉                                                                                                                         | 3/16 [00:00<00:00, 28.47it/s]
10/30/2023 20:52:07 - INFO - __main__ - Step 500 Train loss 8380.33 ACC 0.0 on epoch=31
Epoch 31: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.49it/s]
Epoch 32: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.30it/s]
Epoch 33: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.32it/s]
Epoch 36: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.31it/s]
Epoch 37:  38%|███████████████████████████████████████████████████████▉                                                                                             | 6/16 [00:00<00:00, 28.32it/s]
10/30/2023 20:52:13 - INFO - __main__ - Step 600 Train loss 8396.24 ACC 0.0 on epoch=37
Epoch 37: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.52it/s]
Epoch 38: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.19it/s]
Epoch 39: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.36it/s]
Epoch 40: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.23it/s]
Epoch 43:  56%|███████████████████████████████████████████████████████████████████████████████████▊                                                                 | 9/16 [00:00<00:00, 28.30it/s]
10/30/2023 20:52:18 - INFO - __main__ - Step 700 Train loss 7912.00 ACC 0.0 on epoch=43
Epoch 43: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.52it/s]
Epoch 44: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.18it/s]
Epoch 45: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.36it/s]
Epoch 46: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.37it/s]
Epoch 49:  94%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊         | 15/16 [00:00<00:00, 28.36it/s]
10/30/2023 20:52:24 - INFO - __main__ - Step 800 Train loss 7964.75 ACC 0.0 on epoch=49
Epoch 49: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.52it/s]
Epoch 50: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.12it/s]
Epoch 51: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.20it/s]
Epoch 52: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.30it/s]
Epoch 53: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.18it/s]
Epoch 54: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.24it/s]
Epoch 55: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.29it/s]
Epoch 56:  19%|███████████████████████████▉                                                                                                                         | 3/16 [00:00<00:00, 28.39it/s]
10/30/2023 20:52:30 - INFO - __main__ - Step 900 Train loss 8235.67 ACC 0.0 on epoch=56
Epoch 56: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:02<00:00,  5.52it/s]
Epoch 57: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.31it/s]
Epoch 58: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.35it/s]
Epoch 59: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.38it/s]
Epoch 60: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.24it/s]
Epoch 61: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 28.25it/s]
Epoch 62:  38%|███████████████████████████████████████████████████████▉                                                                                             | 6/16 [00:00<00:00, 28.32it/s]
10/30/2023 20:52:36 - INFO - __main__ - Step 1000 Train loss 8064.02 ACC 0.0 on epoch=62
Epoch 62:  44%|█████████████████████████████████████████████████████████████████▏                                                                                   | 7/16 [00:02<00:03,  2.67it/s]
10/30/2023 20:52:37 - INFO - __main__ - Loading checkpoint from CPU
10/30/2023 20:52:37 - INFO - __main__ - Loading pre-tokenized data from data/glue-sst2/glue-sst2_16_100_test-T5Tokenized.json
10/30/2023 20:52:37 - INFO - __main__ - Loaded 872 examples from test data
10/30/2023 20:53:44 - INFO - __main__ - Saved prediction in models/google/t5-v1_1-small/singletask-glue-sst2-no-hp/_predictions.txt
10/30/2023 20:53:44 - INFO - __main__ - ACC on test data: 0.00
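In case it helps narrow things down, here is a minimal sketch of the kind of BART-specific assumption I suspect might need changing. It is not taken from the CrossFit code, and I have not verified which of these conventions the repo relies on. The special-token IDs are those of the public Hugging Face `facebook/bart-*` and `google/t5-v1_1-*` checkpoints; the helper names and the example token ID are hypothetical.

```python
# Special-token conventions of the public Hugging Face checkpoints.
# Worth double-checking against tokenizer/config of the exact model in use.
BART_IDS = {"pad": 1, "eos": 2, "decoder_start": 2}
T5_IDS = {"pad": 0, "eos": 1, "decoder_start": 0}

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def shift_right(target_ids, decoder_start_id):
    """Build decoder inputs: prepend the start token, drop the last token."""
    return [decoder_start_id] + target_ids[:-1]


def mask_padding(target_ids, pad_id):
    """Replace padding positions so they are ignored by the loss."""
    return [IGNORE_INDEX if t == pad_id else t for t in target_ids]


def prepare_example(target_ids, ids):
    """Hypothetical helper: derive decoder inputs and labels for one target."""
    return {
        "decoder_input_ids": shift_right(target_ids, ids["decoder_start"]),
        "labels": mask_padding(target_ids, ids["pad"]),
    }


# Made-up T5-style target: one label token, then </s> (id 1), then padding (id 0).
# T5 targets carry no leading BOS token, unlike BART targets.
example = prepare_example([1465, 1, 0, 0], T5_IDS)
print(example)
```

If the data pipeline hard-codes the BART IDs (for example, padding with 1 or starting the decoder with 2), or strips a leading BOS token that T5 never adds, the T5 targets would be shifted or mislabelled, which might be consistent with the zero accuracy above. This is only a guess on my part.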