LUMIA-Group / rasat

The official implementation of the paper "RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL"(EMNLP 2022)
https://arxiv.org/abs/2205.06983
Apache License 2.0

Could you share the training config of sparc_add_coref_t5_3b_order_0514_ckpt-4224? #3

Closed Dongfeng-He closed 2 years ago

Dongfeng-He commented 2 years ago

Hi @JiexingQi, I'm trying to reproduce RASAT-T5-3b on SParC. I have trained the model for 4736 steps and the best result so far is a QEM of 63.7% and an IEM of 45.0%, reached at step 3776. That still leaves a gap of 1.3% QEM compared to sparc_add_coref_t5_3b_order_0514_ckpt-4224 without PICARD. Is there anything wrong with my training config? Could you share the training config of sparc_add_coref_t5_3b_order_0514_ckpt-4224?

My training config is modified from the provided train_sparc_rasat_small.json. I have 8 GPUs, so I changed per_device_train_batch_size and gradient_accumulation_steps to keep the recommended total_batch_size of 2048.

model_name_or_path: t5-small -> t5-3b
dataset: sparc+spider -> sparc
per_device_train_batch_size: 16 -> 2
per_device_eval_batch_size: 16 -> 2
gradient_accumulation_steps: 32 -> 128
use_coref: false -> true
use_dependency: true -> false
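
For reference, this is how the effective batch size works out on my 8-GPU setup (a quick sanity-check sketch; the values are the ones from the changes above):

    # Effective batch size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
    num_gpus = 8
    per_device_train_batch_size = 2
    gradient_accumulation_steps = 128

    effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
    print(effective_batch_size)  # 2048, matching the recommended total_batch_size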

My full training config:

{
    "run_name": "train_sparc_rasat_3b",
    "model_name_or_path": "t5-3b",
    "use_rasat": true,
    "dataset": "sparc",
    "wandb_project_name": "rasat_experiment",
    "source_prefix": "",
    "schema_serialization_type": "custom",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/train_sparc_rasat_3b",
    "cache_dir": "./transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 128,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "weight_decay": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 3,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 2,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "edge_type": "Default",
    "use_coref": true, 
    "use_dependency": false,
    "use_picard": false,
    "overwrite_output_dir": true,
    "dataloader_num_workers": 8,
    "group_by_length": true,
    "gradient_checkpointing":true
}
JiexingQi commented 2 years ago

Hi @Dongfeng-He,

  1. You may keep

     dataset: sparc+spider

     since the original T5-3B + PICARD uses

     dataset: cosql+spider

     when training on the CoSQL dataset.

  2. You should set

     "schema_serialization_with_db_content": true

     to make full use of the database content.

  3. Also, the config is as follows:

    
    {
    "run_name": "train_0514_sparc_order_relation_add_coref",
    "model_name_or_path": "t5-3b",
    "dataset": "sparc+spider",
    "source_prefix": "",
    "schema_serialization_type": "custom",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": true,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/train_0514_sparc_order_relation_add_coref",
    "cache_dir": "./transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 16,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "weight_decay": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 8,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 128,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "use_picard": false,
    "overwrite_output_dir": true,
    "dataloader_num_workers": 8,
    "group_by_length": true,
    "gradient_checkpointing":true,
    "ddp_find_unused_parameters":false,
    "edge_type": "Default",
    "use_coref": true
    }


![image](https://user-images.githubusercontent.com/43906450/199150087-3ec6141b-8a0e-4159-8e63-20e076dd076a.png)
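
If it helps, here is a small sketch to spot the remaining differences between the two configs posted in this thread (the filenames are placeholders for the two JSON blobs above):

    import json

    # Placeholder filenames for the two configs pasted in this thread.
    with open("train_sparc_rasat_3b.json") as f:
        mine = json.load(f)
    with open("train_0514_sparc_order_relation_add_coref.json") as f:
        reference = json.load(f)

    # Print every key whose value differs, including keys present in only one file.
    for key in sorted(set(mine) | set(reference)):
        if mine.get(key) != reference.get(key):
            print(f"{key}: mine={mine.get(key)!r}  reference={reference.get(key)!r}")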
Dongfeng-He commented 2 years ago

Thanks a lot!

Dongfeng-He commented 2 years ago

Hi @JiexingQi, is this config using the original T5 model? I don't see "use_rasat": true in it.

JiexingQi commented 2 years ago

@Dongfeng-He Yes. This is an old config file; I reconstructed some of the code when I released this repo. You should set

"use_rasat": true

to use RASAT.
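
For anyone reusing the old config above with the released code, a minimal patch sketch (the filename is a placeholder, and I'm assuming the flag only needs to be present and set to true):

    import json

    # Placeholder path to the old config file shared above.
    path = "train_0514_sparc_order_relation_add_coref.json"

    with open(path) as f:
        config = json.load(f)

    # Enable the relation-aware model in the released code.
    config["use_rasat"] = True

    with open(path, "w") as f:
        json.dump(config, f, indent=4)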