LUMIA-Group / rasat

The official implementation of the paper "RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL"(EMNLP 2022)
https://arxiv.org/abs/2205.06983
Apache License 2.0

Could you share the training config of sparc_add_coref_t5_3b_order_0514_ckpt-4224? #3

Closed Dongfeng-He closed 2 years ago

Dongfeng-He commented 2 years ago

Hi @JiexingQi, I'm trying to reproduce RASAT-T5-3b on SParC. I have trained the model for 4736 steps and the best result so far is a QEM of 63.7% and an IEM of 45.0%, reached at step 3776. That still leaves a gap of 1.3% QEM compared to sparc_add_coref_t5_3b_order_0514_ckpt-4224 without PICARD. Is there anything wrong with my training config? Could you share the training config of sparc_add_coref_t5_3b_order_0514_ckpt-4224?

My training config is modified from the provided train_sparc_rasat_small.json. I have 8 GPUs, so I changed per_device_train_batch_size and gradient_accumulation_steps to keep the recommended total_batch_size of 2048.

model_name_or_path: t5-small -> t5-3b
dataset: sparc+spider -> sparc
per_device_train_batch_size: 16 -> 2
per_device_eval_batch_size: 16 -> 2
gradient_accumulation_steps: 32 -> 128
use_coref: false -> true
use_dependency: true -> false
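
For reference, this is how the effective batch size works out on my 8-GPU setup (a quick sanity-check sketch; the values are the ones from the changes above):

    # Effective batch size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
    num_gpus = 8
    per_device_train_batch_size = 2
    gradient_accumulation_steps = 128

    effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
    print(effective_batch_size)  # 2048, matching the recommended total_batch_size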

My full training config:

{
    "run_name": "train_sparc_rasat_3b",
    "model_name_or_path": "t5-3b",
    "use_rasat": true,
    "dataset": "sparc",
    "wandb_project_name": "rasat_experiment",
    "source_prefix": "",
    "schema_serialization_type": "custom",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": false,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/train_sparc_rasat_3b",
    "cache_dir": "./transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 2,
    "gradient_accumulation_steps": 128,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "weight_decay": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 3,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 2,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "edge_type": "Default",
    "use_coref": true, 
    "use_dependency": false,
    "use_picard": false,
    "overwrite_output_dir": true,
    "dataloader_num_workers": 8,
    "group_by_length": true,
    "gradient_checkpointing":true
}
JiexingQi commented 2 years ago

Hi @Dongfeng-He,

  1. You may keep

     dataset: sparc+spider

     since the original T5-3B + PICARD uses

     dataset: cosql+spider

     when training on the CoSQL dataset.

  2. You should set

     "schema_serialization_with_db_content": true

     to make full use of the database content.

  3. Also, the config is as follows:

    
    {
    "run_name": "train_0514_sparc_order_relation_add_coref",
    "model_name_or_path": "t5-3b",
    "dataset": "sparc+spider",
    "source_prefix": "",
    "schema_serialization_type": "custom",
    "schema_serialization_randomized": false,
    "schema_serialization_with_db_id": true,
    "schema_serialization_with_db_content": true,
    "normalize_query": true,
    "target_with_db_id": true,
    "output_dir": "./experiment/train_0514_sparc_order_relation_add_coref",
    "cache_dir": "./transformers_cache",
    "do_train": true,
    "do_eval": true,
    "fp16": false,
    "num_train_epochs": 3072,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 16,
    "label_smoothing_factor": 0.0,
    "learning_rate": 1e-4,
    "adafactor": true,
    "adam_eps": 1e-6,
    "lr_scheduler_type": "constant",
    "warmup_ratio": 0.0,
    "warmup_steps": 0,
    "weight_decay": 0,
    "seed": 1,
    "report_to": ["wandb"],
    "logging_strategy": "steps",
    "logging_first_step": true,
    "logging_steps": 8,
    "load_best_model_at_end": true,
    "metric_for_best_model": "exact_match",
    "greater_is_better": true,
    "save_total_limit": 128,
    "save_steps": 64,
    "evaluation_strategy": "steps",
    "eval_steps": 64,
    "predict_with_generate": true,
    "num_beams": 4,
    "num_beam_groups": 1,
    "use_picard": false,
    "overwrite_output_dir": true,
    "dataloader_num_workers": 8,
    "group_by_length": true,
    "gradient_checkpointing":true,
    "ddp_find_unused_parameters":false,
    "edge_type": "Default",
    "use_coref": true
    }


![image](https://user-images.githubusercontent.com/43906450/199150087-3ec6141b-8a0e-4159-8e63-20e076dd076a.png)
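
If it helps, here is a small sketch to spot the remaining differences between the two configs posted in this thread (the filenames are placeholders for the two JSON blobs above):

    import json

    # Placeholder filenames for the two configs pasted in this thread.
    with open("train_sparc_rasat_3b.json") as f:
        mine = json.load(f)
    with open("train_0514_sparc_order_relation_add_coref.json") as f:
        reference = json.load(f)

    # Print every key whose value differs, including keys present in only one file.
    for key in sorted(set(mine) | set(reference)):
        if mine.get(key) != reference.get(key):
            print(f"{key}: mine={mine.get(key)!r}  reference={reference.get(key)!r}")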
Dongfeng-He commented 2 years ago

Thanks a lot!

Dongfeng-He commented 2 years ago

Hi @JiexingQi, is this config using the original T5 model? I don't see "use_rasat": true in it.

JiexingQi commented 2 years ago

@Dongfeng-He Yes. This is an old config file; I reconstructed some of the code when I released this repo. You should set

"use_rasat": true

to use RASAT.
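
For anyone reusing the old config above with the released code, a minimal patch sketch (the filename is a placeholder, and I'm assuming the flag only needs to be present and set to true):

    import json

    # Placeholder path to the old config file shared above.
    path = "train_0514_sparc_order_relation_add_coref.json"

    with open(path) as f:
        config = json.load(f)

    # Enable the relation-aware model in the released code.
    config["use_rasat"] = True

    with open(path, "w") as f:
        json.dump(config, f, indent=4)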