amazon-science / tanl

Structured Prediction as Translation between Augmented Natural Languages
Apache License 2.0
130 stars 25 forks

About performance on tacred #1

Closed MatthewCYM closed 3 years ago

MatthewCYM commented 3 years ago

Hi,

Thanks for sharing the code. I tried to reproduce the results on TACRED, but the F1 score I get on the test set is only 67.67.

The config I used is listed below.

[tacred]
datasets = tacred
multitask = False
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
train_split = train
per_device_train_batch_size = 16
do_train = True
do_eval = True
do_predict = True

I run the code with

CUDA_VISIBLE_DEVICES=0,1 nohup python3 -m torch.distributed.launch --nproc_per_node=2 run.py tacred > result.log 2>&1 &

May I ask what went wrong? Thank you.

Regards, Yiming

giove91 commented 3 years ago

Hi, thank you for your interest in our work! Here is the config we used to obtain the TACRED results in the paper:

[tacred_final]
datasets = tacred
model_name_or_path = t5-base
num_train_epochs = 5
max_seq_length = 300
max_output_seq_length = 64
max_output_seq_length_eval = 128
train_split = train,dev
per_device_train_batch_size = 20
per_device_eval_batch_size = 20
do_train = True
do_eval = False
do_predict = True
episodes = 1-5

The most important difference is the use of the dev set for training.
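
For completeness, a hypothetical 8-GPU launch of this config, mirroring the `torch.distributed.launch` command from the question (the `nproc_per_node` value and the `tacred_final` section name are assumptions based on the config above, not a command quoted from the authors):

```shell
# Sketch: launch the tacred_final config on 8 GPUs with PyTorch
# distributed data parallelism, following the same pattern as the
# 2-GPU command in the original question.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch \
    --nproc_per_node=8 run.py tacred_final
```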

MatthewCYM commented 3 years ago

Hi,

Thank you for the reply. May I ask how many GPUs you used in the experiments? Was it 8 GPUs, as mentioned in the paper?

Regards, Yiming

giove91 commented 3 years ago

That is correct, we used 8 GPUs.
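
A side note for anyone reproducing this on fewer GPUs (my addition, not from the thread): with `per_device_train_batch_size = 20` on 8 GPUs, the effective global batch size is 160. On fewer devices, the standard way to match it is gradient accumulation (`gradient_accumulation_steps` in the HuggingFace `TrainingArguments`). A minimal sketch of the arithmetic:

```python
# Sketch: compute the gradient accumulation steps needed to match
# a target global batch size, assuming the usual HuggingFace relation
# global batch = per_device_batch * num_gpus * accumulation_steps.

def grad_accum_steps(target_global_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Return the accumulation steps that reproduce the target global batch."""
    per_step = per_device_batch * num_gpus
    assert target_global_batch % per_step == 0, "target must divide evenly"
    return target_global_batch // per_step

# Paper setup: 8 GPUs x batch 20 -> global batch 160.
target = 20 * 8
# On 2 GPUs (as in the original question), batch 20 per device:
print(grad_accum_steps(target, per_device_batch=20, num_gpus=2))  # -> 4
```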