Closed — MatthewCYM closed this issue 3 years ago
Hi, thank you for your interest in our work! Here is the config we used to obtain the TACRED results in the paper:
[tacred_final]
datasets = tacred
model_name_or_path = t5-base
num_train_epochs = 5
max_seq_length = 300
max_output_seq_length = 64
max_output_seq_length_eval = 128
train_split = train,dev
per_device_train_batch_size = 20
per_device_eval_batch_size = 20
do_train = True
do_eval = False
do_predict = True
episodes = 1-5
The most important difference is the use of the dev set for training.
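Based on the command used elsewhere in this thread and the 8-GPU setup, the matching launch would presumably look like the following (the GPU index list is an assumption; `tacred_final` is the config section name above):

```shell
# Launch the paper config on 8 GPUs (sketch; adjust CUDA_VISIBLE_DEVICES to your machine)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m torch.distributed.launch --nproc_per_node=8 run.py tacred_final
```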
Hi,
Thank you for the reply. May I ask how many GPUs you used in the experiments? Was it 8 GPUs, as mentioned in the paper?
Regards, Yiming
That is correct, we used 8 GPUs.
Hi,
Thanks for sharing the code. I tried to reproduce the result on TACRED, but the F1 score on the test set is only 67.67.
The config I used is listed below.
[tacred]
datasets = tacred
multitask = False
model_name_or_path = t5-base
num_train_epochs = 10
max_seq_length = 256
train_split = train
per_device_train_batch_size = 16
do_train = True
do_eval = True
do_predict = True
I ran the code with
CUDA_VISIBLE_DEVICES=0,1 nohup python3 -m torch.distributed.launch --nproc_per_node=2 run.py tacred > result.log 2>&1 &
May I ask which part went wrong? Thank you.
Regards, Yiming
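One concrete difference between the two setups in this thread is the effective global batch size: the paper config uses 20 per device on 8 GPUs, while the reproduction attempt uses 16 per device on 2 GPUs. A quick sketch of the arithmetic (assuming no gradient accumulation in either run):

```python
# Effective global batch size = per-device batch size * number of GPUs * grad accumulation steps.
# Values are taken from the two configs in this thread; grad_accum = 1 is an assumption.
def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    return per_device * num_gpus * grad_accum

paper = effective_batch_size(20, 8)   # paper config: 20 per device, 8 GPUs
repro = effective_batch_size(16, 2)   # reproduction:  16 per device, 2 GPUs
print(paper, repro)  # 160 32
```

A 5x smaller effective batch size (together with the different epoch count, sequence lengths, and train-only split) could plausibly account for a sizable F1 gap, so matching these values is worth trying before suspecting the code.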