ShannonAI / glyce

Code for NeurIPS 2019 - Glyce: Glyph-vectors for Chinese Character Representations
https://arxiv.org/abs/1901.10125
Apache License 2.0
419 stars 75 forks

Reproducing CWS: training takes too long #24

Open kFoodie opened 4 years ago

kFoodie commented 4 years ago

Hello, while reproducing your CWS experiments I found that training takes a very long time: a single epoch has not finished after 13 hours. Could you help me figure out what is causing this?

I am using the PKU dataset.


```
python3 run_bert_glyce_tagger.py \
    --data_sign pku_cws \
    --config_path ../configs/pkucws_glyce_bert.json \
    --data_dir data/pku \
    --bert_model /data/glusterfs_sharing_04_v2/11117720/bert-chinese-ner-master/checkpoint \
    --task_name cws \
    --max_seq_length 128 \
    --do_train \
    --do_eval \
    --seed 3310 \
    --train_batch_size 64 \
    --dev_batch_size 32 \
    --test_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --checkpoint 10 \
    --warmup_proportion -1 \
    --gradient_accumulation_steps 2
```

The environment is Ubuntu 18.04, and the GPU is a Tesla T4.

I am not sure which step went wrong; looking forward to your reply, thanks!! I am using the PKU dataset, annotated as shown in the attached screenshot.
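
As a rough sanity check on the flags above, here is a minimal sketch of how `--train_batch_size` and `--gradient_accumulation_steps` translate into optimizer updates per epoch; the training-set size is only a placeholder, not an exact PKU count:

```python
# Rough arithmetic only; nothing here comes from the repo except the flag values.
num_train_examples = 19_000          # placeholder: roughly the PKU CWS training sentences
train_batch_size = 64                # --train_batch_size
gradient_accumulation_steps = 2      # --gradient_accumulation_steps

batches_per_epoch = num_train_examples // train_batch_size
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
effective_batch_size = train_batch_size * gradient_accumulation_steps

print("batches per epoch:    ", batches_per_epoch)
print("optimizer updates:    ", updates_per_epoch)
print("effective batch size: ", effective_batch_size)
```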

I changed the data-reading format. In the training code I removed the validation part; otherwise it is basically unchanged:

```python
# Excerpt from the modified training script; tqdm, os, torch and eval_checkpoint
# are imported/defined elsewhere in the file.
def train(model, optimizer, train_dataloader, test_dataloader, config,
          device, n_gpu, label_list):
    global_step = 0
    nb_tr_steps = 0
    tr_loss = 0

    dev_best_acc = 0
    dev_best_precision = 0
    dev_best_recall = 0
    dev_best_f1 = 0
    dev_best_loss = 10000000000000

    test_best_acc = 0
    test_best_precision = 0
    test_best_recall = 0
    test_best_f1 = 0
    test_best_loss = 1000000000000000

    model.train()

    for idx in range(int(config.num_train_epochs)):
        tr_loss = 0
        nb_tr_examples, nb_tr_steps = 0, 0
        print("#######" * 10)
        print("EPOCH: ", str(idx))
        for step, batch in tqdm(enumerate(train_dataloader)):
            batch = tuple(t.to(device) for t in batch)
            input_ids, input_mask, segment_ids, label_ids = batch
            loss, glyph_loss = model(input_ids, segment_ids, input_mask, label_ids)
            if n_gpu > 1:
                loss = loss.mean()
                glyph_loss = glyph_loss.mean()

            # combine the task loss with the auxiliary glyph loss;
            # after warmup the glyph term is decayed
            if global_step < config.glyph_warmup:
                sum_loss = loss + config.glyph_ratio * glyph_loss
            else:
                sum_loss = loss + config.glyph_ratio * glyph_loss * config.glyph_decay ** (idx + 1 + global_step // 5)

            # after the first epoch, drop the glyph loss entirely
            if idx >= 1:
                sum_loss = loss

            sum_loss.backward()

            tr_loss += loss.item()

            nb_tr_examples += input_ids.size(0)
            nb_tr_steps += 1

            # optimizer step only every gradient_accumulation_steps batches
            if (step + 1) % config.gradient_accumulation_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
                global_step += 1

    # evaluate on the test set and export the trained model
    tmp_test_loss, tmp_test_acc, tmp_test_prec, tmp_test_rec, tmp_test_f1 = eval_checkpoint(
        model, test_dataloader, config, device, n_gpu, label_list, eval_sign="test")
    print("......" * 10)
    print("TEST: loss, acc, precision, recall, f1")
    print(tmp_test_loss, tmp_test_acc, tmp_test_prec, tmp_test_rec, tmp_test_f1)
    model_to_save = model
    output_model_file = os.path.join(config.output_dir, "bert_model.bin")
    if config.export_model == "True":
        torch.save(model_to_save.state_dict(), output_model_file)

    print("=&=" * 15)
    print("DEV: current best precision, recall, f1, acc, loss ")
    print(dev_best_precision, dev_best_recall, dev_best_f1, dev_best_acc, dev_best_loss)
    print("TEST: current best precision, recall, f1, acc, loss ")
    print(test_best_precision, test_best_recall, test_best_f1, test_best_acc, test_best_loss)
    print("=&=" * 15)
```
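
For reference, a minimal sketch of how the glyph-loss weight in the loop above evolves; `glyph_ratio`, `glyph_decay` and `glyph_warmup` come from the json config, and the values below are placeholders only:

```python
# Placeholder values; the real ones live in pkucws_glyce_bert.json.
glyph_ratio = 0.1
glyph_decay = 0.8
glyph_warmup = 100

def glyph_weight(idx, global_step):
    """Weight multiplying glyph_loss in sum_loss, mirroring the training loop above."""
    if idx >= 1:                      # glyph loss is dropped after the first epoch
        return 0.0
    if global_step < glyph_warmup:    # during warmup: the full glyph_ratio
        return glyph_ratio
    return glyph_ratio * glyph_decay ** (idx + 1 + global_step // 5)

for step in (0, 100, 200, 500):
    print(step, glyph_weight(idx=0, global_step=step))
```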

fengliangjing commented 4 years ago

Hi, have you solved this problem? I am running into the same issue. ><

ghost commented 4 years ago

Hello, thanks for your question. Would it be okay if I get back to you around the 29th? I will set aside some time to reproduce this. Thanks!

kFoodie commented 4 years ago

> Hello, thanks for your question. Would it be okay if I get back to you around the 29th? I will set aside some time to reproduce this. Thanks!

Thanks for the reply; much appreciated.

kFoodie commented 4 years ago

> Hi, have you solved this problem? I am running into the same issue. ><

Not yet, unfortunately.

kFoodie commented 4 years ago

I ran CTB6, and it took about a day. The results are in the attached screenshot, and they are far from the numbers in your paper. I barely changed your code at all, so I am not sure which step went wrong. The experiment parameters are as follows:

```
python3 run_bert_glyce_tagger.py \
    --data_sign pku_cws \
    --config_path ../configs/pkucws_glyce_bert.json \
    --data_dir data/ctb6 \
    --bert_model /data/glusterfs_sharing_04_v3/11117720/bert-chinese-ner-master/checkpoint \
    --output_dir /data/glusterfs_sharing_04_v3/11117720/glyce-master/output/ \
    --task_name cws \
    --max_seq_length 128 \
    --do_train \
    --do_eval \
    --seed 3310 \
    --train_batch_size 64 \
    --dev_batch_size 32 \
    --test_batch_size 32 \
    --learning_rate 3e-5 \
    --num_train_epochs 3 \
    --warmup_proportion -1 \
    --gradient_accumulation_steps 100
```
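
A rough sketch of what `--gradient_accumulation_steps 100` implies for the number of optimizer updates in this run; the CTB6 training-set size below is a placeholder, not an exact count:

```python
# Placeholder numbers; only the flag values come from the command above.
num_train_examples = 23_000          # placeholder for the CTB6 training-set size
train_batch_size = 64                # --train_batch_size
gradient_accumulation_steps = 100    # --gradient_accumulation_steps
num_train_epochs = 3                 # --num_train_epochs

batches_per_epoch = num_train_examples // train_batch_size
updates_total = (batches_per_epoch // gradient_accumulation_steps) * num_train_epochs
print("optimizer updates over the whole run:", updates_total)
```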

kFoodie commented 4 years ago

I tried other datasets as well, and basically none of them reach an F1 above 90%...