Closed: ccchobits closed this issue 3 years ago
@ccchobits : were you able to reproduce results?
@ccchobits : quick follow-up
Yes, the expected performance on WN18RR can be reproduced using the hyper-parameters in the scripts, except for dropout. The dropout should be 0.3 rather than 0.1.
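For reference, a minimal sketch of applying that fix, assuming the dropout is set through a variable in wn18rr_job_config.sh that is ultimately passed as a --hidden_dropout_prob argument (the variable name below is an assumption on my side, not copied from the repository):

# In wn18rr_job_config.sh: the variable name here is hypothetical; look for
# whatever value feeds --hidden_dropout_prob and raise it from 0.1 to 0.3.
HIDDEN_DROPOUT_PROB=0.3

# Alternatively, override the flag wherever the training command is assembled:
#   --hidden_dropout_prob 0.3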
Hi Team,
I have recently been learning CoKE, released by your team. I followed the steps and ran the model on WN18RR multiple times with the specified hyper-parameters, but the locally trained model cannot reproduce the performance reported in the paper. Here are the hyper-parameters used for training (they are the default settings specified in "wn18rr_job_config.sh"):
do_train: True
ema_decay: 0.9999
epoch: 1000
hidden_act: gelu
hidden_dropout_prob: 0.1
hidden_size: 256
in_tokens: False
init_checkpoint: None
init_pretraining_params: None
initializer_range: 0.02
intermediate_size: 512
learning_rate: 0.0005
loss_scaling: 1.0
lr_scheduler: linear_warmup_decay
max_position_embeddings: 40
max_seq_len: 3
num_attention_heads: 4
num_hidden_layers: 12
num_iteration_per_drop_scope: 1
num_relations: 11
predict_file: None
sen_candli_file: None
sen_trivial_file: None
skip_steps: 1000
soft_label: 0.15
train_file: ./data/wn18rr/train.coke.txt
true_triple_path: ./data/wn18rr/all.txt
use_cuda: True
use_ema: False
use_fast_executor: False
use_fp16: False
verbose: False
vocab_path: ./data/wn18rr/vocab.txt
vocab_size: 41054
warmup_proportion: 0.1
weight_decay: 0.01
weight_sharing: True
The performance is:

TASK     MRR    Hits@1  Hits@3  Hits@10
wn18rr   0.462  0.427   0.475   0.533   (local)
wn18rr   0.484  0.450   0.496   0.553   (paper)
I wonder whether this repository is the up-to-date version, or whether the best performance uses another set of hyper-parameters. Could you please provide some hints on this issue? Thank you.
Additionally, there is a pretrained model that can be downloaded using "wget_kbc_models.sh" (step 4), but the link seems to be unavailable now. I hope this can be fixed as well.
Best regards