RowitZou / CG-nAR

EMNLP-2021 paper: Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems.
MIT License

My experimental results seem lower than the scores reported in the paper #5

Open unknowed-ER opened 2 years ago

unknowed-ER commented 2 years ago

I ran the code on the Persona dataset with the following commands.

Preprocess 0

bash src/persona_preprocess.sh

Preprocess 1

python ./src/preprocess.py -dataset persona -mode raw_to_json -raw_path raw_data/persona -save_path json_data/persona/persona -adj_file graph_data/persona/adj_matrix.txt -vertex_file graph_data/persona/vertex.txt -log_file logs/raw_to_json_persona.log
python ./src/preprocess.py -dataset persona -mode json_to_data -type train -raw_path json_data/persona -save_path torch_data/persona -tokenizer bert-base-uncased -adj_file graph_data/persona/adj_matrix.txt -vertex_file graph_data/persona/vertex.txt -log_file logs/json_to_data_persona.log
python ./src/preprocess.py -dataset persona -mode json_to_data -type dev -raw_path json_data/persona -save_path torch_data/persona -tokenizer bert-base-uncased -adj_file graph_data/persona/adj_matrix.txt -vertex_file graph_data/persona/vertex.txt -log_file logs/json_to_data_persona.log
python ./src/preprocess.py -dataset persona -mode json_to_data -type test -raw_path json_data/persona -save_path torch_data/persona -tokenizer bert-base-uncased -adj_file graph_data/persona/adj_matrix.txt -vertex_file graph_data/persona/vertex.txt -log_file logs/json_to_data_persona.log
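The three json_to_data commands above differ only in `-type`. A minimal sketch that generates them in a loop and prints them for a dry run; the helper name `build_json_to_data_cmds` is hypothetical and not part of the repo, and the flags are copied verbatim from the commands above.

```python
import shlex

# Flags shared by the three json_to_data commands above; only -type differs.
HEAD = ["python", "./src/preprocess.py", "-dataset", "persona", "-mode", "json_to_data"]
TAIL = [
    "-raw_path", "json_data/persona",
    "-save_path", "torch_data/persona",
    "-tokenizer", "bert-base-uncased",
    "-adj_file", "graph_data/persona/adj_matrix.txt",
    "-vertex_file", "graph_data/persona/vertex.txt",
    "-log_file", "logs/json_to_data_persona.log",
]


def build_json_to_data_cmds(splits=("train", "dev", "test")):
    """Return one argv list per split (hypothetical helper, not part of the repo)."""
    return [HEAD + ["-type", split] + TAIL for split in splits]


# Dry run: print the shell-quoted commands instead of executing them.
for cmd in build_json_to_data_cmds():
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute them in order and stop at the first failing step.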

Train

python ./src/main.py -mode train -data_path torch_data/persona/persona -model_path models/persona -log_file logs/persona.train.log -visible_gpus 0 -warmup_steps 8000 -lr 0.001 -train_steps 100000 -graph_emb_path graph_data/persona/graph_embedding.npy -tokenizer bert-base-uncased
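Note that `-lr 0.001` is not necessarily the peak learning rate. If the optimizer uses the inverse-square-root ("Noam") warmup schedule common in PreSumm-style codebases, which this repo's flag names resemble, the effective rate is much smaller. That schedule is an assumption here, not something confirmed in this thread; verify it against the optimizer code in `src/`. A sketch of the assumed schedule:

```python
def noam_lr(step, base_lr=0.001, warmup_steps=8000):
    """Assumed inverse-square-root schedule with linear warmup (not confirmed for CG-nAR):
    lr = base_lr * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)  # guard against step 0
    return base_lr * min(step ** -0.5, step * warmup_steps ** -1.5)


# Print the assumed learning rate at a few points of the 100000-step run.
for s in (1000, 8000, 50000, 100000):
    print(s, f"{noam_lr(s):.3e}")
```

Under that assumption the rate peaks at step 8000 at 0.001 / √8000 ≈ 1.1e-5 and decays to about 3.2e-6 by step 100000.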

Test

python ./src/main.py -mode test -data_path torch_data/persona/persona -log_file logs/persona.test.log -alpha 0.95 -test_from models/persona/model_step_100000.pt -result_path results/persona/persona -visible_gpus 0 -test_batch_ex_size 50 -graph_emb_path graph_data/persona/graph_embedding.npy -tokenizer bert-base-uncased

The results are below. They seem lower than the scores reported in the paper. What could cause this drop in performance, and how can I fix it?

[2022-06-20 19:05:34,342 INFO] Loading checkpoint from models/persona/model_step_100000.pt
[2022-06-20 19:24:06,733 INFO] Loading checkpoint from models/persona/model_step_100000.pt
[2022-06-20 19:24:14,400 INFO] Loading test dataset from torch_data/persona/persona.test.0.pt, number of examples: 2000
[2022-06-20 19:24:48,680 INFO] Loading test dataset from torch_data/persona/persona.test.1.pt, number of examples: 2000
[2022-06-20 19:25:23,221 INFO] Loading test dataset from torch_data/persona/persona.test.2.pt, number of examples: 1317
[2022-06-20 19:25:46,449 INFO] Ext Concept Score at step 100000: 
>> P/R/F1: 47.83/44.20/45.94
[2022-06-20 19:25:46,450 INFO] Gold Length at step 100000: 11.26

[2022-06-20 19:25:46,454 INFO] Prediction Length ratio at step 100000: 0.55
[2022-06-20 19:25:46,454 INFO] Prediction Bleu at step 100000: 3.83
[2022-06-20 19:25:46,454 INFO] Prediction Rouges at step 100000: 
>> ROUGE-F(1/2/l): 23.10/5.98/22.24
ROUGE-R(1/2/l): 18.11/4.81/17.53
ROUGE-P(1/2/l): 35.93/9.22/33.91
[2022-06-20 19:25:46,454 INFO] Prediction Dist-1 at step 100000: 6.08
[2022-06-20 19:25:46,454 INFO] Prediction Dist-2 at step 100000: 26.23
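One possible reading of the length numbers in this log: the average predicted length is about 0.55 × 11.26 ≈ 6.2, against a gold length of 11.26. If the reported BLEU applies the standard corpus-level brevity penalty, BP = exp(1 − r/c) when the candidate length c is shorter than the reference length r (an assumption; the repo's BLEU script may compute it differently), then short outputs alone cost a large share of the score. A sketch that treats the logged length ratio as c/r:

```python
import math


def brevity_penalty(length_ratio):
    """Standard BLEU brevity penalty from length_ratio = candidate_len / reference_len.
    BP = 1 if the candidate is at least as long as the reference, else exp(1 - 1/ratio)."""
    if length_ratio <= 0:
        return 0.0
    if length_ratio >= 1:
        return 1.0
    return math.exp(1 - 1 / length_ratio)


# Length ratio taken from the test log above.
print(round(brevity_penalty(0.55), 3))
```

For a ratio of 0.55 this gives BP ≈ 0.44, so under that assumption the penalty alone would remove more than half of the unpenalized BLEU.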
unknowed-ER commented 2 years ago

These are the model details printed in train.log:

Model(
  (graph_embeddings): Embedding(2409, 128, padding_idx=0)
  (encoder): TransformerEncoder(
    (pos_emb): PositionalEncoding(
      (dropout): Dropout(p=0.2, inplace=False)
    )
    (embeddings): Embedding(30522, 768, padding_idx=0)
    (transformer): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
  )
  (concept_encoder): GraphEncoder(
    (embeddings): Embedding(2409, 128, padding_idx=0)
    (emb_2_hid): Sequential(
      (0): Dropout(p=0.3, inplace=False)
      (1): Linear(in_features=128, out_features=768, bias=True)
      (2): ELU(alpha=1.0)
    )
    (rnn_cell): GRUCell(768, 768)
    (attn_layer): GlobalAttention(
      (linear_in): Linear(in_features=768, out_features=768, bias=False)
      (linear_out): Linear(in_features=1536, out_features=768, bias=False)
      (softmax): Softmax(dim=-1)
      (tanh): Tanh()
    )
  )
  (hier_encoder): TransformerEncoder(
    (pos_emb): PositionalEncoding(
      (dropout): Dropout(p=0.2, inplace=False)
    )
    (transformer): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.2, inplace=False)
      )
    )
    (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
  )
  (concept_decoder): TransformerDecoder(
    (embeddings): Embedding(2409, 128, padding_idx=0)
    (emb_to_hid): Linear(in_features=128, out_features=768, bias=True)
    (pos_emb): PositionalEncoding(
      (dropout): Dropout(p=0.2, inplace=False)
    )
    (transformer_layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
    )
    (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
  )
  (decoder): InsertionTransformerDecoder(
    (embeddings): Embedding(30522, 768, padding_idx=0)
    (pos_emb): PositionalEncoding(
      (dropout): Dropout(p=0.2, inplace=False)
    )
    (transformer_layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (context_attn): MultiHeadedAttention(
          (linear_keys): Linear(in_features=768, out_features=768, bias=True)
          (linear_values): Linear(in_features=768, out_features=768, bias=True)
          (linear_query): Linear(in_features=768, out_features=768, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.2, inplace=False)
          (final_linear): Linear(in_features=768, out_features=768, bias=True)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=768, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=768, bias=True)
          (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
          (dropout_1): Dropout(p=0.2, inplace=False)
          (dropout_2): Dropout(p=0.2, inplace=False)
        )
        (layer_norm_1): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (layer_norm_2): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
        (drop): Dropout(p=0.2, inplace=False)
      )
    )
    (layer_norm): LayerNorm((768,), eps=1e-06, elementwise_affine=True)
    (pool_out): Linear(in_features=1536, out_features=768, bias=True)
  )
  (concept_generator): GraphGenerator(
    (embeddings): Embedding(2409, 128, padding_idx=0)
    (tail_attn_linear_q): Sequential(
      (0): Linear(in_features=1664, out_features=768, bias=True)
      (1): ELU(alpha=1.0)
    )
    (tail_attn_linear_k): Sequential(
      (0): Dropout(p=0.3, inplace=False)
      (1): Linear(in_features=128, out_features=768, bias=True)
      (2): ELU(alpha=1.0)
    )
    (tail_attn): GlobalAttention(
      (linear_out): Linear(in_features=1536, out_features=768, bias=False)
      (softmax): Softmax(dim=-1)
      (tanh): Tanh()
    )
    (head_attn_linear_q): Sequential(
      (0): Linear(in_features=1536, out_features=768, bias=True)
      (1): ELU(alpha=1.0)
    )
    (head_attn_linear_k): Sequential(
      (0): Dropout(p=0.3, inplace=False)
      (1): Linear(in_features=768, out_features=768, bias=True)
      (2): ELU(alpha=1.0)
    )
    (head_attn): GlobalAttention(
      (linear_out): Linear(in_features=1536, out_features=768, bias=False)
      (softmax): Softmax(dim=-1)
      (tanh): Tanh()
    )
  )
  (generator): Generator(
    (linear): Linear(in_features=768, out_features=30522, bias=True)
    (softmax): LogSoftmax(dim=-1)
  )
)
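The printed module tree gives every layer shape, so parameter counts can be checked by hand. A sketch using the standard PyTorch parameter formulas (Linear: in × out, plus out for the bias; LayerNorm with elementwise_affine: 2 × d; Embedding: n × d, since padding_idx rows are still stored). It assumes `PositionalEncoding` holds its table as a buffer rather than a parameter. Note that `print(model)` does not reveal whether the several `Embedding(2409, 128)` modules share one weight; `sum(p.numel() for p in model.parameters())` counts a shared parameter only once.

```python
# Parameter-count formulas for the module types printed above.
def linear(in_f, out_f, bias=True):
    return in_f * out_f + (out_f if bias else 0)


def layer_norm(d):
    return 2 * d  # weight + bias when elementwise_affine=True


def embedding(num, dim):
    return num * dim  # padding_idx rows are still part of the weight matrix


def multi_headed_attention(d):
    # linear_keys, linear_values, linear_query, final_linear: four d x d Linear layers
    return 4 * linear(d, d)


def feed_forward(d, d_ff):
    # w_1, w_2 and the PositionwiseFeedForward's own LayerNorm
    return linear(d, d_ff) + linear(d_ff, d) + layer_norm(d)


def encoder_layer(d=768, d_ff=2048):
    # self_attn + feed_forward + layer_norm (Dropout/Softmax have no parameters)
    return multi_headed_attention(d) + feed_forward(d, d_ff) + layer_norm(d)


def decoder_layer(d=768, d_ff=2048):
    # self_attn + context_attn + feed_forward + layer_norm_1 + layer_norm_2
    return 2 * multi_headed_attention(d) + feed_forward(d, d_ff) + 2 * layer_norm(d)


# Token encoder: word embeddings + 6 encoder layers + final LayerNorm
# (PositionalEncoding assumed to contribute no parameters).
token_encoder = embedding(30522, 768) + 6 * encoder_layer() + layer_norm(768)

print("encoder layer:", encoder_layer())
print("decoder layer:", decoder_layer())
print("token encoder:", token_encoder)
```

The same helpers extend to `hier_encoder` (3 encoder layers plus a LayerNorm, no embeddings) and the 6-layer `InsertionTransformerDecoder`.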
RowitZou commented 2 years ago

Hi there. The standard evaluation procedure is to first validate the model on the validation set and then evaluate the best checkpoint on the test set. The checkpoint from step 100000 may be overfitting.

RowitZou commented 2 years ago

For other ways to improve performance, see #4.

unknowed-ER commented 2 years ago

@RowitZou Thanks for your reply and for the excellent work. I validated the model with the following command:

python ./src/main.py -mode validate -data_path torch_data/persona/persona -log_file logs/persona.val.log -test_all -alpha 0.95 -model_path models/persona -result_path results/persona/persona -test_start_from 10000 -visible_gpus 0 -test_batch_ex_size 50 -graph_emb_path graph_data/persona/graph_embedding.npy -tokenizer bert-base-uncased
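With `-test_all` and `-test_start_from 10000`, validate mode appears to evaluate every saved checkpoint from step 10000 onward. This is inferred from the flag names and from the `Step 49: processing models/persona/model_step_100000.pt` line in the log, not from the code. A sketch for listing which checkpoints such a sweep would cover, assuming the `model_step_<N>.pt` naming seen in this thread; both helper names are hypothetical.

```python
import re
from pathlib import Path

CKPT_RE = re.compile(r"^model_step_(\d+)\.pt$")


def select_checkpoints(names, start_step=10000):
    """Sorted steps of model_step_<N>.pt file names with N >= start_step."""
    steps = []
    for name in names:
        m = CKPT_RE.match(name)
        if m and int(m.group(1)) >= start_step:
            steps.append(int(m.group(1)))
    return sorted(steps)


def checkpoints_from(model_dir, start_step=10000):
    """(step, path) pairs for the checkpoints in model_dir that a sweep from start_step would cover."""
    d = Path(model_dir)
    names = [p.name for p in d.iterdir() if p.is_file()]
    return [(s, d / f"model_step_{s}.pt") for s in select_checkpoints(names, start_step)]
```

For example, `checkpoints_from("models/persona")` lists the files the command above should have visited, which helps confirm the sweep covered the whole run.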

However, it seems that the checkpoint from step 100000 gives the best result on the validation set. Here is the log:

[2022-06-23 16:38:22,930 INFO] Step 49: processing models/persona/model_step_100000.pt
[2022-06-23 16:38:22,930 INFO] Loading checkpoint from models/persona/model_step_100000.pt
[2022-06-23 16:38:27,881 INFO] Loading dev dataset from torch_data/persona/persona.dev.0.pt, number of examples: 2000
[2022-06-23 16:39:02,552 INFO] Loading dev dataset from torch_data/persona/persona.dev.1.pt, number of examples: 2000
[2022-06-23 16:39:37,581 INFO] Loading dev dataset from torch_data/persona/persona.dev.2.pt, number of examples: 1333
[2022-06-23 16:40:01,599 INFO] Concept Predict Score at step 100000: 
>> P/R/F1: 52.85/45.48/48.89
[2022-06-23 16:40:01,600 INFO] Gold Length at step 100000: 12.58
[2022-06-23 16:40:01,603 INFO] Prediction Length ratio at step 100000: 0.51
[2022-06-23 16:40:01,604 INFO] Prediction Bleu at step 100000: 3.00
[2022-06-23 16:40:01,604 INFO] Prediction Rouges at step 100000: 
>> ROUGE-F(1/2/l): 23.01/5.14/21.95
ROUGE-R(1/2/l): 17.36/3.88/16.66
ROUGE-P(1/2/l): 38.40/8.86/35.78
[2022-06-23 16:40:01,604 INFO] Prediction Dist-1 at step 100000: 6.77
[2022-06-23 16:40:01,604 INFO] Prediction Dist-2 at step 100000: 31.94

[2022-06-23 16:40:01,606 INFO] Current step: 100000
[2022-06-23 16:40:01,606 INFO] Dev results: bleu-dist1-dist2: 3.0045, 6.7692, 31.9422
[2022-06-23 16:40:01,606 INFO] Best step: 100000
[2022-06-23 16:40:01,606 INFO] Best dev results: bleu-dist1-dist2: 3.0045, 6.7692, 31.9422
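The log above only shows the summary for the last checkpoint. To check whether step 100000 is genuinely the best or simply the last one evaluated, the per-checkpoint dev scores can be extracted from the whole validation log. A minimal sketch that relies only on the `Current step:` and `Dev results: bleu-dist1-dist2:` line formats shown above; ranking by BLEU alone is an assumption, since the criterion the repo uses to choose `Best step` is not stated in this thread.

```python
import re

STEP_RE = re.compile(r"Current step: (\d+)")
# Case-sensitive "Dev", so the "Best dev results" summary line is not matched.
DEV_RE = re.compile(r"Dev results: bleu-dist1-dist2: ([\d.]+), ([\d.]+), ([\d.]+)")


def parse_dev_results(lines):
    """Map checkpoint step -> (bleu, dist1, dist2) from a validation log."""
    results, step = {}, None
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            step = int(m.group(1))
            continue
        m = DEV_RE.search(line)
        if m and step is not None:
            results[step] = tuple(float(x) for x in m.groups())
            step = None  # each "Current step" pairs with one "Dev results" line
    return results


def rank_by_bleu(results):
    """Steps sorted by dev BLEU, best first (BLEU-only ranking is an assumption)."""
    return sorted(results, key=lambda s: results[s][0], reverse=True)
```

Usage: `with open("logs/persona.val.log") as f: res = parse_dev_results(f)`, then print `rank_by_bleu(res)[:5]` alongside their Dist-1/Dist-2 scores to see how close the runner-up checkpoints are.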

I would like to know whether the results in the paper were obtained with the commands in README.md. I have attached the processed dataset file.

unknowed-ER commented 2 years ago

Processed Persona dataset: https://pan.baidu.com/s/1xNhUIGEF6HUJPXJ8PfgUvg?pwd=3xy1