PaddlePaddle / Knover

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle
Apache License 2.0

Plato model inference error!!! The same config works for training, but fails for inference. #45

Open nanzhao opened 3 years ago

nanzhao commented 3 years ago
aistudio@jupyter-208728-1765888:~/Knover$ git branch -av
  develop                      dcf05a0 Support PaddlePaddle 2.0.
* master                       4bad22c Fix checkpoints and add document for continuous training (#31)
  remotes/origin/HEAD          -> origin/develop
  remotes/origin/develop       dcf05a0 Support PaddlePaddle 2.0.
  remotes/origin/dygraph       5a2fbec Support dygraph in PaddlePaddle 2.0 and add lic2021 baseline
  remotes/origin/luge-dialogue 1b03ac1 update score
  remotes/origin/master        4bad22c Fix checkpoints and add document for continuous training (#31)
  remotes/origin/plato-2       4bad22c Fix checkpoints and add document for continuous training (#31)
aistudio@jupyter-208728-1765888:~/Knover$ python infer.py --model Plato --task DialogGeneration --vocab_path ./projects/lic2021/conf/vocab.txt --spm_model_file ./projects/lic2021/conf/spm.model --infer_file ./data/lic2021/test.txt --data_format numerical --file_format file --config_path ./projects/lic2021/conf/12L_P.json --init_pretraining_params Plato --batch_size 2 --max_src_len 384 --max_tgt_len 128 --max_seq_len 512 --output_name response --decoding_strategy topk_sampling --do_generation True --num_samples 4 --topk 5 --is_cn True --do_generation true --save_path ./projects/lic2021/infer/output --log_step 10 
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
{
  "is_distributed": false,
  "save_path": "./projects/lic2021/infer/output",
  "infer_file": "./data/lic2021/test.txt",
  "output_name": "response",
  "log_steps": 10,
  "Model": {
    "model": "Plato",
    "config_path": "./projects/lic2021/conf/12L_P.json",
    "init_checkpoint": "",
    "init_pretraining_params": "Plato",
    "learning_rate": 1e-05,
    "warmup_steps": 0,
    "weight_decay": 0.0,
    "max_grad_norm": 0.1,
    "use_recompute": false,
    "use_amp": false,
    "amp_loss_scaling": 12800,
    "max_seq_len": 512,
    "weight_sharing": true,
    "mem_efficient": false,
    "use_bow": true,
    "use_entropy": false,
    "pre_encoder_cmd": "d",
    "preprocess_cmd": "n",
    "postprocess_cmd": "da",
    "post_cls_cmd": "n",
    "cls_bias": true,
    "attention_probs_dropout_prob": 0.1,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "max_position_embeddings": 512,
    "latent_type_size": 20,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "role_type_size": 32,
    "vocab_size": 30004
  },
  "Generator": {
    "min_dec_len": 1,
    "max_dec_len": 64,
    "decoding_strategy": "topk_sampling",
    "temperature": 1.0,
    "ignore_unk": true,
    "num_samples": 4,
    "topk": 5,
    "topp": 0.9,
    "beam_size": 10,
    "length_average": true,
    "length_penalty": 0.0
  },
  "Task": {
    "task": "DialogGeneration",
    "do_generation": true,
    "is_cn": true,
    "nsp_inference_model_path": null,
    "nsp_attention_style": "bidirectional",
    "ranking_score": "decode_score"
  },
  "Reader": {
    "max_src_len": 384,
    "max_tgt_len": 128,
    "truncate_first_turn": false,
    "file_format": "file",
    "data_format": "numerical",
    "in_tokens": false,
    "batch_size": 2,
    "continuous_position": true,
    "random_seed": 11,
    "sort_pool_size": 65536
  },
  "Tokenizer": {
    "tokenizer": "SentencePieceTokenizer",
    "vocab_path": "./projects/lic2021/conf/vocab.txt",
    "do_lower_case": false,
    "spm_model_file": "./projects/lic2021/conf/spm.model"
  },
  "run_infer": true
}
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/unified_transformer.py:119
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/transformer_block.py:116
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/transformer_block.py:217
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:161
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:209
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:209
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:239
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:298: UserWarning: /home/aistudio/Knover/models/generator.py:239
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
W0412 19:20:59.318835  4704 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0412 19:20:59.322726  4704 device_context.cc:372] device: 0, cuDNN Version: 7.6.
Load pretraining parameters from Plato.
Traceback (most recent call last):
  File "infer.py", line 139, in <module>
    infer(args)
  File "infer.py", line 86, in infer
    predictions = task.infer_step(model, data)
  File "/home/aistudio/Knover/tasks/task_base.py", line 43, in infer_step
    predictions = model.infer_step(inputs)
  File "/home/aistudio/Knover/models/plato.py", line 280, in infer_step
    return super(Plato, self).infer_step(inputs)
  File "/home/aistudio/Knover/models/unified_transformer.py", line 439, in infer_step
    predictions = self._run_generation(inputs)
  File "/home/aistudio/Knover/models/unified_transformer.py", line 394, in _run_generation
    return_numpy=False)
  File "/home/aistudio/Knover/models/model_base.py", line 266, in _execute
    fetch_vars = self.exe.run(program, feed, fetch_list, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1110, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1108, in run
    return_merged=return_merged)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1238, in _run_impl
    use_program_cache=use_program_cache)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1328, in _run_program
    [fetch_var_name])
ValueError: In user code:

    File "infer.py", line 139, in <module>
      infer(args)
    File "infer.py", line 72, in infer
      model = models.create_model(args, place)
    File "/home/aistudio/Knover/models/__init__.py", line 49, in create_model
      return MODEL_REGISTRY[args.model](args, place)
    File "/home/aistudio/Knover/models/plato.py", line 49, in __init__
      super(Plato, self).__init__(args, place)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 93, in __init__
      super(UnifiedTransformer, self).__init__(args, place)
    File "/home/aistudio/Knover/models/model_base.py", line 74, in __init__
      self._build_programs()
    File "/home/aistudio/Knover/models/model_base.py", line 91, in _build_programs
      predictions = self.infer(inputs, outputs)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 380, in infer
      return self.generator.inference(self, inputs, outputs)
    File "/home/aistudio/Knover/models/generator.py", line 175, in inference
      gather_idx=parent_idx)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 178, in _generation_network
      gather_idx=gather_idx)
    File "/home/aistudio/Knover/models/unified_transformer.py", line 202, in _encode
      store=caches is not None
    File "/home/aistudio/Knover/models/transformer_block.py", line 376, in encoder
      store=store)
    File "/home/aistudio/Knover/models/transformer_block.py", line 288, in encoder_layer
      store=store)
    File "/home/aistudio/Knover/models/transformer_block.py", line 158, in multi_head_attention
      dropout_rate)
    File "/home/aistudio/Knover/models/transformer_block.py", line 116, in scaled_dot_product_attention
      product += attn_bias
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 304, in __impl__
      attrs={'axis': axis})
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3023, in append_op
      attrs=kwargs.get("attrs", None))
    File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2107, in __init__
      for frame in traceback.extract_stack():

    InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [160, 12, 160, 427] and the shape of Y = [160, 12, 1, 268]. Received [427] in X is not equal to [268] in Y at i:3.
      [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:160)
      [operator < elementwise_add > error]
aistudio@jupyter-208728-1765888:~/Knover$ 
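As a side note, the broadcast rule quoted in the error hint can be reproduced with a small stand-alone checker (plain Python, no Paddle; the shapes are copied from the error message above):

```python
def can_broadcast(x_shape, y_shape):
    """NumPy/Paddle-style broadcast compatibility: align shapes from the
    right; each dimension pair must be equal, or one of them must be 1."""
    for x, y in zip(reversed(x_shape), reversed(y_shape)):
        if x != y and x != 1 and y != 1:
            return False
    return True

# Shapes from the InvalidArgumentError above: the last dims 427 vs 268
# are unequal and neither is 1, so elementwise_add fails.
print(can_broadcast((160, 12, 160, 427), (160, 12, 1, 268)))  # False
print(can_broadcast((160, 12, 160, 427), (160, 12, 1, 427)))  # True
```

So the mismatch is between the attention product's key length (427) and the attention bias's key length (268), which suggests the two were built from inputs padded to different lengths.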
nanzhao commented 3 years ago

It seems to be caused by pos_ids, which contains two parts of information. Both parts are fed into the transformer self-attention, so could pos_ids be causing this issue?

Variable: pos_ids

sserdoubleh commented 3 years ago

Have you changed the code?

nanzhao commented 3 years ago

Have you changed the code?

Except for "paddle.enable_static()", I didn't make any other code changes.

nanzhao commented 3 years ago

[screenshot: dump of the input data]

This is the data that I feed into the static graph, and everything seems OK: 160 samples, max sample length 26, sample ids 0 and 1.

But I don't understand the code in "_get_feed_dict" in unified_transformer.py:

    feed_dict["token_ids"] = layers.data(name="token_ids", shape=[-1, self.max_seq_len, 1], dtype="int64")
    feed_dict["type_ids"] = layers.data(name="type_ids", shape=[-1, self.max_seq_len, 1], dtype="int64")
    feed_dict["pos_ids"] = layers.data(name="pos_ids", shape=[-1, self.max_seq_len, 1], dtype="int64")
    ... ...

My inference data format is not the same as what the code above expects. Obviously my data isn't padded to max_seq_len, so I am not sure whether my inference data format is correct for this model.

I also checked the example data in the ./data directory, but I didn't find a numerical-format inference example file.

So could you give some explanation of the inference data format?

now my data looks like this: 1 30002 1120 12664 22829 1231 26041 0 2175 27606 5325 26846 26294 26041 0 135 27871 26674 5325 26846 26294 26041 0 135 27871 26674 2286 1231 26041 0 2175 27606 1231 26041 0 2175 27606 1683 1906 4 928 1231 26041 0 2175 27606 5305 8116 26577 119 4 478 26143 17916 1231 26041 0 2175 27606 2907 4484 11421 26166 588 192 588 554 1231 26041 0 2175 27606 2845 112 1231 26041 0 2175 27606 1833 5353 1231 26041 0 2175 27606 10198 2349 5325 26846 26294 26041 0 135 27871 26674 1683 313 1275 58 5958 65 1011 177 3 175 4 170 26 6735 26041 0 182 33 361 507 7097 5325 26846 26294 26041 0 135 27871 26674 14875 6546 17185 8135 17185 26041 0 23778 26041 0 115 10550 27074 26041 0 17185 671 26386 27064 17185 671 26386 27064 588 4245 90 8872 5325 26846 26294 26041 0 135 27871 26674 2286 1231 26041 0 2175 27606 5325 26846 26294 26041 0 135 27871 26674 2845 90 5325 26846 26294 26041 0 135 27871 26674 1833 2734 5325 26846 26294 26041 0 135 27871 26674 10198 2349 1231 26041 0 2175 27606 1683 300 3403 11942 1621 3 1621 340 17 2518 26041 0 1231 26041 0 2175 27606 5646 204 26339 26144 1231 26041 0 2175 27606 3259 3626 26910 1231 26041 0 2175 27606 1833 4504 2 10 59 2349 11 26057 26049 46 11942 4 3420 77 12 2 519 31 3 563 2654 6 40 7 3 340 10587 856 26041 0 2;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 1 30002 1120 12664 22829 2376 26834 1253 26043 27176 26680 3801 26753 27889 27949 2376 26834 1253 26043 27176 26680 4504 450 26041 0 1154 26649 3801 26753 27889 27949 4504 450 26041 0 1154 26649 2376 26834 1253 26043 27176 26680 2679 1845 2376 26834 1253 26043 27176 26680 2912 26290 699 26739 364 6875 21 162 361 4 3610 2376 26834 1253 26043 27176 26680 2912 26290 5646 231 26339 26123 2376 26834 1253 26043 27176 26680 10198 789 2376 26834 1253 26043 27176 26680 4504 450 26041 0 1154 26649 2376 26834 1253 26043 27176 26680 3241 4484 1731 319 118 26156 5040 3801 26753 27889 27949 2679 1845 3801 26753 27889 27949 3241 4484 1731 319 118 26156 5040 3801 26753 27889 27949 6655 6655 519 3801 26753 27889 27949 10198 789 3801 26753 27889 27949 4504 450 26041 0 1154 26649 2376 26834 1253 26043 27176 26680 2912 26290 699 26739 1218 27176 4 9714 3 2896 4 67 27256 2376 26834 1253 26043 27176 26680 6655 6655 6 26238 2 10 913 4504 450 26041 0 1154 26649 4 2710 22 12 2 6 913 26041 0 2 28 10 21 50 26041 0 2376 26834 1253 26043 27176 26680 26041 0 22 12 2 16 26060 26088 3 5 42 195 26041 0 2;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
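From the sample above, the numerical format appears to be three semicolon-separated fields (token ids, type ids, position ids), each a space-separated list of integers. A minimal parsing sketch under that assumption (my own helper, not part of Knover):

```python
def parse_numerical_line(line):
    """Split one numerical-format record into its three id lists.
    Assumes the field order token_ids;type_ids;pos_ids."""
    fields = line.strip().split(";")
    token_ids, type_ids, pos_ids = (
        [int(tok) for tok in field.split()] for field in fields
    )
    # All three fields should describe the same sequence, token by token.
    assert len(token_ids) == len(type_ids) == len(pos_ids), "fields must align"
    return token_ids, type_ids, pos_ids

# Tiny synthetic record in the same shape as the data above:
tokens, types, positions = parse_numerical_line("1 30002 2;0 0 0;0 1 2")
print(tokens, types, positions)  # [1, 30002, 2] [0, 0, 0] [0, 1, 2]
```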

nanzhao commented 3 years ago

I ran inference on the example data "./data/valid_filelist" and see a similar error.

(jddc_2020) ➜  Knover-master git:(master) ✗ python infer.py \
--model Plato --task DialogGeneration --vocab_path ./projects/lic2021/conf/vocab.txt --spm_model_file ./projects/lic2021/conf/spm.model \
--infer_file ./data/valid_filelist --data_format raw --file_format filelist --config_path ./projects/lic2021/conf/12L_P.json \
--batch_size 2 \
--max_src_len 384 --max_tgt_len 128 --max_seq_len 512 \
--output_name response \
--decoding_strategy topk_sampling \
--do_generation True --num_samples 4 --topk 5 --is_cn True \
--do_generation true --save_path ./projects/lic2021/infer/output --log_step 10
{
  "is_distributed": false,
  "save_path": "./projects/lic2021/infer/output",
  "infer_file": "./data/valid_filelist",
  "output_name": "response",
  "log_steps": 10,
  "Model": {
    "model": "Plato",
    "config_path": "./projects/lic2021/conf/12L_P.json",
    "init_checkpoint": "",
    "init_pretraining_params": "",
    "learning_rate": 1e-05,
    "warmup_steps": 0,
    "weight_decay": 0.0,
    "max_grad_norm": 0.1,
    "use_recompute": false,
    "use_amp": false,
    "amp_loss_scaling": 12800,
    "max_seq_len": 512,
    "weight_sharing": true,
    "mem_efficient": false,
    "use_bow": true,
    "use_entropy": false,
    "pre_encoder_cmd": "d",
    "preprocess_cmd": "n",
    "postprocess_cmd": "da",
    "post_cls_cmd": "n",
    "cls_bias": true,
    "attention_probs_dropout_prob": 0.1,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "max_position_embeddings": 512,
    "latent_type_size": 20,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "role_type_size": 32,
    "vocab_size": 30004
  },
  "Generator": {
    "min_dec_len": 1,
    "max_dec_len": 64,
    "decoding_strategy": "topk_sampling",
    "temperature": 1.0,
    "ignore_unk": true,
    "num_samples": 4,
    "topk": 5,
    "topp": 0.9,
    "beam_size": 10,
    "length_average": true,
    "length_penalty": 0.0
  },
  "Task": {
    "task": "DialogGeneration",
    "do_generation": true,
    "is_cn": true,
    "nsp_inference_model_path": null,
    "nsp_attention_style": "bidirectional",
    "ranking_score": "decode_score"
  },
  "Reader": {
    "max_src_len": 384,
    "max_tgt_len": 128,
    "truncate_first_turn": false,
    "file_format": "filelist",
    "data_format": "raw",
    "in_tokens": false,
    "batch_size": 2,
    "continuous_position": true,
    "random_seed": 11,
    "sort_pool_size": 65536
  },
  "Tokenizer": {
    "tokenizer": "SentencePieceTokenizer",
    "vocab_path": "./projects/lic2021/conf/vocab.txt",
    "do_lower_case": false,
    "spm_model_file": "./projects/lic2021/conf/spm.model"
  },
  "run_infer": true
}
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py:120
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/transformer_block.py:116
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/transformer_block.py:217
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/generator.py:156
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
  return (isinstance(seq, collections.Sequence) and
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/generator.py:209
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/generator.py:209
The behavior of expression A / B has been unified with elementwise_div(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_div(X, Y, axis=0) instead of A / B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/generator.py:239
The behavior of expression A * B has been unified with elementwise_mul(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_mul(X, Y, axis=0) instead of A * B. This transitional warning will be dropped in the future.
  warnings.warn(
/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py:293: UserWarning: /Users/zhaonan8/github_project/Knover-master/models/generator.py:239
The behavior of expression A - B has been unified with elementwise_sub(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_sub(X, Y, axis=0) instead of A - B. This transitional warning will be dropped in the future.
  warnings.warn(
Variable: pos_ids
  - lod: {}
  - place: CPUPlace
  - shape: [160, 272, 1]
  - layout: NCHW
  - dtype: long long
  - data: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

Variable: tmp_38
  - lod: {}
  - place: CPUPlace
  - shape: [160, 160, 1]
  - layout: NCHW
  - dtype: long long
  - data: [237 272 237 272 237 272 237 272 237 272 237 272 237 272 237 272 237 272 237 272]
Traceback (most recent call last):
  File "infer.py", line 142, in <module>
    infer(args)
  File "infer.py", line 89, in infer
    predictions = task.infer_step(model, data)
  File "/Users/zhaonan8/github_project/Knover-master/tasks/task_base.py", line 43, in infer_step
    predictions = model.infer_step(inputs)
  File "/Users/zhaonan8/github_project/Knover-master/models/plato.py", line 280, in infer_step
    return super(Plato, self).infer_step(inputs)
  File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 442, in infer_step
    predictions = self._run_generation(inputs)
  File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 395, in _run_generation
    outputs = self._execute(
  File "/Users/zhaonan8/github_project/Knover-master/models/model_base.py", line 266, in _execute
    fetch_vars = self.exe.run(program, feed, fetch_list, **kwargs)
  File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1110, in run
    six.reraise(*sys.exc_info())
  File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/six.py", line 703, in reraise
    raise value
  File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1098, in run
    return self._run_impl(
  File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1230, in _run_impl
    return self._run_program(
  File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1327, in _run_program
    self._default_executor.run(program.desc, scope, 0, True, True,
ValueError: In user code:

    File "infer.py", line 142, in <module>
      infer(args)
    File "infer.py", line 75, in infer
      model = models.create_model(args, place)
    File "/Users/zhaonan8/github_project/Knover-master/models/__init__.py", line 49, in create_model
      return MODEL_REGISTRY[args.model](args, place)
    File "/Users/zhaonan8/github_project/Knover-master/models/plato.py", line 49, in __init__
      super(Plato, self).__init__(args, place)
    File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 94, in __init__
      super(UnifiedTransformer, self).__init__(args, place)
    File "/Users/zhaonan8/github_project/Knover-master/models/model_base.py", line 74, in __init__
      self._build_programs()
    File "/Users/zhaonan8/github_project/Knover-master/models/model_base.py", line 91, in _build_programs
      predictions = self.infer(inputs, outputs)
    File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 385, in infer
      return self.generator.inference(self, inputs, outputs)
    File "/Users/zhaonan8/github_project/Knover-master/models/generator.py", line 170, in inference
      dec_out, _ = model._generation_network(
    File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 181, in _generation_network
      return self._encode(
    File "/Users/zhaonan8/github_project/Knover-master/models/unified_transformer.py", line 186, in _encode
      return encoder(
    File "/Users/zhaonan8/github_project/Knover-master/models/transformer_block.py", line 357, in encoder
      enc_output, cps = encoder_layer(
    File "/Users/zhaonan8/github_project/Knover-master/models/transformer_block.py", line 269, in encoder_layer
      attn_output = multi_head_attention(
    File "/Users/zhaonan8/github_project/Knover-master/models/transformer_block.py", line 157, in multi_head_attention
      ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
    File "/Users/zhaonan8/github_project/Knover-master/models/transformer_block.py", line 116, in scaled_dot_product_attention
      product += attn_bias
    File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/layers/math_op_patch.py", line 299, in __impl__
      current_block(self).append_op(
    File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3017, in append_op
      op = Operator(
    File "/Users/zhaonan8/.conda/envs/jddc_2020/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2107, in __init__
      for frame in traceback.extract_stack():

    InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [160, 12, 160, 433] and the shape of Y = [160, 12, 1, 274]. Received [433] in X is not equal to [274] in Y at i:3.
      [Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at /home/teamcity/work/ef54dc8a5b211854/paddle/fluid/operators/elementwise/elementwise_op_function.h:160)
      [operator < elementwise_add > error]
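For reference, the failure above is a plain broadcasting mismatch: corresponding dimensions must be equal or one of them must be 1, and here the trailing dims are 433 vs 274. A minimal NumPy sketch (not Knover code, smaller shapes with the same offending pattern) of the same class of error:

```python
import numpy as np

# Trailing dims (433 vs 274) differ and neither is 1, so NumPy
# (like Paddle's elementwise_add) cannot broadcast them together.
product = np.zeros((2, 3, 5, 433))
attn_bias = np.zeros((2, 3, 1, 274))

try:
    product + attn_bias
except ValueError as e:
    print("broadcast error:", e)

# A compatible bias (trailing dim matches, dim 2 is 1) broadcasts fine:
ok_bias = np.zeros((2, 3, 1, 433))
print((product + ok_bias).shape)  # (2, 3, 5, 433)
```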
nanzhao commented 3 years ago

I run my code on the master branch with PaddlePaddle 2.0.1.

sserdoubleh commented 3 years ago

If you use the master branch, you may need to use PaddlePaddle 1.8.x. If you want to use PaddlePaddle 2.0.x, you need to switch to the develop branch, because PaddlePaddle 2.0 changed the default behavior of elementwise_add.

sserdoubleh commented 3 years ago

The shape in _get_feed_dict can differ from the input tensor's shape: the max length of the input data can be smaller than max_seq_len.
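In other words, batches are padded to the longest sequence actually present in the batch, which can be shorter than max_seq_len. A hypothetical sketch of that padding (pad_batch is not a Knover function):

```python
def pad_batch(seqs, pad_id=0):
    """Pad each sequence to the batch's own max length, which can be
    shorter than the model's configured max_seq_len."""
    batch_max = max(len(s) for s in seqs)
    return [s + [pad_id] * (batch_max - len(s)) for s in seqs]

batch = [[5, 6, 7], [8, 9]]
print(pad_batch(batch))  # [[5, 6, 7], [8, 9, 0]] -- padded to 3, not to max_seq_len
```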

nanzhao commented 3 years ago

If you use the master branch, you may need to use PaddlePaddle 1.8.x. If you want to use PaddlePaddle 2.0.x, you need to switch to the develop branch, because PaddlePaddle 2.0 changed the default behavior of elementwise_add.

My issue should not be caused by the elementwise_xxx behavior. I guess the inference function in this repository hasn't been tested before. So how can I run a PLATO-2 inference example? My PLATO-2 training process is OK, but I can't run inference so far.

sserdoubleh commented 3 years ago

I tested the master branch with PaddlePaddle 1.8.x, please read README.md! If you want to use PaddlePaddle 2.0.x, please use the develop branch!

nanzhao commented 3 years ago

In the master branch, README.md doesn't cover inference; it only has "Basic usage: Training". So could you share how to run inference with the example data (./data/valid_filelist), even if it doesn't load any pre-trained parameters? Then I can test it in my environment.

sserdoubleh commented 3 years ago

Where did you learn about PLATO? Maybe https://github.com/PaddlePaddle/Knover/tree/master/plato-2

nanzhao commented 3 years ago

I referenced this doc: https://zhuanlan.zhihu.com/p/292013818

sserdoubleh commented 3 years ago

You can run it with PaddlePaddle 1.8.x.

sserdoubleh commented 3 years ago

I recommend you use develop branch instead.

nanzhao commented 3 years ago

I recommend you use develop branch instead.

I used the develop branch to train and run inference on the numerical-format file with PaddlePaddle 2.0.1, but it still had some problems when doing inference.

So I copied some code from the dygraph branch, as below:

diff --git a/knover/data/dialog_reader.py b/knover/data/dialog_reader.py
index cca0ffe..ea8d4d2 100644
--- a/knover/data/dialog_reader.py
+++ b/knover/data/dialog_reader.py
@@ -333,7 +333,11 @@ class DialogReader(object):
             cols = list(map(lambda x: list(map(int, x.split(" "))), cols))
             if len(cols) > self.num_numerical_fields:
                 cols = cols[:self.num_numerical_fields]
-            tgt_start_idx = cols[0].index(self.bos_id, 1)
+            #tgt_start_idx = cols[0].index(self.bos_id, 1)
+            try:
+                tgt_start_idx = cols[0].index(self.bos_id, 1)
+            except ValueError:
+                tgt_start_idx = len(cols[0])
             record = self.Record(*cols, tgt_start_idx=tgt_start_idx, data_id=i)
             yield record

diff --git a/knover/tasks/dialog_generation.py b/knover/tasks/dialog_generation.py
index 3d3c282..fcbdb22 100644
--- a/knover/tasks/dialog_generation.py
+++ b/knover/tasks/dialog_generation.py
@@ -96,11 +96,19 @@ class DialogGeneration(Task):

         predictions = []
         for data_id in group:
-            example = self.reader.features[data_id]
+
+            try:
+                example = self.reader.features[data_id]
+            except:
+                example = None
+            
             preds = group[data_id]
             for pred in preds:
                 # TODO: fix tokenized input

After these changes, inference seems to work well now. So does the develop branch also need these changes?

sserdoubleh commented 3 years ago

You also need to modify the following line for inference with the numerical data format: https://github.com/PaddlePaddle/Knover/blob/5a2fbec7eda7011d6aa6302851c4da37fa3d2fc4/knover/tasks/dialog_generation.py#L107

nanzhao commented 3 years ago

You also need to modify the following line for inference with the numerical data format: https://github.com/PaddlePaddle/Knover/blob/5a2fbec7eda7011d6aa6302851c4da37fa3d2fc4/knover/tasks/dialog_generation.py#L107

Yes, I have already changed that place but forgot to list it in my last comment.

sserdoubleh commented 3 years ago

More features are in development now and will be released later.