PaddlePaddle / Knover

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle
Apache License 2.0

ValueError: (InvalidArgument) for PLATO-XL's interact.sh #159

Status: Closed (ShangQingTu closed this issue 1 year ago)

ShangQingTu commented 1 year ago

Hi, thanks for your very nice work. I get an error when running interact.sh for PLATO-XL. I'm running the script on Ubuntu 18.04 with 3 RTX 3090 (24 GB) GPUs.

Environment:
- Python: 3.7.12
- knover: 0.0.6
- paddlepaddle-gpu: 2.3.2.post111
- CUDA: 11.1
- cuDNN: 8.0

Error information

(plato) tsq@Sakura32:/data/tsq/xiaomu/baselines/Knover/projects/PLATO-XL$ bash ./interact.sh
models/11B is exist.
+ [[ 1 != 1 ]]
+ job_conf=./projects/PLATO-XL/interact.conf
+ source ./projects/PLATO-XL/interact.conf
++ job_script=./scripts/distributed/interact.sh
++ model=UnifiedTransformer
++ task=DialogGeneration
++ vocab_path=./package/dialog_en/vocab.txt
++ spm_model_file=./package/dialog_en/spm.model
++ config_path=./projects/PLATO-XL/11B.json
++ init_params=./projects/PLATO-XL/models/11B
++ log_dir=./projects/PLATO-XL/log
++ export CUDA_VISIBLE_DEVICES=2,5,7
++ CUDA_VISIBLE_DEVICES=2,5,7
++ infer_args='
--use_role true
--position_style relative
--decoding_strategy topk_sampling
--topk 10
--num_samples 20
--use_sharding true
'
+ [[ ./projects/PLATO-XL/log != '' ]]
+ rm './projects/PLATO-XL/log/workerlog.*'
rm: cannot remove './projects/PLATO-XL/log/workerlog.*': No such file or directory
++ dirname ./scripts/local/job.sh
+ export PYTHONPATH=./scripts/local/../..:/data/tsq/xiaomu/baselines/Knover:
+ PYTHONPATH=./scripts/local/../..:/data/tsq/xiaomu/baselines/Knover:
+ ./scripts/distributed/interact.sh ./projects/PLATO-XL/interact.conf
+ [[ 1 == 1 ]]
+ job_conf=./projects/PLATO-XL/interact.conf
+ source ./projects/PLATO-XL/interact.conf
++ job_script=./scripts/distributed/interact.sh
++ model=UnifiedTransformer
++ task=DialogGeneration
++ vocab_path=./package/dialog_en/vocab.txt
++ spm_model_file=./package/dialog_en/spm.model
++ config_path=./projects/PLATO-XL/11B.json
++ init_params=./projects/PLATO-XL/models/11B
++ log_dir=./projects/PLATO-XL/log
++ export CUDA_VISIBLE_DEVICES=2,5,7
++ CUDA_VISIBLE_DEVICES=2,5,7
++ infer_args='
--use_role true
--position_style relative
--decoding_strategy topk_sampling
--topk 10
--num_samples 20
--use_sharding true
'
+ export FLAGS_sync_nccl_allreduce=1
+ FLAGS_sync_nccl_allreduce=1
+ export FLAGS_fuse_parameter_memory_size=64
+ FLAGS_fuse_parameter_memory_size=64
+ [[ ./package/dialog_en/spm.model != '' ]]
+ save_args='--spm_model_file ./package/dialog_en/spm.model '
+ infer_args='--spm_model_file ./package/dialog_en/spm.model 
--use_role true
--position_style relative
--decoding_strategy topk_sampling
--topk 10
--num_samples 20
--use_sharding true
'
+ [[ '' != '' ]]
+ [[ --spm_model_file ./package/dialog_en/spm.model 
--use_role true
--position_style relative
--decoding_strategy topk_sampling
--topk 10
--num_samples 20
--use_sharding true
 =~ --use_sharding true ]]
+ [[ 2,5,7 != '' ]]
+ CUDA_VISIBLE_DEVICE_ARRAY=(${CUDA_VISIBLE_DEVICES//,/ })
+ MP_DEGREE=3
+ infer_args='--spm_model_file ./package/dialog_en/spm.model 
--use_role true
--position_style relative
--decoding_strategy topk_sampling
--topk 10
--num_samples 20
--use_sharding true
 --mp_degree 3'
+ [[ ! -d ./projects/PLATO-XL/models/11B-mp3 ]]
+ init_params=./projects/PLATO-XL/models/11B-mp3
+ fleetrun ./knover/scripts/interact.py --is_distributed true --model UnifiedTransformer --vocab_path ./package/dialog_en/vocab.txt --config_path ./projects/PLATO-XL/11B.json --init_pretraining_params ./projects/PLATO-XL/models/11B-mp3 --spm_model_file ./package/dialog_en/spm.model --use_role true --position_style relative --decoding_strategy topk_sampling --topk 10 --num_samples 20 --use_sharding true --mp_degree 3
LAUNCH INFO 2022-10-02 16:05:44,163 -----------  Configuration  ----------------------
LAUNCH INFO 2022-10-02 16:05:44,163 devices: None
LAUNCH INFO 2022-10-02 16:05:44,164 elastic_level: -1
LAUNCH INFO 2022-10-02 16:05:44,164 elastic_timeout: 30
LAUNCH INFO 2022-10-02 16:05:44,164 gloo_port: 6767
LAUNCH INFO 2022-10-02 16:05:44,164 host: None
LAUNCH INFO 2022-10-02 16:05:44,164 job_id: default
LAUNCH INFO 2022-10-02 16:05:44,164 legacy: False
LAUNCH INFO 2022-10-02 16:05:44,164 log_dir: log
LAUNCH INFO 2022-10-02 16:05:44,164 log_level: INFO
LAUNCH INFO 2022-10-02 16:05:44,164 master: None
LAUNCH INFO 2022-10-02 16:05:44,164 max_restart: 3
LAUNCH INFO 2022-10-02 16:05:44,164 nnodes: 1
LAUNCH INFO 2022-10-02 16:05:44,164 nproc_per_node: None
LAUNCH INFO 2022-10-02 16:05:44,164 rank: -1
LAUNCH INFO 2022-10-02 16:05:44,164 run_mode: collective
LAUNCH INFO 2022-10-02 16:05:44,164 server_num: None
LAUNCH INFO 2022-10-02 16:05:44,164 servers: 
LAUNCH INFO 2022-10-02 16:05:44,164 trainer_num: None
LAUNCH INFO 2022-10-02 16:05:44,164 trainers: 
LAUNCH INFO 2022-10-02 16:05:44,164 training_script: ./knover/scripts/interact.py
LAUNCH INFO 2022-10-02 16:05:44,164 training_script_args: ['--is_distributed', 'true', '--model', 'UnifiedTransformer', '--vocab_path', './package/dialog_en/vocab.txt', '--config_path', './projects/PLATO-XL/11B.json', '--init_pretraining_params', './projects/PLATO-XL/models/11B-mp3', '--spm_model_file', './package/dialog_en/spm.model', '--use_role', 'true', '--position_style', 'relative', '--decoding_strategy', 'topk_sampling', '--topk', '10', '--num_samples', '20', '--use_sharding', 'true', '--mp_degree', '3']
LAUNCH INFO 2022-10-02 16:05:44,164 with_gloo: 0
LAUNCH INFO 2022-10-02 16:05:44,164 --------------------------------------------------
LAUNCH INFO 2022-10-02 16:05:44,172 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2022-10-02 16:05:44,173 Run Pod: izcvrd, replicas 3, status ready
LAUNCH INFO 2022-10-02 16:05:44,198 Watching Pod: izcvrd, replicas 3, status running
{
  "is_distributed": true,
  "port": 18123,
  "Model": {
    "model": "UnifiedTransformer",
    "config_path": "./projects/PLATO-XL/11B.json",
    "init_checkpoint": "",
    "init_pretraining_params": "./projects/PLATO-XL/models/11B-mp3",
    "optimizer": "AdamW",
    "learning_rate": 1e-05,
    "beta1": 0.9,
    "beta2": 0.999,
    "warmup_steps": 0,
    "lr_scheduler": "noam",
    "max_training_steps": 2000,
    "min_learning_rate": 0,
    "weight_decay": 0.0,
    "max_grad_norm": 0.1,
    "use_recompute": false,
    "checkpointing_every_n_layers": 1,
    "use_amp": false,
    "amp_loss_scaling": 32768.0,
    "use_sharding": true,
    "dp_degree": 1,
    "sharding_degree": 1,
    "mp_degree": 3,
    "pp_degree": 1,
    "weight_sharing": true,
    "mem_efficient": false,
    "use_role": true,
    "pre_encoder_cmd": "d",
    "preprocess_cmd": "n",
    "postprocess_cmd": "da",
    "post_cls_cmd": "n",
    "cls_bias": true,
    "attention_probs_dropout_prob": 0.1,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 3072,
    "inner_hidden_size": 18432,
    "initializer_range": 0.01,
    "max_position_embeddings": 1024,
    "num_attention_heads": 32,
    "num_hidden_layers": 72,
    "type_vocab_size": 3,
    "role_type_size": 128,
    "vocab_size": 8001
  },
  "Generator": {
    "min_dec_len": 1,
    "max_dec_len": 64,
    "decoding_strategy": "topk_sampling",
    "temperature": 1.0,
    "ignore_unk": true,
    "num_samples": 20,
    "topk": 10,
    "topp": 0.9,
    "beam_size": 10,
    "length_average": true,
    "length_penalty": 0.0
  },
  "Task": {
    "do_generation": true,
    "is_cn": false,
    "filter_cross_repetition": true,
    "nsp_inference_model_path": null,
    "ranking_score": "decode_score",
    "generate_seed": 11
  },
  "Reader": {
    "max_src_len": 128,
    "max_tgt_len": 128,
    "max_seq_len": 256,
    "max_knowledge_len": 0,
    "knowledge_position": "post_src",
    "knowledge_style": "original",
    "truncate_first_turn": false,
    "file_format": "file",
    "data_format": "raw",
    "in_tokens": false,
    "batch_size": 16,
    "position_style": "relative",
    "random_seed": 11,
    "shuffle_pool_size": 0,
    "sort_pool_size": 65536
  },
  "Tokenizer": {
    "tokenizer": "SentencePieceTokenizer",
    "vocab_path": "./package/dialog_en/vocab.txt",
    "specials_path": "",
    "do_lower_case": false,
    "spm_model_file": "./package/dialog_en/spm.model"
  },
  "run_infer": true
}
/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/executor.py:400: UserWarning: do not use standalone executor in fleet by default
  warnings.warn("do not use standalone executor in fleet by default")
I1002 16:05:45.244711 32810 nccl_context.cc:83] init nccl context nranks: 3 local rank: 0 gpu id: 0 ring id: 0
W1002 16:05:45.914779 32810 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.1, Runtime API Version: 11.1
W1002 16:05:45.919350 32810 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:341: UserWarning: /data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py:146
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py:341: UserWarning: /data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py:155
The behavior of expression A + B has been unified with elementwise_add(X, Y, axis=-1) from Paddle 2.0. If your code works well in the older versions but crashes in this version, try to use elementwise_add(X, Y, axis=0) instead of A + B. This transitional warning will be dropped in the future.
  op_type, op_type, EXPRESSION_MAP[method_name]))
Traceback (most recent call last):
  File "./knover/scripts/interact.py", line 164, in <module>
    interact(args)
  File "./knover/scripts/interact.py", line 61, in interact
    model = models.create_model(args, place)
  File "/data/tsq/xiaomu/baselines/Knover/knover/models/__init__.py", line 46, in create_model
    return MODEL_REGISTRY[args.model](args, place)
  File "/data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py", line 101, in __init__
    super(UnifiedTransformer, self).__init__(args, place)
  File "/data/tsq/xiaomu/baselines/Knover/knover/core/model.py", line 146, in __init__
    self._build_programs()
  File "/data/tsq/xiaomu/baselines/Knover/knover/core/model.py", line 212, in _build_programs
    outputs = self.forward(inputs, is_infer=True)
  File "/data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py", line 435, in forward
    gather_idx=inputs.get("parent_idx", None)
  File "/data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py", line 268, in _generation_network
    name="encoder" if name == "" else name)
  File "/data/tsq/xiaomu/baselines/Knover/knover/models/unified_transformer.py", line 310, in _encode
    topo=self.topo
  File "/data/tsq/xiaomu/baselines/Knover/knover/modules/transformer_block.py", line 478, in encoder
    topo=topo)
  File "/data/tsq/xiaomu/baselines/Knover/knover/modules/transformer_block.py", line 379, in encoder_layer
    topo=topo)
  File "/data/tsq/xiaomu/baselines/Knover/knover/modules/transformer_block.py", line 190, in multi_head_attention
    k = layers.concat([select_k, k], axis=1) 
  File "/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/layers/tensor.py", line 393, in concat
    type='concat', inputs=inputs, outputs={'Out': [out]}, attrs=attrs)
  File "/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 44, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3621, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/tsq/miniconda3/envs/plato/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2764, in __init__
    self.desc.infer_shape(self.block.desc)
ValueError: (InvalidArgument) The 2-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [-1, 0, 960], input[1]'s shape = [-1, 256, 1024].
  [Hint: Expected inputs_dims[0][j] == inputs_dims[i][j], but received inputs_dims[0][j]:960 != inputs_dims[i][j]:1024.] (at /paddle/paddle/phi/kernels/funcs/concat_funcs.h:83)
  [operator < concat > error]
LAUNCH INFO 2022-10-02 16:05:47,203 Pod failed
LAUNCH ERROR 2022-10-02 16:05:47,204 Container failed !!!
Container rank 0 status failed cmd ['/home/tsq/miniconda3/envs/plato/bin/python', '-u', './knover/scripts/interact.py', '--is_distributed', 'true', '--model', 'UnifiedTransformer', '--vocab_path', './package/dialog_en/vocab.txt', '--config_path', './projects/PLATO-XL/11B.json', '--init_pretraining_params', './projects/PLATO-XL/models/11B-mp3', '--spm_model_file', './package/dialog_en/spm.model', '--use_role', 'true', '--position_style', 'relative', '--decoding_strategy', 'topk_sampling', '--topk', '10', '--num_samples', '20', '--use_sharding', 'true', '--mp_degree', '3'] code 1 log log/default.izcvrd.0.log 
env {'CONDA_SHLVL': '2', 'LD_LIBRARY_PATH': '/home/user/cuda/lib64/:/usr/local/cuda/lib64', 'LS_COLORS': 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:', 'CONDA_EXE': '/home/tsq/miniconda3/bin/conda', 'SSH_CONNECTION': '166.111.68.66 2395 103.238.162.32 22', 'LESSCLOSE': '/usr/bin/lesspipe %s %s', 'LANG': 'en_US.UTF-8', 'OLDPWD': '/data/tsq/xiaomu/baselines/Knover/projects/PLATO-XL', 'CONDA_PREFIX': '/home/tsq/miniconda3/envs/plato', 'FLAGS_sync_nccl_allreduce': '1', 'S_COLORS': 'auto', '_CE_M': '', 'XDG_SESSION_ID': '63791', 'USER': 
'tsq', 'CONDA_PREFIX_1': '/home/tsq/miniconda3', 'CORENLP_HOME': '/data/tsq/corenlp/stanford-corenlp-4.4.0', 'PWD': '/data/tsq/xiaomu/baselines/Knover', 'HOME': '/home/tsq', 'CONDA_PYTHON_EXE': '/home/tsq/miniconda3/bin/python', 'SSH_CLIENT': '166.111.68.66 2395 22', 'CUDA_HOME': '/usr/local/cuda', 'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop', '_CE_CONDA': '', 'FLAGS_fuse_parameter_memory_size': '64', 'CONDA_PROMPT_MODIFIER': '(plato) ', 'SSH_TTY': '/dev/pts/133', 'MAIL': '/var/mail/tsq', 'SHELL': '/bin/bash', 'TERM': 'xterm-256color', 'CUDA_VISIBLE_DEVICES': '2,5,7', 'SHLVL': '4', 'LANGUAGE': 'en_HK:en', 'PYTHONPATH': './scripts/local/../..:/data/tsq/xiaomu/baselines/Knover:', 'data_dir': '/data/tsq/coref', 'LOGNAME': 'tsq', 'XDG_RUNTIME_DIR': '/run/user/1016', 'PATH': '/home/tsq/mongodb/bin:/home/tsq/miniconda3/envs/plato/bin:/home/tsq/miniconda3/condabin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin', 'CONDA_DEFAULT_ENV': 'plato', 'LESSOPEN': '| /usr/bin/lesspipe %s', '_': '/home/tsq/miniconda3/envs/plato/bin/fleetrun', 'CUSTOM_DEVICE_ROOT': '', 'OMP_NUM_THREADS': '1', 'PADDLE_MASTER': '103.238.162.32:44617', 'PADDLE_GLOBAL_SIZE': '3', 'PADDLE_LOCAL_SIZE': '3', 'PADDLE_GLOBAL_RANK': '0', 'PADDLE_LOCAL_RANK': '0', 'PADDLE_TRAINER_ENDPOINTS': '103.238.162.32:54031,103.238.162.32:48471,103.238.162.32:53953', 'PADDLE_CURRENT_ENDPOINT': '103.238.162.32:54031', 'PADDLE_TRAINER_ID': '0', 'PADDLE_TRAINERS_NUM': '3', 'PADDLE_RANK_IN_NODE': '0', 'FLAGS_selected_gpus': '0'}
LAUNCH INFO 2022-10-02 16:05:47,204 Exit code 1
+ exit_code=1
+ exit 1

sserdoubleh commented 1 year ago

This happens because you are running the interactive script on 3 GPUs, and the model cannot be split evenly three ways. Try using 2 or 4 GPUs instead.
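
One reading of the shapes in the error message that is consistent with the logged config: with `hidden_size` 3072 and `num_attention_heads` 32, each head is 96 wide. Across `mp_degree` 3, an even split of the hidden dimension gives 3072 / 3 = 1024 per rank, but only 32 // 3 = 10 whole heads fit, i.e. 10 x 96 = 960. That matches the 960 vs 1024 mismatch reported by `concat`. The sketch below only reproduces this arithmetic. It is not Knover's code: `per_rank_widths` is a hypothetical name, and the assumption that the head count is floor-divided is inferred from the numbers, not read from the source.

```python
def per_rank_widths(hidden_size, num_heads, mp_degree):
    """Return (head_based_width, even_split_width) for one model-parallel rank.

    Assumption (inferred from the error message, not from Knover's source):
    one tensor is sized from a floor-divided head count, the other from an
    even split of the hidden dimension.
    """
    head_dim = hidden_size // num_heads          # 3072 // 32 = 96
    heads_per_rank = num_heads // mp_degree      # floor division drops the remainder
    head_based_width = heads_per_rank * head_dim
    even_split_width = hidden_size // mp_degree
    return head_based_width, even_split_width


# Values taken from the config dump in the log: hidden_size=3072, num_attention_heads=32.
for degree in (2, 3, 4):
    head_based, even_split = per_rank_widths(3072, 32, degree)
    status = "ok" if head_based == even_split else "mismatch"
    print(f"mp_degree={degree}: {head_based} vs {even_split} -> {status}")
```

Under these assumptions the two widths agree for 2 and 4 GPUs and differ only for 3, which is in line with the suggestion above.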

ShangQingTu commented 1 year ago

Thanks for your reply. I tried 4 GPUs and it works.
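
The trace in the original report shows the launch script setting `--mp_degree` to the number of entries in `CUDA_VISIBLE_DEVICES`. A pre-flight check along the following lines could flag an incompatible GPU count before `fleetrun` starts. This is a minimal sketch and not part of Knover: `check_mp_degree` is a hypothetical helper, the device list `"0,1,2,3"` is illustrative, and the only requirement tested (that `num_attention_heads` and `hidden_size` are divisible by the GPU count) is an assumption drawn from this thread.

```python
def check_mp_degree(visible_devices, num_attention_heads, hidden_size):
    """Return the GPU count if it evenly divides both model dimensions.

    visible_devices is a comma-separated string such as the value of
    CUDA_VISIBLE_DEVICES. Raises ValueError otherwise.
    """
    devices = [d for d in visible_devices.split(",") if d.strip()]
    mp_degree = len(devices)
    if mp_degree == 0:
        raise ValueError("no devices listed")
    checks = (
        ("num_attention_heads", num_attention_heads),
        ("hidden_size", hidden_size),
    )
    for name, value in checks:
        if value % mp_degree != 0:
            raise ValueError(
                f"{name}={value} is not divisible by {mp_degree} GPUs"
            )
    return mp_degree


# Model dimensions from the logged config; device ids are illustrative.
print(check_mp_degree("0,1,2,3", 32, 3072))
```

Calling the same helper with the original setting `"2,5,7"` raises a `ValueError` naming `num_attention_heads`, since 32 is not divisible by 3.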