ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Unable to use batching #180

Open rahulmate opened 11 months ago

rahulmate commented 11 months ago

When converting the model, I get the error below. Command used:

bash -c "cd /project && \
    convert_model -m \"cardiffnlp/twitter-roberta-base-sentiment\" \
    --backend tensorrt onnx \
    --batch-size 8 8 8 \
    --seq-len 4 512 512"

Error:

Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 574, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 448, in main
    [
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 449, in <listcomp>
    optimize_onnx(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/ort_utils.py", line 117, in optimize_onnx
    optimized_model: BertOnnxModel = optimizer.optimize_model(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/optimizer.py", line 253, in optimize_model
    optimizer = optimize_by_fusion(model, model_type, num_heads, hidden_size, optimization_options)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/optimizer.py", line 153, in optimize_by_fusion
    optimizer.optimize(optimization_options)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_bert.py", line 352, in optimize
    self.fuse_reshape()
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_bert.py", line 77, in fuse_reshape
    fusion.apply()
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_base.py", line 46, in apply
    self.fuse(node, input_name_to_nodes, output_name_to_node)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_reshape.py", line 171, in fuse
    self.replace_reshape_node(shape, reshape_node, concat_node)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_reshape.py", line 21, in replace_reshape_node
    shape_value = np.asarray(shape, dtype=np.int64)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
free(): invalid pointer
Aborted (core dumped)
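
For anyone trying to isolate this: the crash happens in ONNX Runtime's reshape fusion (fusion_reshape.py), not in the TensorRT step, so it should be reproducible without convert_model. Below is a minimal standalone sketch of that code path; the file names, head count, and export settings are assumptions for illustration, not taken from this report.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from onnxruntime.transformers.optimizer import optimize_model

model_id = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()
model.config.return_dict = False  # export plain tuples instead of a ModelOutput

# Export an un-optimized ONNX graph with dynamic batch and sequence axes.
dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)

# This mirrors the call made in transformer_deploy/backends/ort_utils.py:
# optimize_model(...) -> fuse_reshape() is where the ValueError is raised.
optimized = optimize_model("model.onnx", model_type="bert", num_heads=12, hidden_size=768)
optimized.save_model_to_file("model_optimized.onnx")

If this snippet fails the same way, the bug sits in the onnxruntime transformers optimizer rather than in transformer-deploy itself.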
zoltan-fedor commented 6 months ago

The same error occurs even when batching is not used:

sudo docker run -it --rm --gpus all \
  -v $PWD/models:/project ghcr.io/els-rd/transformer-deploy:0.6.0 \
  bash -c "pip3 install \".[GPU]\" && cd /project && \
    convert_model -m \"sentence-transformers/multi-qa-mpnet-base-dot-v1\" \
    --backend onnx \
    --task embedding \
    --seq-len 16 128 128"

Error:

Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 574, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 448, in main
    [
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 449, in <listcomp>
    optimize_onnx(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/ort_utils.py", line 117, in optimize_onnx
    optimized_model: BertOnnxModel = optimizer.optimize_model(
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/optimizer.py", line 253, in optimize_model
    optimizer = optimize_by_fusion(model, model_type, num_heads, hidden_size, optimization_options)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/optimizer.py", line 153, in optimize_by_fusion
    optimizer.optimize(optimization_options)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_bert.py", line 352, in optimize
    self.fuse_reshape()
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../onnx_model_bert.py", line 77, in fuse_reshape
    fusion.apply()
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_base.py", line 46, in apply
    self.fuse(node, input_name_to_nodes, output_name_to_node)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_reshape.py", line 171, in fuse
    self.replace_reshape_node(shape, reshape_node, concat_node)
  File "/usr/local/lib/python3.8/dist-packages/onnxruntime/transformers/models/gpt2/../../fusion_reshape.py", line 21, in replace_reshape_node
    shape_value = np.asarray(shape, dtype=np.int64)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
free(): invalid pointer
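
Until the fusion issue is fixed upstream, one possible interim workaround (an assumption on my side, not something the project documents) is to skip the offline optimization pass and serve the plain ONNX export, letting ONNX Runtime apply its own graph optimizations at session-creation time:

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# "model.onnx" stands for the un-optimized export produced before the failing
# optimize step; the path and the input names are assumptions.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
encoded = tokenizer(["what is onnx runtime?"], return_tensors="np")
outputs = session.run(
    None,
    {
        "input_ids": encoded["input_ids"].astype(np.int64),
        "attention_mask": encoded["attention_mask"].astype(np.int64),
    },
)
print(outputs[0].shape)  # token embeddings before pooling

This loses the fused-kernel speedup from optimize_onnx but at least produces a servable model while the reshape fusion bug is open.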