ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Encounter Error: ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB #183


illumination-k commented 6 months ago

Hi, thanks for your great projects!

I tried to build a TensorRT engine for an embedding model (multilingual-e5-large) with the following command: convert_model -m intfloat/multilingual-e5-large --backend tensorrt --task embedding --seq-len 16 512 512 --name intfloat-multilingual-e5-large --device cuda --load-external-data --verbose, but I encountered the following error.

Traceback (most recent call last):
  File "/usr/local/bin/convert_model", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 357, in entrypoint
    main(commands=args)
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/convert.py", line 179, in main
    convert_to_onnx(
  File "/usr/local/lib/python3.8/dist-packages/transformer_deploy/backends/pytorch_utils.py", line 158, in convert_to_onnx
    onnx.save(onnx_model, output_path)
  File "/usr/local/lib/python3.8/dist-packages/onnx/__init__.py", line 203, in save_model
    s = _serialize(proto)
  File "/usr/local/lib/python3.8/dist-packages/onnx/__init__.py", line 71, in _serialize
    result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 2235540927

In transformer-deploy, if the proto size exceeds 2GB, save_as_external_data should be set to true.

https://github.com/ELS-RD/transformer-deploy/blob/6b88e24ade6ce199e825adc0477b28a07f51f17d/src/transformer_deploy/backends/onnx_utils.py#L40-L48

According to the onnx API docs, we should call onnx.checker.check_model with a file path for large models.

import onnx

onnx.checker.check_model("path/to/the/model.onnx")
# onnx.checker.check_model(loaded_onnx_model) will fail if given >2GB model

The other idea is that if load_external_data is true, save_as_external_data should also be true.

In the onnx code, they set MAXIMUM_PROTOBUF = 2000000000. I do not understand why this error occurred.

https://github.com/onnx/onnx/blob/238f2b9a41b28e6db0086c8a1be655d517c94dd1/onnx/checker.py#L45-L47
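For reference, the size reported in the traceback does exceed that constant:

```python
MAXIMUM_PROTOBUF = 2_000_000_000  # limit from onnx/checker.py
reported = 2_235_540_927          # size from the ValueError above

# How far over the 2GB protobuf limit the serialized model is.
over = reported - MAXIMUM_PROTOBUF
print(over)  # 235540927, i.e. roughly 235 MB over the limit
```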

In onnx, they use sys.getsizeof instead of ByteSize(). This is a difference between transformer-deploy and onnx.

https://github.com/onnx/onnx/blob/238f2b9a41b28e6db0086c8a1be655d517c94dd1/onnx/checker.py#L175-L178
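The two measures should nearly agree for a serialized message: sys.getsizeof applied to the bytes returned by SerializeToString is the payload length plus a small fixed bytes-object header, while ByteSize() is the payload length itself. The stub below (a hypothetical stand-in for onnx.ModelProto, for illustration only) makes this concrete without building a real multi-gigabyte model:

```python
import sys


class StubProto:
    """Hypothetical stand-in for onnx.ModelProto (illustration only)."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def SerializeToString(self) -> bytes:
        return self._payload

    def ByteSize(self) -> int:
        return len(self._payload)


model = StubProto(b"\x00" * (3 * 10**6))  # ~3 MB of fake tensor data

# transformer-deploy's approach: size of the message, no serialization needed
by_bytesize = model.ByteSize()

# onnx.checker's approach: serialize first, then measure the bytes object
by_getsizeof = sys.getsizeof(model.SerializeToString())

# The two agree up to the small fixed overhead of a Python bytes object.
print(by_bytesize, by_getsizeof)
```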