PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
12.07k stars 2.93k forks

ERNIE 3.0 quantization fails with error: Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type() #2987

Closed Fmaj7 closed 1 year ago

Fmaj7 commented 2 years ago

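Welcome, and thank you for reporting a PaddleNLP usage issue and for contributing to PaddleNLP! When filing your issue, please also provide the following information: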

LiuChiachi commented 2 years ago

Judging from the error message, these are the relevant lines. It looks like a dtype mismatch in the data:

ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)

First check whether the dtype that the model's inputs require matches the dtype of the data coming out of the dataset/data_loader. A mix-up between int32 and int64 is a common cause.
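The check suggested above can be sketched as a small helper that compares the dtypes a model expects with the dtypes in one batch. This is only an illustration: `FakeTensor`, `dtype_name`, `find_dtype_mismatches`, and the example values are hypothetical names, and `FakeTensor` stands in for a real Paddle tensor or NumPy array so the snippet runs without Paddle installed.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a paddle.Tensor or numpy array; only .dtype is used here."""
    dtype: str


def dtype_name(value) -> str:
    """Return a bare dtype name such as 'int64' for anything exposing .dtype."""
    # Paddle dtypes usually print as 'paddle.int64' and NumPy dtypes as 'int64',
    # so keeping the last dot-separated component normalises both spellings.
    return str(value.dtype).split(".")[-1]


def find_dtype_mismatches(expected: dict, batch: dict) -> dict:
    """Map each mismatching input name to (expected dtype, actual dtype)."""
    mismatches = {}
    for name, want in expected.items():
        got = dtype_name(batch[name])
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Hypothetical values: the model was exported expecting int32 inputs,
# while the data loader yields an int64 tensor for input_ids.
expected = {"input_ids": "int32", "token_type_ids": "int32"}
batch = {
    "input_ids": FakeTensor("paddle.int64"),
    "token_type_ids": FakeTensor("paddle.int32"),
}
print(find_dtype_mismatches(expected, batch))
```

With a real data loader, the same function could be given the first batch directly, since it only reads the `.dtype` attribute of each entry.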

Fmaj7 commented 2 years ago

Thank you so much, switching to int32 fixed it!

Fmaj7 commented 2 years ago

1. dtype of the data coming out of the data_loader: {'input_ids': Tensor(shape=[32, 127], dtype=int64, place=Place(gpu:0), stop_gradient=True

2. For pruning, the dtype is configured as int32:

    elif quantization:
        input_dir = compress_config.quantization_config.input_dir
        if input_dir is None:
            compress_config.quantization_config.input_filename_prefix = "model"
            input_spec = [
                paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
                paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
            ]

3. Quantization succeeds.

4. After turning on the set_dynamic_shape switch to configure the dynamic shape automatically, a new problem appears, and it still looks like the same int64 issue:

    python infer_gpu.py --task_name token_cls --model_path ./msra_ner_quant_infer_model/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape

The error is:

    Traceback (most recent call last):
      File "./deploy/python/infer_gpu.py", line 94, in <module>
        main()
      File "./deploy/python/infer_gpu.py", line 82, in main
        predictor = ErniePredictor(args)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 296, in __init__
        self.set_dynamic_shape(args.max_seq_length, args.batch_size)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 405, in set_dynamic_shape
        self.inference_backend.infer(batch)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 203, in infer
        self.predictor.run()
    RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}.
      [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712)
      [operator < quantize_linear > error]

Setting the dtype on my side to int32 or int64 does not work either:

`def token_cls_preprocess(self, data: list):
    # tokenizer + pad
    is_split_into_words = False
    if isinstance(data[0], list):
        is_split_into_words = True
    data = self.tokenizer(data,
                          max_length=self.max_seq_length,
                          padding=True,
                          truncation=True,
                          is_split_into_words=is_split_into_words)

    input_ids = data["input_ids"]
    token_type_ids = data["token_type_ids"]
    return {
        "input_ids": np.array(input_ids, dtype="int64"),
        "token_type_ids": np.array(token_type_ids, dtype="int64")
    }`
LiuChiachi commented 2 years ago

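Hello, the set_dynamic_shape function feeds data that it constructs itself as int64: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L379 You can open the link for details. This is not your own input data, so you may need to make the dtype consistent in your network definition.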
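To make the point above concrete, here is a minimal sketch of how probe batches for dynamic-shape collection could be built with a single configurable dtype. It assumes the method feeds zero-filled minimum, maximum, and typical batches. The function name `make_shape_probe_batches` and the concrete shape values are hypothetical and not copied from ernie_predictor.py.

```python
import numpy as np


def make_shape_probe_batches(batch_size: int, max_seq_length: int, dtype: str = "int64"):
    """Build zero-filled min/max/opt batches that all share one input dtype."""
    # (batch, seq_len) for the smallest, the largest and a typical input.
    # The numbers are illustrative assumptions, not the library's values.
    shapes = [
        (1, 2),
        (batch_size, max_seq_length),
        (batch_size, min(32, max_seq_length)),
    ]
    return [
        {
            "input_ids": np.zeros(shape, dtype=dtype),
            "token_type_ids": np.zeros(shape, dtype=dtype),
        }
        for shape in shapes
    ]


# Choose the dtype in one place so it matches the exported model's inputs.
batches = make_shape_probe_batches(batch_size=32, max_seq_length=128, dtype="int32")
for batch in batches:
    print({name: (arr.shape, str(arr.dtype)) for name, arr in batch.items()})
```

Keeping the dtype as a single parameter avoids the situation where the preprocessing code and the shape-probing code silently disagree about int32 versus int64.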

Fmaj7 commented 2 years ago

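Hello, I'm a bit confused. When you say it is not my input data, where does that input come from? Does making the dtype consistent in the network mean doing so inside the set_dynamic_shape function? I previously tried changing the dtype inside set_dynamic_shape, but got the same error.

One more note: the following warnings appear during quantization, and I'm not sure whether they matter:

Wed Aug 10 16:04:28-INFO: Collect quantized variable names ...
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: scale is not supported for quantization.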

LiuChiachi commented 2 years ago
Fmaj7 commented 2 years ago

Model training (run_msra_ner.py):
python run_token_cls.py --task_name msra_ner --model_name_or_path ernie-3.0-medium-zh --do_train

Pruning, using two files: 1. compress_msra_ner.py 2. compress_trainer.py
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./

Quantization: in file 1 from the pruning step, set compress to pruning=False, quantization=True. In file 2, change the dtype to int32 (setting dtype to int64 fails):
input_spec = [
    paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
    paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
]
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./

Deployment (ernie_predictor.py):
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape

Deployment fails with:
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]

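There is also a GPU memory problem. When I run the deployment script directly on the pruned model, GPU memory is released once the run finishes:
python infer_gpu.py --task_name token_cls --model_path ./msra_ner_pruned_infer_model/float32
However, if I start a background service (an HTTP service) whose endpoint imports and calls infer_gpu.main, GPU memory is not released after the call returns, and usage grows with every call (for example 1 GB, then 2 GB, and so on) until memory runs out.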
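One common way to avoid rebuilding a predictor on every request is to create it once and reuse it. The sketch below shows that pattern with the standard library only. `DummyPredictor`, `get_predictor`, and `handle_request` are hypothetical names, and `DummyPredictor` merely stands in for the real predictor class. Whether this removes the memory growth described above is not confirmed anywhere in this thread.

```python
from functools import lru_cache


class DummyPredictor:
    """Hypothetical stand-in for the real predictor so the sketch runs without Paddle."""

    instances = 0  # counts how many predictors have been constructed

    def __init__(self, model_path: str):
        DummyPredictor.instances += 1
        self.model_path = model_path

    def predict(self, texts):
        # Placeholder result; a real predictor would run the model here.
        return [len(text) for text in texts]


@lru_cache(maxsize=None)
def get_predictor(model_path: str) -> DummyPredictor:
    """Build the predictor on first use, then return the same object afterwards."""
    return DummyPredictor(model_path)


def handle_request(texts):
    """What an HTTP handler would call for every incoming request."""
    predictor = get_predictor("./msra_ner_pruned_infer_model/float32")
    return predictor.predict(texts)


for _ in range(3):
    handle_request(["some text"])
print(DummyPredictor.instances)
```

The design choice here is that model loading happens at most once per model path, so repeated requests share one set of GPU allocations instead of adding a new set each time.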

Does ERNIE 3.0 deployment currently support only the seq and token tasks?

LiuChiachi commented 2 years ago

Hello, sorry for the slow reply. Please try setting the onnx_format parameter in compress_trainer.py to False. The True case may not be supported yet, and we are investigating it.

Fmaj7 commented 2 years ago

It still fails with onnx_format set to False.

yghstill commented 2 years ago

@Fmaj7 Set onnx_format to False, then re-export the quantized model. Could you post the error message you get at prediction time?

Fmaj7 commented 2 years ago

With onnx_format=False, the quantization step itself fails, as shown below: [image]

yghstill commented 2 years ago

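Judging from the error message, these are the relevant lines. It looks like a dtype mismatch in the data:

ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType<T>::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType<T>::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)

First check whether the dtype that the model's inputs require matches the dtype of the data coming out of the dataset/data_loader. A mix-up between int32 and int64 is a common cause.

@Fmaj7 The error looks the same as this one. Could you try the change described here?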

Fmaj7 commented 2 years ago

I have already tried that. With dtype set to int32, quantization passes. With int64, it raises the error above. However, after quantization completes with int32, running

python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape

produces the following error:

RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]

yghstill commented 2 years ago

The quantize_linear operator appears when onnx_format=True. You need to set dtype to int32 and also set onnx_format=False.

Fmaj7 commented 2 years ago

Yes, with dtype=int32 and onnx_format=False quantization passes (in fact, in my tests setting only dtype=int32 was enough). But running with --set_dynamic_shape as above fails again, as follows:

RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]

LiuChiachi commented 2 years ago

For this problem, you can first change every int64 in the set_dynamic_shape method of your ernie_predictor.py to int32 as well. That should work around it.

Fmaj7 commented 2 years ago

That doesn't work. I tried it a few days ago and again just now. This problem is really puzzling: is it a platform incompatibility or something else? [image]

Fmaj7 commented 2 years ago

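I switched to WSL for testing. With the quantization parameters dtype=int64 and onnx_format=False, quantization passes, but running with --set_dynamic_shape still fails. Whether set_dynamic_shape in ernie_predictor.py uses int64 or int32 throughout, it fails with the same error as above.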

LiuChiachi commented 2 years ago

Could you also provide the .pdmodel file? In an ERNIE model, the input to the fake_quantize_dequantize_moving_average_abs_max operator really should not be int32.

Fmaj7 commented 2 years ago

int8.zip This is the file produced by quantization. The quantization parameters in compress_trainer.py were dtype=int64, onnx_format=False.

LiuChiachi commented 2 years ago

Please make sure that onnx_format=False is set. It should be in the initialization of PostTrainingQuantization in compress_trainer.py.

Renxs177 commented 2 years ago

Hello, has this problem been resolved? I've run into the same issue.

LiuChiachi commented 2 years ago

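Hello, has this problem been resolved? I've run into the same issue.

Hello, please post a screenshot of the error so we can look at it together.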

Renxs177 commented 2 years ago

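I'm using PaddleSlim's automatic compression, and the compression strategy it runs is post-training (offline) quantization. I get a similar error. [image]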

Fmaj7 commented 2 years ago

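It doesn't seem to have been resolved. I later switched to in-batch-negative and deployed with Paddle Serving, without any compression. Retrieval is still quite fast: the model was trained on GPU, and retrieval on CPU takes about 0.3 s.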

tianjiahao commented 1 year ago

Has this problem been resolved? I've hit a similar one. I quantized a Paddle-trained model directly (onnx_format=True, dtype=int64). Inference with the resulting quantized model fails with:

RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]

After changing to (onnx_format=False, dtype=int64), inference with the resulting quantized model fails with:

RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]
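When comparing runs like the two above, it can help to pull the operator name, data type, and place out of the "does not have kernel" message automatically. The following is a small sketch using only the standard library. The helper name `parse_missing_kernel` and the regular expression are written for the message format quoted in this thread and are not part of Paddle.

```python
import re

# Matches the "(NotFound) Operator (...) does not have kernel for {...}" text
# as it is quoted in this thread; other Paddle versions may word it differently.
KERNEL_ERROR = re.compile(
    r"Operator \((?P<op>\w+)\) does not have kernel for "
    r"\{data_type\[(?P<dtype>\w+)\];.*?place\[(?P<place>.+?)\];"
)


def parse_missing_kernel(message: str):
    """Return {'op', 'dtype', 'place'} from a missing-kernel error, or None."""
    match = KERNEL_ERROR.search(message)
    return match.groupdict() if match else None


# The two messages reported in this thread, shortened to the relevant part.
messages = [
    "RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for "
    "{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; "
    "library_type[PLAIN]}.",
    "RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) "
    "does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)];"
    "place[Place(gpu:0)]; library_type[PLAIN]}.",
]
for message in messages:
    print(parse_missing_kernel(message))
```

In both cases the extracted data type is int64_t, which makes it easy to see that only the operator name changes between the onnx_format=True and onnx_format=False runs.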

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.