Closed Fmaj7 closed 1 year ago
Looking at the error message, these should be the relevant lines. It looks like a dtype mismatch in the data:
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container.
[Hint: Expected dtype() == paddle::experimental::CppTypeToDataType::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
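First check whether the dtype the model expects for its inputs matches the dtype of the data coming out of the dataset/data_loader. The usual suspects are int32 versus int64.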
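The check suggested above can be sketched as a small helper that compares the dtypes the exported model expects with the dtypes of one batch. This is only an illustration: `dtype_name` and `find_dtype_mismatches` are hypothetical names, not part of PaddleNLP, and the batch below uses stand-in objects rather than real tensors.

```python
from types import SimpleNamespace


def dtype_name(value):
    """Return a short dtype name such as 'int64' for any object with a .dtype."""
    # NumPy dtypes print as 'int64'; Paddle dtypes typically print as
    # 'paddle.int64', so keep only the part after the last dot.
    return str(value.dtype).split(".")[-1]


def find_dtype_mismatches(expected, batch):
    """Map each input name to (expected, actual) wherever the two differ."""
    mismatches = {}
    for name, want in expected.items():
        got = dtype_name(batch[name])
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Stand-ins for real tensors: anything exposing .dtype works here.
expected = {"input_ids": "int32", "token_type_ids": "int32"}
batch = {
    "input_ids": SimpleNamespace(dtype="int64"),
    "token_type_ids": SimpleNamespace(dtype="int32"),
}
print(find_dtype_mismatches(expected, batch))
```

In practice `batch` would be one item taken from the data_loader and `expected` would mirror the dtypes given in the `InputSpec` list used at export time.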
Many thanks, changing it to int32 worked!
1. Type coming out of the data_loader: {'input_ids': Tensor(shape=[32, 127], dtype=int64, place=Place(gpu:0), stop_gradient=True
2. During pruning, the dtype is configured as int32:

    elif quantization:
        input_dir = compress_config.quantization_config.input_dir
        if input_dir is None:
            compress_config.quantization_config.input_filename_prefix = "model"
            input_spec = [
                paddle.static.InputSpec(shape=[None, None], dtype="int32"),  # input_ids
                paddle.static.InputSpec(shape=[None, None], dtype="int32")  # segment_ids
            ]

3. Quantization succeeds.
4. Turning on the set_dynamic_shape switch to configure the dynamic shape automatically produces a new problem, which looks like the same int64 issue again:

    python infer_gpu.py --task_name token_cls --model_path ./msra_ner_quant_infer_model/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape

The error is as follows:

    Traceback (most recent call last):
      File "./deploy/python/infer_gpu.py", line 94, in <module>
        main()
      File "./deploy/python/infer_gpu.py", line 82, in main
        predictor = ErniePredictor(args)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 296, in __init__
        self.set_dynamic_shape(args.max_seq_length, args.batch_size)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 405, in set_dynamic_shape
        self.inference_backend.infer(batch)
      File "D:\AI\PaddleNLP-develop\model_zoo\ernie-3.0\deploy\python\ernie_predictor.py", line 203, in infer
        self.predictor.run()
    RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
Configuring my dtype as int32 or int64 does not work either: `def token_cls_preprocess(self, data: list):
    # Pre-tokenized input arrives as a list of token lists
    is_split_into_words = False
    if isinstance(data[0], list):
        is_split_into_words = True
    data = self.tokenizer(data, max_length=self.max_seq_length, padding=True, truncation=True, is_split_into_words=is_split_into_words)
    input_ids = data["input_ids"]
    token_type_ids = data["token_type_ids"]
    return {
        "input_ids": np.array(input_ids, dtype="int64"),
        "token_type_ids": np.array(token_type_ids, dtype="int64")
    }`
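Hello, the set_dynamic_shape function uses int64 data that it constructs itself: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L379
You can open the link to see the details.
This is not your input data. You may need to make the dtype consistent in your network definition.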
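Hello, I'm a bit confused. When you say it is not my input data, where is that data being fed in? And does making the dtype consistent in the network mean making it consistent inside the set_dynamic_shape function? I already tried changing the dtype inside set_dynamic_shape, but the same error came up.
One more thing: the following warnings appeared during quantization, and I'm not sure whether they matter:
Wed Aug 10 16:04:28-INFO: Collect quantized variable names ...
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: feed is not supported for quantization.
Wed Aug 10 16:04:28-WARNING: scale is not supported for quantization.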
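Q1: When you say it is not my input data, where is that data being fed in?
A1: set_dynamic_shape constructs its own data. The data used during set_dynamic_shape has nothing to do with your input data. From the code, it constructs int64 data: https://github.com/PaddlePaddle/PaddleNLP/blob/2a4a2fb69f577d9622bdb51ecb44b98a5b0145da/model_zoo/ernie-3.0/deploy/python/ernie_predictor.py#L384-L389
Q2: Making the dtype consistent in the network
A2: You still need to make sure that the input dtype the network expects matches the dtype of the data you actually feed it. If it still fails, send the code over and we can look at it together.
Q3: The following warnings appeared during quantization. Do they matter?
A3: The warnings should not have any effect.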
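To illustrate A1, here is a rough sketch of how warm-up batches for dynamic shape collection can be described independently of the user's input. It is not the code at the linked lines: the function name `warmup_batch_specs`, the min/max/opt layout, and the `min_seq_len` default are all assumptions, and the batches are plain shape-and-dtype descriptions rather than real arrays.

```python
def warmup_batch_specs(max_seq_length, batch_size, dtype="int64",
                       input_names=("input_ids", "token_type_ids"),
                       min_seq_len=1):
    """Describe min / max / opt warm-up batches as shape-and-dtype specs."""
    shapes = [
        (1, min_seq_len),              # smallest batch
        (batch_size, max_seq_length),  # largest batch
        (batch_size, max_seq_length),  # "optimal" batch, assumed equal to max
    ]
    # Every input in every warm-up batch shares the same dtype, so it can be
    # switched in one place instead of being hard-coded per array.
    return [
        {name: {"shape": shape, "dtype": dtype} for name in input_names}
        for shape in shapes
    ]


for spec in warmup_batch_specs(max_seq_length=128, batch_size=32, dtype="int32"):
    print(spec["input_ids"])
```

Each spec could then be turned into a real array, for example with `np.zeros(spec["shape"], dtype=spec["dtype"])`. The point is that this dtype is chosen inside the warm-up code, so changing the dtype in the preprocessing function alone does not affect it.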
Model training: run_msra_ner.py
python run_token_cls.py --task_name msra_ner --model_name_or_path ernie-3.0-medium-zh --do_train
Pruning: 1. compress_msra_ner.py 2. compress_trainer.py
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./
Quantization: in file 1 from the pruning step, set compress to pruning=False, quantization=True; in file 2, change the dtype to int32 (setting the dtype to int64 raises an error): input_spec = [ paddle.static.InputSpec(shape=[None, None], dtype="int32"), # input_ids paddle.static.InputSpec(shape=[None, None], dtype="int32") # segment_ids ]
python compress_msra_ner.py --dataset "msra_ner" --model_name_or_path best_msra_ner_model --output_dir ./
Deployment: ernie_predictor.py
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
Deployment fails with: RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
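There is also a GPU memory problem. Running the deployment script directly on the pruned model releases the GPU memory once the run finishes:
python infer_gpu.py --task_name token_cls --model_path ./msra_ner_pruned_infer_model/float32
However, if I start a background service (an HTTP service) and have an endpoint import and call infer_gpu.main, the GPU memory is not released after the call returns, and usage grows with every call, e.g. 1 GB -> 2 GB ..., until memory runs out.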
Does ernie3 deployment currently support only seq and token?
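The thread does not establish what causes the GPU memory growth described above. One common pattern worth ruling out is building a new predictor on every request, which would happen if each HTTP call runs a `main` function that constructs the predictor. A minimal sketch of building it once and reusing it, assuming that is the cause; `FakePredictor`, `get_predictor`, and `handle_request` are hypothetical stand-ins, not PaddleNLP code:

```python
import functools


class FakePredictor:
    """Stand-in for an expensive, GPU-backed predictor (hypothetical)."""
    instances = 0  # counts how many predictors were ever constructed

    def __init__(self, model_path):
        FakePredictor.instances += 1
        self.model_path = model_path

    def predict(self, texts):
        # Placeholder work: a real predictor would run inference here.
        return [len(text) for text in texts]


@functools.lru_cache(maxsize=None)
def get_predictor(model_path):
    """Build the predictor once per model path and reuse it afterwards."""
    return FakePredictor(model_path)


def handle_request(texts, model_path="./model"):
    # Reuse the cached predictor instead of constructing one per request.
    return get_predictor(model_path).predict(texts)


for _ in range(3):
    handle_request(["hello", "hi"])
print(FakePredictor.instances)
```

With this layout the service endpoint calls `handle_request` rather than a `main` that rebuilds the predictor, so repeated calls share one loaded model.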
Hello, sorry for the late reply. Try setting the onnx_format parameter in compress_trainer.py to False. The True case may not be supported yet and is being investigated.
It still fails with onnx_format set to False.
@Fmaj7 With onnx_format set to False, re-export the quantized model. Could you then post the error message you get at prediction time?
With onnx_format=False, running quantization fails as follows:
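Looking at the error message, these should be the relevant lines. It looks like a dtype mismatch in the data:
ValueError: (InvalidArgument) The type of data we are trying to retrieve does not match the type of data currently contained in the container. [Hint: Expected dtype() == paddle::experimental::CppTypeToDataType::Type(), but received dtype():5 != paddle::experimental::CppTypeToDataType::Type():7.] (at ..\paddle\phi\core\dense_tensor.cc:137)
First check whether the dtype the model expects for its inputs matches the dtype of the data coming out of the dataset/data_loader. The usual suspects are int32 versus int64.
@Fmaj7 The error looks the same as this one. Could you try the change described there?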
I already tried that. With dtype set to int32 quantization passes, and with int64 it raises the error above. But after quantization completes with int32, running:
python ./deploy/python/infer_gpu.py --task_name token_cls --model_path ./best_msra_ner_model/compress/hist16/int8 --shape_info_file dynamic_shape_info.txt --set_dynamic_shape
produces the following error:
RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
The quantize_linear operator appears when onnx_format=True. You need to set dtype to int32 and also set onnx_format=False.
Yes, with dtype=int32 and onnx_format=False quantization passes (in fact, in my test it passed with only dtype=int32 set), but the --set_dynamic_shape step above fails again, as follows:
RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]
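Since the two failures in this thread differ only in the operator name, it can help to pull the operator, data type, and place out of the message when comparing runs. A small sketch, assuming the message keeps the layout shown in the errors above; `parse_missing_kernel` is a hypothetical helper, not a Paddle API:

```python
import re

# Matches the "does not have kernel" message layout quoted in this thread.
_MISSING_KERNEL = re.compile(
    r"Operator \((?P<operator>\w+)\) does not have kernel for "
    r"\{data_type\[(?P<data_type>\w+)\];.*?place\[(?P<place>[^\]]+)\]",
    re.DOTALL,
)


def parse_missing_kernel(message):
    """Return operator, data_type and place as a dict, or None if no match."""
    match = _MISSING_KERNEL.search(message)
    return match.groupdict() if match else None


message = (
    "RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel "
    "for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; "
    "place[Place(gpu:0)]; library_type[PLAIN]}."
)
print(parse_missing_kernel(message))
```

Both errors here report `int64_t` for the failing operator, whichever dtype was used for quantization or in the preprocessing function.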
For this problem, first try changing every int64 in the set_dynamic_shape method of your ernie_predictor.py to int32 as well. That should work around it.
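That doesn't work. I tried it a few days ago and again just now. This problem is really puzzling. Is it a platform incompatibility or something else?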
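I switched to WSL for testing. With quantization parameters dtype=int64 and onnx_format=False, quantization passes, but running with --set_dynamic_shape still fails. Whether set_dynamic_shape in ernie_predictor.py uses int64 throughout or int32 throughout, it fails with the same error as above.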
Could you also provide the .pdmodel file? For an ERNIE model, the input to the fake_quantize_dequantize_moving_average_abs_max operator indeed should not be int32.
Please confirm that onnx_format=False is set. It should be in the initialization of PostTrainingQuantization in compress_trainer.py.
Hello, has this problem been solved? I've run into the same issue.
Hello, please post a screenshot of the error so we can look at it together.
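I'm using PaddleSlim's automatic compression, with post-training (offline) quantization as the compression strategy. It reports a similar error.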
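It doesn't seem to have been solved. I later switched to in-batch-negative and deployed with Paddle Serving, without compression. Retrieval is still quite fast: with the model trained on GPU, retrieval on CPU takes about 0.3 s.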
Has this problem been solved? I've run into a similar one. I quantized a Paddle-trained model directly (onnx_format=True, dtype=int64), and inference with the resulting quantized model fails with: RuntimeError: (NotFound) Operator (quantize_linear) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < quantize_linear > error]
After switching to (onnx_format=False, dtype=int64), inference with the resulting quantized model fails with: RuntimeError: (NotFound) Operator (fake_quantize_dequantize_moving_average_abs_max) does not have kernel for {data_type[int64_t]; data_layout[Undefined(AnyLayout)];place[Place(gpu:0)]; library_type[PLAIN]}. [Hint: Expected kernel_iter != kernels.end(), but received kernel_iter == kernels.end().] (at ..\paddle\fluid\framework\operator.cc:1712) [operator < fake_quantize_dequantize_moving_average_abs_max > error]
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.