Urgent: problem refactoring the merge method in distill.py under PaddleSlim 1.0.1 to support multi-block models

Heuchler7 commented 4 years ago

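As the title says: the stock PaddleSlim does not support distillation for multi-block models, so I decided to refactor merge to support them. Current progress: all vars in the teacher model have been given the new prefix and added to the corresponding blocks, but the final step of appending the teacher's ops to the student fails with the following error: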

Traceback (most recent call last):
  File "D:\Program Files\JetBrains\PyCharm Community Edition 2020.2.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1448, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\Program Files\JetBrains\PyCharm Community Edition 2020.2.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/SimpleInterface/JointTrainner.py", line 145, in <module>
    merge(teacher_program,student_train_program,data_name_map,place)
  File "D:/SimpleInterface/JointTrainner.py", line 94, in merge
    type=op.type, inputs=inputs, outputs=outputs, attrs=attrs)
  File "C:\Users\aaa\code\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "C:\Users\aaa\code\lib\site-packages\paddle\fluid\framework.py", line 1877, in __init__
    self.desc.check_attrs()
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

----------------------
Error Message Summary:
----------------------
Error: Cannot get attribute sub_block by type class paddle::framework::BlockDesc * __ptr64, its type is class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > at (D:\1.7.2\paddle\paddle/fluid/framework/attribute.h:42)

The rewritten code that appends the operators is as follows:

    for i in range(teacher_program.num_blocks):
        for op in teacher_program.block(i).ops:
            if op.type != 'feed' and op.type != 'fetch':
                inputs = {}
                outputs = {}
                attrs = {}
                for input_name in op.input_names:
                    inputs[input_name] = [
                        teacher_program.block(i).var(in_var_name)
                        for in_var_name in op.input(input_name)
                    ]

                for output_name in op.output_names:
                    outputs[output_name] = [
                        teacher_program.block(i).var(out_var_name)
                        for out_var_name in op.output(output_name)
                    ]
                for attr_name in op.attr_names:
                    attrs[attr_name] = op.attr(attr_name)
                student_program.block(i).append_op(
                    type=op.type, inputs=inputs, outputs=outputs, attrs=attrs)

baiyfbupt commented 4 years ago

student_program.block(i).append_op(

student_program does not necessarily have multiple blocks, so the indexing here may be the problem.

Heuchler7 commented 4 years ago

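I just took a closer look: my student is a simplified version of the teacher. Both have the same number of blocks, and before the merge even the ops inside each block are the same.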

Heuchler7 commented 4 years ago

It turned out that the input variables that ops inside a sub-block inherit from the global block had not been renamed. The fixed code is as follows:

    for i in range(teacher_program.num_blocks):
        for op in teacher_program.block(i).ops:
            if op.type != 'feed' and op.type != 'fetch':
                inputs = {}
                outputs = {}
                attrs = {}
                for input_name in op.input_names:
                    for in_var_name in op.input(input_name):
                        if (name_prefix not in in_var_name) and i > 0:
                            in_var_name = name_prefix + in_var_name
                        if in_var_name in teacher_program.block(i).vars.keys():
                            inputs[input_name] = teacher_program.block(i).var(in_var_name)
                        else:
                            inputs[input_name] = teacher_program.block(0).var(in_var_name)

                for output_name in op.output_names:
                    outputs[output_name] = [
                        teacher_program.block(i).var(out_var_name)
                        for out_var_name in op.output(output_name)
                    ]
                for attr_name in op.attr_names:
                    attrs[attr_name] = op.attr(attr_name)
                try:
                    student_program.block(i).append_op(
                        type=op.type, inputs=inputs, outputs=outputs, attrs=attrs)
                except Exception as e:
                    print(e)
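
For reference, the input-name resolution used in the fix (add the prefix inside sub-blocks, then fall back to the global block) can be sketched in isolation with plain Python containers. This is only a sketch under stated assumptions, not PaddleSlim code: `resolve_inputs`, `block_vars`, and the `teacher_` prefix are hypothetical names for illustration. It also differs from the posted loop in two ways that are my own choices: it uses `startswith` instead of a substring test, and it collects a list per input slot, whereas the posted loop reassigns `inputs[input_name]` on every iteration and so keeps only the last variable of a slot.

```python
def resolve_inputs(op_inputs, block_idx, block_vars, prefix):
    """Map each input slot to a list of (block index, variable name) pairs.

    op_inputs:  {slot name: [variable names]} as read from a teacher op.
    block_vars: list of sets; block_vars[i] holds the names defined in block i.
    All of these containers are hypothetical stand-ins for the real program.
    """
    resolved = {}
    for slot, names in op_inputs.items():
        entries = []
        for name in names:
            # Inside a sub-block, names inherited from the global block may
            # still lack the teacher prefix, so add it when it is missing.
            if block_idx > 0 and not name.startswith(prefix):
                name = prefix + name
            if name in block_vars[block_idx]:
                entries.append((block_idx, name))
            elif name in block_vars[0]:
                # Fall back to the global block (block 0).
                entries.append((0, name))
            else:
                raise KeyError("variable %r not found in block %d or block 0"
                               % (name, block_idx))
        resolved[slot] = entries
    return resolved


# Toy layout: block 0 is the global block, block 1 is a sub-block.
block_vars = [{"teacher_w", "teacher_x"}, {"teacher_h"}]
print(resolve_inputs({"X": ["x", "teacher_h"], "W": ["w"]}, 1, block_vars, "teacher_"))
```

Raising on an unresolved name, rather than silently passing it through, makes a missed rename show up at the op where it happens.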

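With this change a new problem appears: two convolutions and two RNN cells in the model's global block report the following errors, even though their input and output names and shapes are all correct: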

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Python Call Stacks (More useful to users):

  File "C:\Users\aaa\code\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "D:/SimpleInterface/JointTrainner.py", line 100, in merge
    type=op.type, inputs=inputs, outputs=outputs, attrs=attrs)
  File "D:/SimpleInterface/JointTrainner.py", line 152, in <module>
    merge(teacher_program,student_train_program,data_name_map,place)

Error Message Summary:

Error: Due to the settings of padding(0, 0), filter_size(128), dilation(1) and stride(1), the output size is less than 0, please check again. Input_size:64 [Hint: Expected output_size > 0, but received output_size:-63 <= 0:0.] at (D:\1.7.2\paddle\paddle/fluid/operators/conv_op.h:63) [operator < conv2d > error]

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Error Message Summary:

Error: Cannot get attribute sub_block by type class paddle::framework::BlockDesc * __ptr64, its type is class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > at (D:\1.7.2\paddle\paddle/fluid/framework/attribute.h:42)

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Error Message Summary:

Error: Cannot get attribute sub_block by type class paddle::framework::BlockDesc * __ptr64, its type is class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > at (D:\1.7.2\paddle\paddle/fluid/framework/attribute.h:42)

C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.

Python Call Stacks (More useful to users):

  File "C:\Users\aaa\code\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "D:/SimpleInterface/JointTrainner.py", line 100, in merge
    type=op.type, inputs=inputs, outputs=outputs, attrs=attrs)
  File "D:/SimpleInterface/JointTrainner.py", line 152, in <module>
    merge(teacher_program,student_train_program,data_name_map,place)

Error Message Summary:

Error: Due to the settings of padding(0, 0), filter_size(256), dilation(1) and stride(1), the output size is less than 0, please check again. Input_size:64 [Hint: Expected output_size > 0, but received output_size:-191 <= 0:0.] at (D:\1.7.2\paddle\paddle/fluid/operators/conv_op.h:63) [operator < conv2d > error]

I really cannot track down the problem myself, so I am sincerely asking the developers here for help.

baiyfbupt commented 4 years ago

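The output shape of a convolution layer is generally computed as ((input_size + 2 * padding - (dilation * (filter_size - 1) + 1)) / stride + 1). According to the error message, filter_size is 256, which is clearly wrong. You can check whether the op's attrs have been matched incorrectly.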
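
As a quick sanity check, the formula above can be evaluated with the values from the two conv2d error messages. This is a minimal sketch: `conv_output_size` is a hypothetical helper written for this thread, not a Paddle function, and it assumes symmetric padding and integer (floor) division.

```python
def conv_output_size(input_size, filter_size, padding=0, dilation=1, stride=1):
    """Output size along one spatial dimension of a convolution."""
    # A dilated filter covers dilation * (filter_size - 1) + 1 input positions.
    effective_filter = dilation * (filter_size - 1) + 1
    return (input_size + 2 * padding - effective_filter) // stride + 1


# Values taken from the two conv2d errors in this thread.
print(conv_output_size(64, 128))  # -63
print(conv_output_size(64, 256))  # -191
```

Both results match the output_size values in the logs. The values 128 and 256 are whatever the shape check read as filter_size, and each is larger than the 64-wide input.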

Heuchler7 commented 4 years ago

OK, I'll trace it back and take a look. Thank you.

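On November 16, 2020 at 10:51, Bai Yifan wrote:

((input_size + 2 * padding - (dilation * (filter_size - 1) + 1)) / stride + 1) This is how the shape of a convolution layer is generally computed. According to the error message, input_size is negative and filter_size is 256, which is clearly a problem. You can check whether the op's attrs have been matched incorrectly.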


PaddlePaddle / PaddleSlim