PaddlePaddle / PaddleSlim

PaddleSlim is an open-source library for deep model compression and architecture search.
https://paddleslim.readthedocs.io/zh_CN/latest/
Apache License 2.0

Auto-compression tool fails at startup: #1638

Closed. before31 closed this issue 8 months ago

before31 commented 1 year ago

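Environment

Steps to reproduce

  1. Train the pgnet algorithm provided by ppocr on my own annotated data, using the training script train.py.
  2. After training, export an inference model with the script export_model.py, which produces three files: inference.pdiparams, inference.pdiparams.info, and inference.pdmodel.
  3. Running ACT auto-compression then fails with: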
len(data) == len(names), but got len(data): 9 and len(names): 1
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/common/dataloader.py", line 45, in wrap_dataloader
    ), f"len(data) == len(names), but got len(data): {len(data)} and len(names): {len(names)}"
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/auto_compression/compressor.py", line 149, in __init__
    self.feed_vars)

Auto-compression code:

from paddleslim.auto_compression import AutoCompression

train_loader = xxxxx...
ac = AutoCompression(
    model_dir="/paddle/model/e2e_sm/inference",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="/paddle/model/e2e_sm/act",
    config={"QuantPost": {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
    ### config={"QuantAware": {}, "Distillation": {}}, ### On Windows, use this config line instead
    train_dataloader=train_loader)
ac.compress()

How should I troubleshoot this further?

zzjjay commented 1 year ago

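This error occurs because train_dataloader is not configured correctly. You can configure it by following the OCR model auto-compression example.

In particular, follow the reader_wrapper function in that example's run.py to re-wrap train_loader before passing it to auto-compression training.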
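The re-wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the example: it assumes the loader is callable, that the image is the first field of each batch, and that the exported model has a single input (the name "x" and the fake_train_loader stand-in are hypothetical). The point is that each batch is reduced from 9 fields to a dict with one entry per model input, which is what the `len(data) == len(names)` assertion expects.

```python
def reader_wrapper(reader, input_name):
    """Re-wrap a loader so each batch becomes {input_name: image}."""
    # The model's feed names may arrive as a one-element list.
    if isinstance(input_name, list) and len(input_name) == 1:
        input_name = input_name[0]

    def gen():
        for batch in reader():
            # Assumption: the image tensor is the first field of each batch.
            yield {input_name: batch[0]}

    return gen


def fake_train_loader():
    """Hypothetical stand-in for a training loader with 9 fields per batch."""
    for step in range(2):
        yield tuple(f"field{i}_step{step}" for i in range(9))


# "x" is a hypothetical input name; use the feed name of the exported model.
wrapped = reader_wrapper(fake_train_loader, ["x"])
batches = list(wrapped())
```

The wrapped generator, rather than the raw loader, would then be passed as train_dataloader.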

before31 commented 1 year ago

After modifying the code according to your example, it now runs, but an error still occurs during compression:

W0117 17:15:00.967579 13699 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0117 17:15:00.977397 13699 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2023-01-17 17:15:47,899-INFO: devices: gpu
2023-01-17 17:17:15,078-INFO: Selected strategies: ['ptq_hpo']
INFO:smac.utils.io.cmd_reader.CMDReader:Output to smac3-output_2023-01-17-09:20:06
INFO:smac.facade.smac_hpo_facade.SMAC4HPO:Optimizing a deterministic scenario for quality without a tuner timeout - will make SMAC deterministic and only evaluate one configuration per iteration!
INFO:smac.initial_design.sobol_design.SobolDesign:Running initial design for 1 configurations
INFO:smac.facade.smac_hpo_facade.SMAC4HPO:<class 'smac.facade.smac_hpo_facade.SMAC4HPO'>
Tue Jan 17 17:20:06-INFO: Load model and set data loader ...
Tue Jan 17 17:20:07-INFO: Collect quantized variable names ...
Sampling stage, Run batch:|██████████████████████████████████████████████| 10/10
Tue Jan 17 17:22:19-INFO: Update the program ...
Adding quant op with weight:|██████████████████████████████████████████| 320/320
Adding quant activation op:|                                             | 1/688
Tue Jan 17 17:22:39-INFO: The quantized model is saved in quant_model_tmp
ERROR:smac.tae.execute_func.ExecuteTAFuncDict:'NoneType' object is not callable
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 301, in quantize
    emd_loss = eval_quant_model()
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 226, in eval_quant_model
    out_float = convert_model_out_2_nparr(out_float)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 179, in convert_model_out_2_nparr
    out_nparr = np.concatenate(out_list)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4 and the array at index 1 has size 37

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/smac/tae/execute_func.py", line 217, in run
    rval = self._call_ta(self._ta, config, obj_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/smac/tae/execute_func.py", line 314, in _call_ta
    return obj(config, **obj_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 313, in quantize
    feed_target_names, fetch_targets)
TypeError: 'NoneType' object is not callable
2023-01-17 17:22:46,364-INFO: Value for default configuration: 2147483647.00000000
INFO:smac.optimizer.smbo.SMBO:Running initial design
INFO:smac.intensification.intensification.Intensifier:First run, no incumbent provided; challenger is assumed to be the incumbent
Tue Jan 17 17:22:46-INFO: Load model and set data loader ...
Tue Jan 17 17:22:47-INFO: Collect quantized variable names ...
Sampling stage, Run batch:|██████████████████████████████████████████████| 11/11
Tue Jan 17 17:25:03-INFO: Update the program ...
Adding quant op with weight:|██████████████████████████████████████████| 320/320
Adding quant activation op:|                                             | 1/688
Tue Jan 17 17:25:20-INFO: The quantized model is saved in quant_model_tmp
ERROR:smac.tae.execute_func.ExecuteTAFuncDict:'NoneType' object is not callable
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 301, in quantize
    emd_loss = eval_quant_model()
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 226, in eval_quant_model
    out_float = convert_model_out_2_nparr(out_float)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 179, in convert_model_out_2_nparr
    out_nparr = np.concatenate(out_list)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4 and the array at index 1 has size 37

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/smac/tae/execute_func.py", line 217, in run
    rval = self._call_ta(self._ta, config, obj_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/smac/tae/execute_func.py", line 314, in _call_ta
    return obj(config, **obj_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 313, in quantize
    feed_target_names, fetch_targets)
TypeError: 'NoneType' object is not callable
INFO:smac.stats.stats.Stats:---------------------STATISTICS---------------------
INFO:smac.stats.stats.Stats:Incumbent changed: -1
INFO:smac.stats.stats.Stats:Submitted target algorithm runs: 1 / 3.0
INFO:smac.stats.stats.Stats:Finished target algorithm runs: 1 / 3.0
INFO:smac.stats.stats.Stats:Configurations: 1
INFO:smac.stats.stats.Stats:Used wallclock time: 161.08 / inf sec 
INFO:smac.stats.stats.Stats:Used target algorithm runtime: 160.96 / inf sec
INFO:smac.stats.stats.Stats:----------------------------------------------------
INFO:smac.facade.smac_hpo_facade.SMAC4HPO:Final Incumbent: None
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2022.20.2/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/paddle/PaddleOCR/act.py", line 57, in <module>
    ac.compress()
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/auto_compression/compressor.py", line 594, in compress
    train_config)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/auto_compression/compressor.py", line 739, in single_strategy_compress
    runcount_limit=config.max_quant_count)
  File "/usr/local/lib/python3.7/dist-packages/paddleslim/quant/post_quant_hpo.py", line 535, in quant_post_hpo
    incumbent = smac.optimize()
  File "/usr/local/lib/python3.7/dist-packages/smac/facade/smac_ac_facade.py", line 723, in optimize
    incumbent = self.solver.run()
  File "/usr/local/lib/python3.7/dist-packages/smac/optimizer/smbo.py", line 307, in run
    self._incorporate_run_results(run_info, result, time_left)
  File "/usr/local/lib/python3.7/dist-packages/smac/optimizer/smbo.py", line 513, in _incorporate_run_results
    "'abort_on_first_run_crash'). Additional run info: %s" % result.additional_info
smac.tae.FirstRunCrashedException: First run crashed, abort. Please check your setup -- we assume that your default configuration does not crashes. (To deactivate this exception, use the SMAC scenario option 'abort_on_first_run_crash'). Additional run info: {}

Also, the readme.md in the reference directory you linked is incomplete: it has no section 3.4.

zzjjay commented 1 year ago

> ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4 and the array at index 1 has size 37

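Judging from this error message, the failure seems to be caused by the model's outputs having different sizes. Which model is this, exactly?

You could also try auto-compression with quantization-aware training and see whether the output-related problem still occurs.

> Also, the readme.md in the reference directory you linked is incomplete: it has no section 3.4.

Thanks for pointing out the gap in the documentation.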
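The shape mismatch behind this ValueError can be reproduced with NumPy alone. The sketch below assumes two output heads with 4 and 37 channels (the spatial size 8x8 is made up for illustration). Concatenating them along the default axis 0 fails because dimension 1 differs. Flattening each output first is shown only as one possible way to compare multi-output models; it is not what PaddleSlim does.

```python
import numpy as np

# Hypothetical output shapes: two heads with 4 and 37 channels.
out_a = np.zeros((1, 4, 8, 8), dtype=np.float32)
out_b = np.zeros((1, 37, 8, 8), dtype=np.float32)

try:
    # Default axis=0: all other dimensions must match, but dim 1 is 4 vs 37.
    np.concatenate([out_a, out_b])
    concat_failed = False
except ValueError as err:
    concat_failed = True
    print(err)

# Possible workaround (an assumption, not PaddleSlim's behavior):
# flatten every output to 1-D before concatenating.
flat = np.concatenate([o.reshape(-1) for o in (out_a, out_b)])
```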

before31 commented 1 year ago

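> Judging from this error message, the failure seems to be caused by the model's outputs having different sizes. Which model is this, exactly?

It is the pgnet model from ppocr.

> You could also try auto-compression with quantization-aware training and see whether the output-related problem still occurs.

I have already tried quantization-aware training, and that does not work either. #1628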