PaddlePaddle / Paddle

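PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)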
http://www.paddlepaddle.org/
Apache License 2.0

Some mkldnn unit tests fail on PR_CI_WINDOWS #29080

Closed: luotao1 closed this issue 3 years ago

luotao1 commented 3 years ago

Failed mkldnn unit tests:

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/2134145/job/3048287

https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/2134425/job/3048632

How to fix:

https://github.com/PaddlePaddle/Paddle/blob/5e26a15484f2d9e26dd0357a2386ea216e303290/paddle/scripts/paddle_build.bat#L400-L415 After the fix, please remove these unit tests from the lines linked above and make sure both PR_CI_windows and PR_CI_windows_openblas pass.

paddle-bot-old[bot] commented 3 years ago

Hi! We've received your issue; please be patient while we respond. We will arrange for technicians to answer your questions as soon as possible. Please make sure you have provided enough information to demonstrate your request. You may also check the API docs, FAQ, GitHub Issues and AI community for answers. Have a nice day!

lidanqing-intel commented 3 years ago

test_activation_mkldnn_op

2020-11-23 02:55:28 test_activation_mkldnn_op failed
2020-11-23 02:55:28  ............E.E.E.E..........................................................
2020-11-23 02:55:28 ======================================================================
2020-11-23 02:55:28 ERROR: test_check_output (test_activation_mkldnn_op.TestMKLDNNGeluBf16Dim2)
2020-11-23 02:55:28 ----------------------------------------------------------------------
2020-11-23 02:55:28 Traceback (most recent call last):
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\mkldnn\test_activation_mkldnn_op.py", line 95, in test_check_output
2020-11-23 02:55:28     self.check_output_with_place(core.CPUPlace())
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\op_test.py", line 1049, in check_output_with_place
2020-11-23 02:55:28     outs, fetch_list = self._calc_output(place, no_check_set=no_check_set)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\op_test.py", line 671, in _calc_output
2020-11-23 02:55:28     return_numpy=False)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1107, in run
2020-11-23 02:55:28     six.reraise(*sys.exc_info())
2020-11-23 02:55:28   File "C:\Python37\lib\site-packages\six.py", line 703, in reraise
2020-11-23 02:55:28     raise value
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1105, in run
2020-11-23 02:55:28     return_merged=return_merged)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1229, in _run_impl
2020-11-23 02:55:28     use_program_cache=use_program_cache)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1319, in _run_program
2020-11-23 02:55:28     [fetch_var_name])
2020-11-23 02:55:28 RuntimeError: could not create a primitive descriptor iterator
2020-11-23 02:55:28 ======================================================================
2020-11-23 02:55:28 ERROR: test_check_output (test_activation_mkldnn_op.TestMKLDNNGeluBf16Dim2Approx)
2020-11-23 02:55:28 ----------------------------------------------------------------------
2020-11-23 02:55:28 Traceback (most recent call last):
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\mkldnn\test_activation_mkldnn_op.py", line 114, in test_check_output
2020-11-23 02:55:28     self.check_output_with_place(core.CPUPlace())
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\op_test.py", line 1049, in check_output_with_place
2020-11-23 02:55:28     outs, fetch_list = self._calc_output(place, no_check_set=no_check_set)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\op_test.py", line 671, in _calc_output
2020-11-23 02:55:28     return_numpy=False)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1107, in run
2020-11-23 02:55:28     six.reraise(*sys.exc_info())
2020-11-23 02:55:28   File "C:\Python37\lib\site-packages\six.py", line 703, in reraise
2020-11-23 02:55:28     raise value
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1105, in run
2020-11-23 02:55:28     return_merged=return_merged)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1229, in _run_impl
2020-11-23 02:55:28     use_program_cache=use_program_cache)
2020-11-23 02:55:28   File "C:\home\workspace\Paddle\build\python\paddle\fluid\executor.py", line 1319, in _run_program
2020-11-23 02:55:28     [fetch_var_name])

lidanqing-intel commented 3 years ago

Hi @arlesniak, could you please check this test_flags_use_mkldnn unit-test failure?

test_flags_use_mkldnn failed
2020-11-23 02:56:34  F
2020-11-23 02:56:34 ======================================================================
2020-11-23 02:56:34 FAIL: test_flags_use_mkl_dnn (test_flags_use_mkldnn.TestFlagsUseMkldnn)
2020-11-23 02:56:34 ----------------------------------------------------------------------
2020-11-23 02:56:34 Traceback (most recent call last):
2020-11-23 02:56:34   File "C:\home\workspace\Paddle\build\python\paddle\fluid\tests\unittests\mkldnn\test_flags_use_mkldnn.py", line 54, in test_flags_use_mkl_dnn
2020-11-23 02:56:34     encode()) != -1
2020-11-23 02:56:34 AssertionError
2020-11-23 02:56:34 ----------------------------------------------------------------------
2020-11-23 02:56:34 Ran 1 test in 1.113s
2020-11-23 02:56:34 FAILED (failures=1)

arlesniak commented 3 years ago

Hi @lidanqing-intel, according to the logs, test_flags_use_mkldnn fails because it was run on an AVX2-capable machine while the test expects AVX512. The FLAGS under test behave correctly, but the output comparison is too specific. I'll open a PR with a more general comparison that should solve this.
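One way to make the comparison more general is to accept any oneDNN JIT kernel rather than one ISA-specific string. The sketch below is a hypothetical illustration, not the test's actual code or the PR that followed. It assumes the test captures the child process's stdout as bytes (the traceback shows a `.find(... encode()) != -1` check on such output) and that the relevant verbose lines carry an implementation tag such as `jit:avx2` or `jit:avx512_common`. The helper `contains_jit_kernel` and the sample log lines are made up.

```python
import re

# Hypothetical pattern: any oneDNN JIT implementation tag, e.g. "jit:avx2"
# or "jit:avx512_common", instead of one hard-coded ISA-specific string.
_JIT_IMPL = re.compile(rb"jit:[a-z0-9_]+")


def contains_jit_kernel(out: bytes, primitive: bytes) -> bool:
    """Return True if a line mentioning `primitive` also names a JIT kernel.

    `out` is the captured stdout of the process under test (bytes) and
    `primitive` is the primitive kind to look for, e.g. b"eltwise".
    """
    for line in out.splitlines():
        if primitive in line and _JIT_IMPL.search(line):
            return True
    return False


if __name__ == "__main__":
    # Made-up verbose lines, as they might look on two different CPUs.
    avx2_log = b"dnnl_verbose,exec,cpu,eltwise,jit:avx2,forward_training\n"
    avx512_log = b"dnnl_verbose,exec,cpu,eltwise,jit:avx512_common,forward_training\n"
    print(contains_jit_kernel(avx2_log, b"eltwise"))
    print(contains_jit_kernel(avx512_log, b"eltwise"))
```

Matching a pattern instead of a literal keeps the check meaningful (a JIT kernel was actually used) while letting it pass on both AVX2 and AVX512 CI machines.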

paddle-bot-old[bot] commented 3 years ago

Are you satisfied with the resolution of your issue?

luotao1 commented 3 years ago

Another mkldnn unit test fails on PR_CI_WINDOWS:

2020-11-23 01:32:12 [ RUN ] AnalysisPredictor.bf16_gpu_pass_strategy
2020-11-23 01:32:12 I1123 01:32:11.191207 14452 analysis_config.cc:244] CPU does not support BFLOAT16 calculations
2020-11-23 01:32:12 C:\home\workspace\Paddle\paddle\fluid\inference\api\analysis_predictor_tester.cc(496): error: Expected equality of these values:
2020-11-23 01:32:12   config.mkldnn_bfloat16_enabled()
2020-11-23 01:32:12     Which is: false
2020-11-23 01:32:12   true
2020-11-23 01:32:12 [ FAILED ] AnalysisPredictor.bf16_gpu_pass_strategy (0 ms)
2020-11-23 01:32:12 [ RUN ] AnalysisPredictor.bf16_pass_strategy
2020-11-23 01:32:12 [ OK ] AnalysisPredictor.bf16_pass_strategy (0 ms)

@lidanqing-intel @wozna, could you help fix it?
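The log shows `analysis_config.cc` reporting that the CPU does not support BFLOAT16, after which `config.mkldnn_bfloat16_enabled()` returns false while the test expects true. One possible direction is to derive the expected value from the hardware's capability instead of hard-coding `true`; skipping the test on such machines would be an alternative. The Python sketch below models that idea with a hypothetical `FakeConfig` stand-in. Only the `mkldnn_bfloat16_enabled()` accessor name comes from the log, and the assumption that bf16 stays disabled on unsupported CPUs is inferred from it, not taken from Paddle's source.

```python
class FakeConfig:
    """Hypothetical stand-in for the predictor config. Assumed behavior
    (inferred from the CI log): bf16 stays disabled when the CPU lacks
    BFLOAT16 support, even if it was requested."""

    def __init__(self, cpu_supports_bf16: bool):
        self._cpu_supports_bf16 = cpu_supports_bf16
        self._bf16_enabled = False

    def enable_mkldnn_bfloat16(self) -> None:
        # Only switch bf16 on if the (simulated) CPU can execute it.
        self._bf16_enabled = self._cpu_supports_bf16

    def mkldnn_bfloat16_enabled(self) -> bool:
        return self._bf16_enabled


def check_bf16_strategy(cpu_supports_bf16: bool) -> None:
    config = FakeConfig(cpu_supports_bf16)
    config.enable_mkldnn_bfloat16()
    # The expected value follows hardware capability instead of a hard-coded
    # True, so the check holds on both AVX2-only and BF16-capable machines.
    expected = cpu_supports_bf16
    actual = config.mkldnn_bfloat16_enabled()
    assert actual == expected, f"bf16 enabled={actual}, expected={expected}"


if __name__ == "__main__":
    for supported in (False, True):
        check_bf16_strategy(supported)
    print("ok")
```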