Open tongshi203 opened 3 years ago
Can you run vainfo and share the output, please?
Only certain GPUs have low-power capable encoders; not every GPU does.
For h.264/AVC I can see the following on one of my machines (excerpt):
...
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
...
VAProfileH264Main : VAEntrypointEncSliceLP
...
VAProfileH264High : VAEntrypointEncSliceLP
...
Do you see those LP postfixes for h.265/HEVC on your GPU in the output of vainfo?
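For anyone who wants to automate the check above, here is a minimal sketch that scans vainfo text for profiles listing the VAEntrypointEncSliceLP entrypoint. The function name low_power_profiles and the sample text are illustrative assumptions, not part of vainfo or Media SDK; the HEVC lines in the sample are made up for the example and do not come from any machine in this thread.

```python
import re

# Matches lines such as "VAProfileH264Main : VAEntrypointEncSliceLP".
_LINE = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")


def low_power_profiles(vainfo_text, codec_prefix="VAProfileHEVC"):
    """Return profiles starting with codec_prefix that list VAEntrypointEncSliceLP."""
    found = []
    for line in vainfo_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        profile, entrypoint = m.group(1), m.group(2)
        if (profile.startswith(codec_prefix)
                and entrypoint == "VAEntrypointEncSliceLP"
                and profile not in found):
            found.append(profile)
    return found


# Hypothetical excerpt; real output would come from running vainfo.
sample = """\
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSliceLP
"""
print(low_power_profiles(sample))
```

An empty list for the HEVC prefix would suggest the driver does not advertise a low-power HEVC encoder, although the encoder can still fail for other reasons even when the entrypoint is listed.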
@brmarkus ,
It's RocketLake. HEVC LP is supported there.
@tongshi203 , Let's try to narrow down where the issue is. Please run:
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -la_ext -lowpower:on
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -cqp -qpi 26 -qpp 26 -qpb 26 -lowpower:on
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -b 5000 -lowpower:on
and share the results. Please also send the output from:
dmesg | grep drm
ls /dev/dri
lspci -nn
Please also send dmesg output after MFX_ERR_DEVICE_FAILED error is triggered.
1. [root@localhost release]# ./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -la_ext -lowpower:on
Multi Transcoding Sample Version 8.4.27.0
libva info: VA-API version 1.12.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_12
libva info: va_openDriver() returns 0
Session 0:
plugin_loader.h :185 [INFO] Plugin was loaded from GUID: { 0x58, 0x8f, 0x11, 0x85, 0xd4, 0x7b, 0x42, 0x96, 0x8d, 0xea, 0x37, 0x7b, 0xb5, 0xd0, 0xdc, 0xb4 } (Intel (R) Media SDK plugin for LA ENC)
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), EncodePreInit, m_pmfxENC->Query failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:483
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, EncodePreInit failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:3870
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, pThreadPipeline->pPipeline->Init failed at MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:530
plugin_loader.h :211 [INFO] MFXBaseUSER_UnLoad(session=0x0x5600a44d79a0), sts=0
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), main, transcode.Init failed at MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1248
2. [root@localhost release]# ./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -cqp -qpi 26 -qpp 26 -qpb 26 -lowpower:on
Multi Transcoding Sample Version 8.4.27.0
libva info: VA-API version 1.12.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_12
libva info: va_openDriver() returns 0
Session 0:
Pipeline surfaces number (DecPool): 14
Input video: HEVC
Output video: HEVC
Session 0 was NOT joined with other sessions
Transcoding started
[ERROR], sts=MFX_ERR_DEVICE_FAILED(-17), PutBS, Encode: SyncOperation failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1950
[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1912
[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x55b3375ef9a0] failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:4584
session 0 [0x55b3375ef9a0] failed with status MFX_ERR_ABORTED shutting down the application...
session [0x55b3375ef9a0] m_bForceStop is set
Transcoding finished
Common transcoding time is 6.75842 sec
session 0 [0x55b3375ef9a0] FAILED (MFX_ERR_ABORTED) 6.75829 sec, 8 frames, 1.184 fps -i::h265 1.h265 -hw -o::h265 new.h265 -hw -cqp -qpi 26 -qpp 26 -qpb 26 -lowpower:on
The test FAILED
[ERROR], sts=MFX_ERR_ABORTED(-12), main, transcode.ProcessResult failed at MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1255
3. [root@localhost release]# ./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -b 5000 -lowpower:on
Multi Transcoding Sample Version 8.4.27.0
libva info: VA-API version 1.12.0
libva info: Trying to open /usr/lib64/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_12
libva info: va_openDriver() returns 0
Session 0:
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), EncodePreInit, m_pmfxENC->Query failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:483
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, EncodePreInit failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:3870
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, pThreadPipeline->pPipeline->Init failed at MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:530
[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), main, transcode.Init failed at MediaSDK/samples/sample_multi_transcode/src/sample_multi_transcode.cpp:1248
4. [root@localhost Desktop]# dmesg | grep drm
[ 3.342081] fb0: switching to inteldrmfb from EFI VGA
[ 3.343551] i915 0000:00:02.0: [drm] Failed to load DMC firmware i915/rkl_dmc_ver2_02.bin. Disabling runtime power management.
[ 3.343551] i915 0000:00:02.0: [drm] DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[ 3.349389] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 3.378448] fbcon: i915drmfb (fb0) is primary device
[ 3.434983] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[ 186.155188] i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
[ 186.155206] i915 0000:00:02.0: [drm] sample_multi_tr[5317] context reset due to GPU hang
[ 186.156356] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28ffff7d, in sample_multi_tr [5317]
[root@localhost Desktop]# ls /dev/dri
by-path card0 renderD128
[root@localhost Desktop]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:4c53] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4c01] (rev 01)
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4c8a] (rev 04)
00:08.0 System peripheral [0880]: Intel Corporation Device [8086:4c11] (rev 01)
00:14.0 USB controller [0c03]: Intel Corporation Device [8086:43ed] (rev 11)
00:14.2 RAM memory [0500]: Intel Corporation Device [8086:43ef] (rev 11)
00:16.0 Communication controller [0780]: Intel Corporation Device [8086:43e0] (rev 11)
00:17.0 SATA controller [0106]: Intel Corporation Device [8086:43d2] (rev 11)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:4388] (rev 11)
00:1f.3 Audio device [0403]: Intel Corporation Device [8086:43c8] (rev 11)
00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:43a3] (rev 11)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:43a4] (rev 11)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (14) I219-V [8086:15fa] (rev 11)
01:00.0 Unassigned class [ff00]: Altera Corporation Device [1172:0008] (rev 01)
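When comparing several runs like the three above, it can help to pull the first reported status out of each log. The sketch below is one way to do that, assuming the "[ERROR], sts=NAME(code), function," layout seen in these sample_multi_transcode logs; the helper name first_error is hypothetical, and treating the first error line as the innermost failure is an assumption based only on the ordering visible in this thread.

```python
import re

# Matches the sample's error lines, for example:
# "[ERROR], sts=MFX_ERR_UNSUPPORTED(-3), EncodePreInit, m_pmfxENC->Query failed at ..."
_ERR = re.compile(r"\[ERROR\], sts=(MFX_\w+)\((-?\d+)\), (\w+),")


def first_error(log_text):
    """Return (status_name, status_code, reporting_function) of the first error, or None."""
    m = _ERR.search(log_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


# Two error lines copied from the CQP run in this thread (paths shortened).
cqp_log = (
    "Transcoding started\n"
    "[ERROR], sts=MFX_ERR_DEVICE_FAILED(-17), PutBS, Encode: SyncOperation failed at pipeline_transcode.cpp:1950\n"
    "[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at pipeline_transcode.cpp:1912\n"
)
print(first_error(cqp_log))
```

Applied to the three runs above, this would separate the two MFX_ERR_UNSUPPORTED failures at Query time from the MFX_ERR_DEVICE_FAILED failure that happens while encoding.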
Okay, even CQP doesn't work. @thomasli21801, please assign.
Hi Dmitry, I asked Jason to help find someone to take a look since this is a RKL issue.
Thanks, Thomas
Jiao Xu will first check whether this is a tool issue or a GPU RT/driver issue.
Hi,@JasonL2011 @Jiao Xu Do you know which CPU/Base Board can support lowpower:on in linux? thanks!
@tongshi203, HEVC encode lowpower support starts from ICL.
The error can be reproduced using both sample_multi_transcode and mfx_transcoder, but we have a transcode (Xcode) case that passes in CI, so we need to double-check.
Hi,@JasonL2011 @Jiao Xu Is there an update on enabling -lowpower:on in h265 encoding? Thanks!
Hi @tongshi203 , using our CI environment we cannot reproduce the error; the case passes. The details are listed below. By the way, may I know which kernel version you used?
Env info:
OS: Ubuntu 20.04.1 LTS
kernel: 5.12.0-rc4-CI-CI_DRM_9886+
CPU Description: 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz
Would it be possible to share the content of your PAR file, @smc667 please?
Are the command lines mentioned by @dmitryermilov working for you?
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -la_ext -lowpower:on
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -cqp -qpi 26 -qpp 26 -qpb 26 -lowpower:on
./sample_multi_transcode -i::h265 1.h265 -hw -o::h265 new.h265 -hw -b 5000 -lowpower:on
Hi @smc667, my development environment is:
OS: Centos8.2
kernel: 5.12.0-rc8+
CPU: Rocket Lake 11th Gen Intel(R) Core(TM) i5-11500 @ 2.70GHz
Base Board: H510M-ITX/ac
@smc667
Env info: OS: Ubuntu 20.04.1 LTS kernel: 5.12.0-rc4-CI-CI_DRM_9886+ CPU Description : 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60GHz
Is 5.12.0-rc4-CI-CI_DRM_9886+ publicly available? If so, please share a link to the kernel. If not, please try to reproduce the issue on 5.12.0-rc8.
@smc667 hi, Is there any update?
Hi @tongshi203 and @dmitryermilov , sorry for the late response. We can reproduce the issue on both the 5.12.0-rc4 and 5.12.0-rc8 public kernels, and @aidan2020sh is now working on it.
DRM_10153( pass ): http://gtax-gcmxd-fm.intel.com/#/jobs/43538722 DRM_10154( fail ): http://gtax-gcmxd-fm.intel.com/#/jobs/43540529 currently media weekly CI test version is DRM_9886.
@smc667, @aidan2020sh hi, is there any update? Or what should we do now? Thanks!
DRM_10154 introduced a va_init failure, which has been resolved; however, the latest KMD driver shows a new "GPU hang" issue in pipeline execution. I would suggest you continue your work with DRM_10153 or an earlier version, which may not block you.
@tongshi203 , could you confirm that your case passes with "lowpower:off"? The failure may not be related to the lowpower option. I tried one media unit test case with "lowpower off" and it also failed.
@tongshi203 , the media unit test passes if both the i915 kernel driver and the firmware binary are updated. If only the kernel driver is updated and the firmware is too old, the GPU hang is reproduced. For example, in my manual test, "DRM_10399+ firmware_1627348338" (latest) or "DRM_9886+ firmware_1614809303" (Media CI validation version) works well. Please give it a try.
@aidan2020sh My case with "lowpower:off" is ok!
@aidan2020sh my kernel is 5.12.0-rc8+, and my case is not a GPU hang; it is an error like this:
[ERROR], sts=MFX_ERR_DEVICE_FAILED(-17), PutBS, Encode: SyncOperation failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1950
[ERROR], sts=MFX_ERR_ABORTED(-12), Transcode, PutBS failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1912
[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Transcode() [0x55b3375ef9a0] failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:4584
@tongshi203, the error you see comes from the MSDK software stack, whereas a "GPU hang" is usually reported by the KMD; you could run "dmesg" to see whether this kind of error occurs. In my unit test, "lowpower:on" works, so this should not be a limitation. Could you try the kernel and firmware I mentioned? Could you ping me (dan.ai@intel.com) or leave an email so that we can talk on windows team.
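As a rough illustration of the dmesg check suggested here, the following sketch filters kernel log text for the hang messages that appear earlier in this thread. The patterns are taken only from those i915 lines and other kernel versions may word them differently; the function name gpu_hang_lines is an assumption made for the example.

```python
import re

# Patterns copied from the i915 messages posted in this thread.
_HANG = re.compile(
    r"GPU HANG|context reset due to GPU hang|Resetting \w+ for preemption time out"
)


def gpu_hang_lines(dmesg_text):
    """Return the dmesg lines that look like i915 GPU hang or reset reports."""
    return [line.strip() for line in dmesg_text.splitlines() if _HANG.search(line)]


# Three lines taken from the dmesg output posted in this thread.
sample = """\
[ 3.349389] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[ 186.155188] i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
[ 186.156356] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:4:28ffff7d, in sample_multi_tr [5317]
"""
for line in gpu_hang_lines(sample):
    print(line)
```

If lines like these show up right after an MFX_ERR_DEVICE_FAILED status, the SDK error and the kernel-side hang are probably two views of the same failure.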
The GPU hang issue (reported by Tongshi203) is a known issue that has been root-caused and fixed internally. The fix will be contained in an IFWI update, which is in the release flow and may take some time.
Meanwhile, the la_ext mode has some limitations on the input bitstream. I have tried the stream shared by changliwu; this stream is fine and the case passes with a GPU hang WR. The other two cases changliwu tried (CQP and the fixed BS size case) pass with the GPU hang WR too. If "MFX_ERR_UNSUPPORTED" is reproduced, that normally means the test stream contains some unsupported feature.
The GPU hang issue has been fixed with the ww33 IFWI and the case passes now.
Hi @aidan2020sh , can you please say if it was a public IFWI update or internal one?
@dmitryermilov it was an internal version. I have no idea where customers get the IFWI; I'd suggest that @tongshi203 ask the official Intel contact for the latest official release and give it a try. Usually we developers do not handle this kind of matter.
That should be a production release. But how it is released to external users is unknown to us as of now ...
I was told the IFWI update is usually made available to all external users through the Download Center on the Intel homepage, which takes time to refresh; users with an NDA can get it directly from the internal sharepoint.
Hi, I run sample_multi_transcode like this: -i::h265 1.h265 -o::sink -hw -la_ext -join -i::source -o::h265 new.h265 -hw -la_ext -join -lowpower:on
but the error is:
[ERROR], sts=MFX_ERR_DEVICE_FAILED(-17), PutBS, Encode: SyncOperation failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1950
[ERROR], sts=MFX_ERR_ABORTED(-12), Encode, PutBS failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1444
[ERROR], sts=MFX_ERR_ABORTED(-12), Run, CTranscodingPipeline::Run::Encode() [0x55c81411ef60] failed at MediaSDK/samples/sample_multi_transcode/src/pipeline_transcode.cpp:4596
My development environment is:
OS: Centos8.2
CPU: Rocket Lake 11th Gen Intel(R) Core(TM) i5-11500 @ 2.70GHz
Base Board: H510M-ITX/ac
thanks!