guoyejun closed this issue 4 years ago.
Hi @guoyejun,
Well, it's really pretty weird. It seems you've already covered most of the options. Let's try one more: please compare the MSDK API-level logs for the sample and for ffmpeg. For that, please use the MediaSDK tracer. Unfortunately, it's not open-sourced yet, but you can use the tracer from the Media Server Studio MediaSDK release. I've attached the latest one: tracer.zip
Thanks @dmitryermilov. Could you let me know where to download "Intel®Media Server Studio 2018 R2 for Linux Servers", which is mentioned in the PDF in tracer.zip? I searched for it but always ended up at https://github.com/Intel-Media-SDK/MediaSDK/releases, which I don't think is the right place for the "MediaServerStudio MediaSDK release".
I did try the tracer with my current open-source MSDK, and it fails with: [ERROR], sts=MFX_ERR_UNSUPPORTED(-3), Init, m_mfxSession.InitEx failed at /work/media/MediaSDK/samples/sample_encode/src/pipeline_encode.cpp ...
By the way, I forgot to mention yesterday: in the ffmpeg + my code case, there is no GPU hang if I just comment out one line ("configBuffers.push_back(m_roiBufferId);") in the MSDK code,
in file /_studio/mfx_lib/shared/src/mfx_h264_encode_vaapi.cpp, function MfxHwH264Encode::VAAPIEncoder::Execute, near line 2860:
if (task.m_numRoi)
{
MFX_CHECK_WITH_ASSERT(MFX_ERR_NONE == SetROI(task, m_arrayVAEncROI, m_vaDisplay, m_vaContextEncode, m_roiBufferId), MFX_ERR_DEVICE_FAILED);
// no GPU hang if the following line is commented out.
configBuffers.push_back(m_roiBufferId);
}
You're welcome.
where to download "Intel®Media Server Studio 2018 R2 for Linux Servers"
It was publicly available, but it has since been removed from Intel's sites. You don't need it :)
I just double-checked the package I had shared. It works on my side.
I assume you followed the incorrect (out-of-date) installation steps in readme-tracer.pdf. Please use these steps instead:
sudo mv /opt/intel/mediasdk/lib64/libmfxhw64.so.1.30 /opt/intel/mediasdk/lib64/libmfxhw64.so.1.30-real
sudo ln -s -f /path/to/libmfx-tracer.so /opt/intel/mediasdk/lib64/libmfxhw64.so.1.30
./mfx-tracer-config --default
./mfx-tracer-config core.type file
./mfx-tracer-config core.log ~/mfxtracer.log
./mfx-tracer-config core.lib /opt/intel/mediasdk/lib64/libmfxhw64.so.1.30-real
Or, instead of dumping to a file, you can output to the console with:
./mfx-tracer-config core.type console
Please don't forget to change "/opt/intel/mediasdk" to the right path if it's different on your side.
Then, when you run ffmpeg or an MSDK sample, you'll see a file like ~/mfxtracer_6988.log
Yes, it works with your steps.
I did a quick comparison of the logs but haven't found the cause yet. Could you take a look for any clues? Thanks. Here are the MSDK logs for ffmpeg + my code and for the MSDK sample + my code: ffmpeg.mfxtracer_6372.log msdksample.mfxtracer_6030.log
I also enabled LIBVA_TRACE and collected the libva logs for your reference: ffmpeg.103640.thd-0x000018e9.txt ffmpeg.103640.thd-0x000018e4.txt
msdksample.102449.thd-0x0000178f.txt msdksample.102449.thd-0x0000178e.txt
By the way, the libva trace for ffmpeg + my code shows num_buffers is 18, while for the MSDK sample + my code it is 20. To make the comparison a bit easier, I've made a change in ffmpeg so that num_buffers is 20, see https://github.com/guoyejun/ffmpeg/commit/2b69dfa5d21bd5ec3915e7e1ebd1ca385dcb078e
It seems the mfxtracer logs are incomplete; for example, I don't see any EncodeFrameAsync calls there.
Please run the following and recollect the logs:
./mfx-tracer-config core.level full
Thanks for the info. Please see the attached logs: 1211ffmpeg.mfxtracer_27817.log 1211msdksample.mfxtracer_15189.log
Thank you @guoyejun. I looked at the logs and honestly didn't see anything obvious that could cause a GPU hang. Although there are many differences, I can't find anything really suspicious. Could you please align the MFXVideoENCODE_Init parameters in the ffmpeg QSV code with sample_encode step by step, and check at which point the issue goes away (at some point it should!)?
Thanks, I found the cause: multi-frame encode (MFE) mode is enabled by default on the ffmpeg side.
By the way, how do I query the maximum number of supported ROIs? Thanks.
The documentation at https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#mfxVideoParam says: Number of ROI descriptions in array. The Query function mode 2 returns maximum supported value (set it to 256 and Query will update it to maximum supported value).
I don't understand what 'mode 2' means. Is there any sample code? Thanks.
I tried debugging into MFXVideoENCODE_Query but couldn't find where the value is queried. The most likely place is: mfxRes = handler == codecId2Handlers.end() ? MFX_ERR_UNSUPPORTED : (handler->second.primary.query)(session, in, out);
but I'm unable to step into it with gdb (I built MSDK in debug mode).
Thanks, I found the cause: multi-frame encode (MFE) mode is enabled by default on the ffmpeg side.
Great! We need to check if driver returns valid caps for MFE.
I don't understand what 'mode 2' means. Is there any sample code? Thanks.
It's here: MFXVideoENCODE_Query
This function works in either of four modes: ... If the in parameter is non-zero, the function checks the validity of the fields in the input structure. Then the function returns the corrected values in the output structure. If there is insufficient information to determine the validity or correction is impossible, the function zeroes the fields. This feature can verify whether the SDK implementation supports certain profiles, levels or bitrates.
From the code perspective:
https://github.com/Intel-Media-SDK/MediaSDK/blob/1f8456f10bdb204b0ea3067df7f5a3e8a1de407f/_studio/mfx_lib/encode_hw/h264/src/mfx_h264_encode_hw.cpp#L367
https://github.com/Intel-Media-SDK/MediaSDK/blob/9fd26ab972e9f7a2bd74e242827a26bc78330872/_studio/mfx_lib/shared/src/mfx_h264_enc_common_hw.cpp#L2305
https://github.com/Intel-Media-SDK/MediaSDK/blob/9fd26ab972e9f7a2bd74e242827a26bc78330872/_studio/mfx_lib/shared/src/mfx_h264_enc_common_hw.cpp#L4365
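To make "mode 2" concrete: the caller passes a non-NULL in structure, and Query writes corrected values into out. The sketch below is a self-contained toy model of that contract for NumROI only; `RoiRequest`, `QueryStatus`, `model_query_num_roi` and `probe_max_roi` are hypothetical names, not MSDK API. With the real library, as far as we understand the manual, you would attach an mfxExtEncoderROI buffer (Header.BufferId = MFX_EXTBUFF_ENCODER_ROI, NumROI = 256) to mfxVideoParam::ExtParam of both in and out, call MFXVideoENCODE_Query(session, &in, &out), and read NumROI back from out; a correction is normally signalled with the warning MFX_WRN_INCOMPATIBLE_VIDEO_PARAM.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one field of mfxExtEncoderROI used here. */
typedef struct {
    uint16_t NumROI;
} RoiRequest;

/* Hypothetical status codes; the real function returns mfxStatus values. */
typedef enum { QUERY_OK = 0, QUERY_CORRECTED = 1 } QueryStatus;

/* Toy model of Query mode 2 (in != NULL): check the input, write the
 * corrected values into out, and report whether a correction was made.
 * max_supported stands in for the limit the real library would report. */
static QueryStatus model_query_num_roi(const RoiRequest *in, RoiRequest *out,
                                       uint16_t max_supported)
{
    *out = *in; /* start from the caller's values */
    if (in->NumROI > max_supported) {
        out->NumROI = max_supported; /* clamp to what is supported */
        return QUERY_CORRECTED;
    }
    return QUERY_OK;
}

/* The probing trick from the manual: request 256 ROIs and read back
 * the corrected count, which is then the maximum supported value. */
static uint16_t probe_max_roi(uint16_t max_supported)
{
    RoiRequest in = { 256 };
    RoiRequest out = { 0 };
    (void)model_query_num_roi(&in, &out, max_supported);
    return out.NumROI;
}
```

For example, if the implementation supports 8 ROIs, probe_max_roi(8) returns 8, while a request for 4 ROIs passes through unchanged.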
Yes, ffmpeg enables MFE by default only when MFE is really supported. We need to disable it if we want to support ROI encoding.
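The workaround described here can be sketched as a small decision function: keep the default MFE behaviour unless ROI encoding is requested. This is a hypothetical illustration, not ffmpeg's actual qsvenc code; `MfMode`, `MF_MODE_DISABLED`, `MF_MODE_AUTO` and `choose_mf_mode` are made-up names. In the real API the setting lives, as far as we know, in mfxExtMultiFrameParam::MFMode (MFX_MF_DISABLED / MFX_MF_AUTO).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the MSDK multi-frame modes. */
typedef enum { MF_MODE_DISABLED, MF_MODE_AUTO } MfMode;

/* Pick the multi-frame encode mode for one encoder instance. */
static MfMode choose_mf_mode(bool mfe_supported, bool roi_requested)
{
    if (!mfe_supported)
        return MF_MODE_DISABLED; /* nothing to enable */
    if (roi_requested)
        return MF_MODE_DISABLED; /* MFE + ROI caused the GPU hang in this thread */
    return MF_MODE_AUTO;         /* default: let the library decide */
}
```

Keeping the ROI check separate from the capability check makes it easy to drop once the MFE + ROI hang is fixed.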
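Thanks, I'll keep trying. By the way, what does 'mode 2' mean?
> By the way, what does 'mode 2' mean?
It's just the enumeration (1, 2, 3, ...) of the modes listed in the documentation; mode 2 is the one quoted above, where the in parameter is non-zero.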
I found an issue with querying the maximum supported ROI number and created a PR to fix it, see https://github.com/Intel-Media-SDK/MediaSDK/pull/1856.
Hi,
I just want to confirm: only the H.264 and H.265 HW encoders support ROI encoding, and the MJPEG, VP9 and MPEG-2 encoders do not? Thanks.
Another question: is there any plan for when the GPU hang with both MFE and ROI enabled will be fixed? Thanks.
Hi @guoyejun
I just want to confirm: only the H.264 and H.265 HW encoders support ROI encoding, and the MJPEG, VP9 and MPEG-2 encoders do not? Thanks.
That's right.
Another question: is there any plan for when the GPU hang with both MFE and ROI enabled will be fixed? Thanks.
@DenWolf, I assume this question is for you :)
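Based on the answer above (H.264 and H.265 HW encoders support ROI encoding; MJPEG, VP9 and MPEG-2 do not, for the MSDK version discussed in this thread), an application could guard its ROI setup with a check like the one below. `HwCodec` and `hw_encoder_supports_roi` are hypothetical names; a real application would switch on mfxVideoParam::mfx.CodecId (MFX_CODEC_AVC, MFX_CODEC_HEVC, ...) and should re-check support on newer releases.

```c
#include <stdbool.h>

/* Hypothetical codec identifiers for this sketch. */
typedef enum { CODEC_H264, CODEC_H265, CODEC_MJPEG, CODEC_VP9, CODEC_MPEG2 } HwCodec;

/* True only for the HW encoders confirmed to support ROI in this thread. */
static bool hw_encoder_supports_roi(HwCodec codec)
{
    switch (codec) {
    case CODEC_H264:
    case CODEC_H265:
        return true;
    default:
        return false; /* MJPEG, VP9, MPEG-2: no ROI encoding */
    }
}
```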
Hi,
I'm trying to enable ROI encoding for the ffmpeg -> MSDK (QSV) path, but I've hit a GPU hang. I'd appreciate any help, thanks.
My system: Skylake + Ubuntu 16.04
MSDK version: e6caad2d1e00380f8cab045de62567a0a4a53a53
iHD version: 644fc6d2bb9e6271a2b91578b1bc2f63275d184a
First, I tried MSDK's sample and verified that it works:
./sample_multi_transcode -i::h265 str352x288.h265 -hw -n 1 -o::h264 ./str.roi.h264 -roi_file roi
Then, I added ROI encoding code to MSDK's sample (sample_encode), and it works too. See the command below (out352x288.yuv contains just 1 frame):
$ ./sample_encode h264 -i out352x288.yuv -hw -o out.h264 -w 352 -h 288
I also uploaded my change at https://github.com/guoyejun/MediaSDK/commit/20644f25e5204ae9b9760219bf12f101e20ac52d in case you are interested.
Then, I made a similar change in ffmpeg to enable ROI encoding. See my change at https://github.com/guoyejun/ffmpeg/commit/4edb1516096e9b2880144d55afde2e1b2ad36f1d (the values are hard-coded for easier debugging).
The H.265 encoder works correctly with this command: ./ffmpeg -s 352x288 -i out352x288.yuv -c:v hevc_qsv -y qsv.roi.h265
However, the H.264 encoder causes a GPU hang with this command: ./ffmpeg -s 352x288 -i out352x288.yuv -c:v h264_qsv -y qsv.roi.h264
Since MSDK's ROI encoding API is not tied to a specific encoder and my code works for H.265, I believe my code is basically correct. Given that ffmpeg + my code hangs the GPU while sample_encode + my code works fine, I suspect MSDK has a limitation (or special requirement) for ROI encoding, but I couldn't find any hints in the MSDK documentation.
I've enabled LIBVA_TRACE for both cases (ffmpeg + my code vs. sample_encode + my code) and found nothing unusual between the logs. I even wrote some hacky code on top of libva to make the LIBVA_TRACE logs exactly the same (except for timestamps), but the issue remains.
I also attached the GPU hang log from /sys/kernel/debug/dri/0/i915_error_state, see i915_error_state.txt
Are there any dump/parse tools at the MSDK or media driver level? Or anything else I can try? Thanks.