GPUOpen-LibrariesAndSDKs / AMF

The Advanced Media Framework (AMF) SDK provides developers with optimal access to AMD devices for multimedia processing

Memory leak when i use vce to encode #270

Closed jieytan closed 2 years ago

jieytan commented 3 years ago

Hello

I want to use the AMF encoder in a video conferencing scenario.

This is my device info:

CPU: AMD Ryzen 7 4800H with Radeon Graphics
Radeon SW version: 20.50.02.11 Adrenalin
GPU: AMD Radeon(TM) Graphics

Here is the workflow:

// init
  1. m_pFactory->CreateContext(&pContext);
  2. I set the memory type to HOST, so I did not call context init;
  3. m_pFactory->CreateComponent(pContext, AMFVideoEncoderVCE_SVC, &m_pAmfEncoder);
// surface alloc
  4. pContext->CreateSurfaceFromHostNative(format, width, height, hStride, height, pIn, &pSurface, pObserver);
// enc
  5. m_pAmfEncoder->SubmitInput(pSurface);
  6. m_pAmfEncoder->QueryOutput(ppData);

With the above workflow, I notice that memory usage increases frame by frame.

To fix it, I tried using pObserver.OnSurfaceDataRelease() to release pSurface, changing the workflow to:

  5. pSurface->addObserver(pOb) // release pSurface in the callback
  6. m_pAmfEncoder->SubmitInput(pSurface);
  7. m_pAmfEncoder->QueryOutput(ppData);

However, this callback fires far too early, and the encoder fails.

So here are my questions:

  1. I use Host memory. Is the above workflow right? And will it perform worse than DX9 or DX11?
  2. How should I manage the memory to avoid the leak?

Would you please help? Thank you very much.

jieytan commented 3 years ago

Another question:

Code below from sample code CapabilityManager project is weird:

amf_uint32 maxTemporalLayers = 0;
encoderCaps->GetProperty(AMF_VIDEO_ENCODER_CAP_MAX_TEMPORAL_LAYERS, &maxTemporalLayers);
std::wcout << L"\t\tNumber of temporal Layers:" << maxTemporalLayers << L"\n";

When I set maxTemporalLayers = 0, after GetProperty maxTemporalLayers == 0; when I set maxTemporalLayers = 20, after GetProperty maxTemporalLayers == 20.

It seems GetProperty is not working?

rhutsAMD commented 3 years ago

Regarding the possible leak, there are a few different possibilities that could be happening.

The callback will be called when the reference count to the surface is 0, at which point you can reuse/delete your own memory.
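To illustrate the point about the reference count, here is a minimal sketch using a toy reference-counted surface. `ToySurface` and `FrameBufferIsFree` are hypothetical names invented for this illustration and are not part of the AMF API; the sketch only models the rule that the release callback fires when the last reference is dropped.

```cpp
#include <functional>
#include <utility>

// Toy stand-in for a reference-counted surface. This is NOT the AMF API.
class ToySurface {
public:
    explicit ToySurface(std::function<void()> onRelease)
        : m_onRelease(std::move(onRelease)) {}

    void Acquire() { ++m_refs; }

    // Drops one reference; the callback fires only when the last one goes.
    void Release() {
        if (--m_refs == 0 && m_onRelease) {
            m_onRelease();
        }
    }

private:
    int m_refs = 1;  // the creator holds the first reference
    std::function<void()> m_onRelease;
};

// Simulates one frame. Returns true if the host buffer became reusable,
// i.e. if the release callback ran before the end of the frame.
bool FrameBufferIsFree(bool callerReleasesItsReference) {
    bool released = false;
    ToySurface surface([&released] { released = true; });

    surface.Acquire();  // the "encoder" keeps the surface alive while it works
    surface.Release();  // the "encoder" is done with the input

    if (callerReleasesItsReference) {
        surface.Release();  // last reference gone, so the callback runs
    }
    // Otherwise the caller's reference is still held and the callback never fires.
    return released;
}
```

In this model, a reference that is never dropped keeps the callback from ever running, which is one way memory can appear to grow frame by frame.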

Submitting Host memory would still be converted to GPU memory before your surfaces are encoded anyways.

Regarding your second comment, you are mixing two property storages: AMFCaps and AMFComponent. AMFCaps is read-only and AMFComponent is read-write. AMFCaps is a maximum value, AMFComponent is the current value.

jieytan commented 3 years ago

Thank you for mentioning smart pointers! I guess the conversion below may cause the reference count inside the smart pointer to go wrong?

amf::AMFDataPtr data;
m_pAmfBitstream = (amf::AMFBufferPtr)data.Detach();

Anyway, I have now fixed the memory leak.

On the second question, I am trying to use AMFCaps to get the maximum value. However, I cannot get it; it keeps the initial value.

jieytan commented 3 years ago

Another question:

    int roiSurf_w = (m_amfEncodeParam.w + 15) / 16;
    int roiSurf_h = (m_amfEncodeParam.h + 15) / 16;
    amf::AMFContext1Ptr spContext1(m_pAmfContext);
    for (int i = 0; i < iRoiNum; i++) {
        int roi_x = roiInfo->roi_rect.origin.x * dst_width / src_width /16;
        int roi_y = roiInfo->roi_rect.origin.y * dst_width / src_width /16;
        int roi_w = (roiInfo->roi_rect.size.width * dst_width / src_width + 15) / 16;
        int roi_h = (roiInfo->roi_rect.size.height * dst_width / src_width + 15) / 16;
        AMF_RESULT res = spContext1->AllocSurfaceEx(amf::AMF_MEMORY_HOST, amf::AMF_SURFACE_GRAY32, roiSurf_w, roiSurf_h,
            amf::AMF_SURFACE_USAGE_DEFAULT | amf::AMF_SURFACE_USAGE_LINEAR, amf::AMF_MEMORY_CPU_DEFAULT, &m_pAmfROISurface);
        if (res != AMF_OK)
        {
            printf("AMFContext::AllocSurface(amf::AMF_MEMORY_HOST) for ROI map failed!\n");

        }

        amf_uint32* buf = (amf_uint32*)m_pAmfROISurface->GetPlaneAt(0)->GetNative();
        amf_int32 pitch = m_pAmfROISurface->GetPlaneAt(0)->GetHPitch();
        memset((void*)buf, 0, pitch * roiSurf_h);

        for (int y = roi_y ; y < roi_y + roi_h ; y++)//?
        {
            for (int x = roi_x ; x < roi_x + roi_w ; x++)
            {
                buf[x + y * pitch / 4] = 5;
            }
        }
    }

Is this the right way to set ROI? Why can't I see any difference after setting it? How does the number 5 take effect? And why divide by 16?
I look forward to your reply!

rhutsAMD commented 3 years ago

Regarding your first question, if data is not empty then that code would cause a leak. You may instead use the following code to avoid this leak:

m_pAmfBitstream = amf::AMFBufferPtr(data);
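Why detaching leaks can be shown with a toy intrusive pointer. This is a sketch under the assumption that a smart pointer constructed from a raw pointer adds its own reference while `Detach()` gives up ownership without dropping one; `ToyObject`, `ToyPtr` and `RefsLeftAfterHandOver` are hypothetical names and this is not the AMF implementation.

```cpp
// Toy intrusive reference counting. This is NOT the AMF smart pointer.
struct ToyObject {
    int refs = 0;
};

class ToyPtr {
public:
    ToyPtr() = default;
    explicit ToyPtr(ToyObject* p) : m_p(p) { if (m_p) ++m_p->refs; }
    ToyPtr(const ToyPtr& other) : m_p(other.m_p) { if (m_p) ++m_p->refs; }
    ToyPtr& operator=(const ToyPtr& other) {
        if (this != &other) {
            if (other.m_p) ++other.m_p->refs;
            if (m_p) --m_p->refs;
            m_p = other.m_p;
        }
        return *this;
    }
    ~ToyPtr() { if (m_p) --m_p->refs; }

    // Gives up ownership WITHOUT dropping the reference.
    ToyObject* Detach() {
        ToyObject* p = m_p;
        m_p = nullptr;
        return p;
    }

private:
    ToyObject* m_p = nullptr;
};

// Returns the reference count left on the object once both pointers are gone.
int RefsLeftAfterHandOver(bool useDetach) {
    ToyObject object;  // on the stack, so the demo itself cannot leak memory
    {
        ToyPtr data(&object);  // refs == 1
        ToyPtr bitstream;
        if (useDetach) {
            // data forgets its reference and bitstream adds a new one,
            // so one reference no longer has an owner.
            bitstream = ToyPtr(data.Detach());
        } else {
            bitstream = data;  // both pointers own exactly one reference each
        }
    }  // both smart pointers are destroyed here
    return object.refs;
}
```

In this model the detach path leaves one reference behind, while the plain copy returns the count to zero.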

For your second question, the capability query with AMF_VIDEO_ENCODER_CAP_MAX_TEMPORAL_LAYERS will be fixed in a future release.

Regarding your final questions, the number 5 is the QP map value that will be passed to the encoder for each MB. The minimum and maximum QP values are [0, 51]. Before submitting to the encoder, do not forget to provide your m_pAmfROISurface to the encoder by setting the AMF_VIDEO_ENCODER_ROI_DATA property on the surface you are passing to the encoder. The division by 16 is done because that is the Macroblock size for H264 and QP is provided at the Macroblock level.
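The division by 16 can be sketched as a small helper that converts a pixel rectangle to macroblock units. `MbRect`, `MbCount` and `ToMacroblocks` are hypothetical names for illustration only; the sketch assumes 16x16 macroblocks as described above.

```cpp
struct MbRect {
    int x, y, w, h;  // all in macroblock units
};

// Number of 16x16 macroblocks needed to cover `pixels` (rounds up).
constexpr int MbCount(int pixels) { return (pixels + 15) / 16; }

// Converts a pixel rectangle to macroblock units. The origin rounds down and
// the far edge rounds up, so every macroblock the rectangle touches is included.
MbRect ToMacroblocks(int px, int py, int pw, int ph) {
    const int x0 = px / 16;
    const int y0 = py / 16;
    const int x1 = MbCount(px + pw);
    const int y1 = MbCount(py + ph);
    return { x0, y0, x1 - x0, y1 - y0 };
}
```

Rounding the far edge rather than the width alone is a design choice of this sketch: a 16-pixel-wide region starting at x = 24 spans two macroblocks, which `(width + 15) / 16` on its own would report as one.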

jieytan commented 2 years ago

Thank you. With your help, I now have only two final questions, both about D3D9.

I notice:

  1. When I init the context with D3D9 in the TranscodeHW sample, the converter does not support AMF_SURFACE_YUV420P as input.

  2. When I init the context with D3D9 in the SampleROI sample, the ROI setting does not work. (In the encoded H.264 output, the QP of the ROI area stays the same as that of the non-ROI area.)

Are these by design? If not, how should I set things up to use the converter and ROI with D3D9 correctly?

jieytan commented 2 years ago

And with D3D11, the ROI is working. However, if I create the context with a null device, like this: pContext->InitDX11(NULL); then when I need to copy data from the CPU to a GPU texture, the call to ID3D11DeviceContext::Map() fails and returns E_INVALIDARG. I then need to copy the data to a Host surface and convert the Host surface to a DX11 texture, which might affect performance. Is there any solution?

jieytan commented 2 years ago

Hi, I have run into another bug (with amfrt32.dll version 1.4.22.0).

I use webcam mode and set the number of temporal layers to 2. Now I want to set TL1 to 70% and TL2 to 30% of the total bitrate. As the documentation suggests, here is my code:

#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=1
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=0

lRet = m_pAmfEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0, m_amfEncodeParam.totalBitrate * 7 / 10);
lRet = m_pAmfEncoder->SetProperty( AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1, m_amfEncodeParam.totalBitrate * 3 / 10);

But as a result, the output bitrate is always bigger than i want. Not sure if i use a wrong index.

Here are some clues:

  1. After analyzing the output nal, i can see a pattern like this:

    this nal tid =0 uLen=2 maxTid=2
    this nal tid =0 uLen=5 maxTid=2
    this nal tid =0 uLen=24880 maxTid=2
    this nal tid =0 uLen=100103 maxTid=2
    this nal tid =0 uLen=2 maxTid=2
    this nal tid =1 uLen=4 maxTid=2
    this nal tid =1 uLen=12505 maxTid=2
    this nal tid =1 uLen=112479 maxTid=2
    ……

    The bold nal looks very suspicious, without which the setting would be right. I guess?

  2. The peculiar thing is that with amfrt32.dll version 1.4.19.0, not only does this setting give the right total output bitrate, but also when I set:
    lRet = m_pAmfEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0, m_amfEncodeParam.totalBitrate );
    lRet = m_pAmfEncoder->SetProperty( AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1, m_amfEncodeParam.totalBitrate );

    the output bitrate is m_amfEncodeParam.totalBitrate, not 2*m_amfEncodeParam.totalBitrate. I don't understand…

How should i set this? Please kindly help.

MikhailAMD commented 2 years ago
jieytan commented 2 years ago
AMF_VIDEO_ENCODER_USAGE = AMF_VIDEO_ENCODER_USAGE_WEBCAM;
AMF_VIDEO_ENCODER_PROFILE = AMF_VIDEO_ENCODER_PROFILE_BASELINE;
AMF_VIDEO_ENCODER_IDR_PERIOD = 0;
AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS = 2;
AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD = AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR;

#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"
AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0 = 2.5M * 7 / 10;  
AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1 = 2.5M * 3 / 10;

AMF_VIDEO_ENCODER_MIN_QP = 22;
AMF_VIDEO_ENCODER_MAX_QP = 38;
// didnot set max bitrate, 
width * height * fps = 1280 * 720 * 30

Actually, I have no clue how to set the parameters when I want a target bitrate with T1:T2 = 7:3. This setting is a trial run. The total bitrate seems OK with amfrt32.dll version 1.4.19.0, but with amfrt32.dll version 1.4.22.0 the output is bigger than expected.

MikhailAMD commented 2 years ago
  1. Did you create SVC encoder using AMFVideoEncoderVCE_SVC?
  2. When you split the stream by layers, what is bitrate for each layer?
  3. Note, that highest layer includes smaller layers. If app doesn't set a parameter AMF will assign default. The app can always set whatever it likes.
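The per-layer measurement asked about above can be sketched as follows, assuming the NAL units have already been parsed into (temporal ID, size) pairs. `NalInfo` and `CumulativeLayerBitrates` are hypothetical names; this is not how SVCSplitter works internally, only an illustration of "the highest layer includes the smaller layers".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct NalInfo {
    int temporalId;     // temporal layer the NAL unit belongs to
    std::size_t bytes;  // size of the NAL unit
};

// Returns the bitrate in bits per second of each cumulative sub-stream:
// entry k counts every NAL unit whose temporalId is <= k.
std::vector<double> CumulativeLayerBitrates(const std::vector<NalInfo>& nals,
                                            int layerCount,
                                            double durationSeconds) {
    std::vector<std::uint64_t> bytesPerLayer(static_cast<std::size_t>(layerCount));
    for (const NalInfo& nal : nals) {
        if (nal.temporalId >= 0 && nal.temporalId < layerCount) {
            bytesPerLayer[nal.temporalId] += nal.bytes;
        }
    }

    std::vector<double> result;
    std::uint64_t runningBytes = 0;
    for (int k = 0; k < layerCount; ++k) {
        runningBytes += bytesPerLayer[k];  // layer k includes all lower layers
        result.push_back(runningBytes * 8.0 / durationSeconds);
    }
    return result;
}
```

The last entry is the bitrate of the full stream, so it is the number to compare against the requested total.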
jieytan commented 2 years ago
  1. Yes, i use "m_pFactory->CreateComponent(pContext, AMFVideoEncoderVCE_SVC, &m_pAmfEncoder);" to create encoder;

  2. I want a total bitrate = 2.5M, and first layer bitrate = 2.5M * 7/10, second layer bitrate = 2.5M * 3/10;

  3. As you suggested, I use TranscodeHW.exe to encode the same YUV and SVCSplitter.exe to analyse each layer of its output. Still, i use the following setting to encode:

    
    AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS = 2;
    AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD = AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR;

    #define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"
    #define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"
    AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0 = 2.5M * 7 / 10;
    AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1 = 2.5M * 3 / 10;



And here is the output:
amfrt32.dll version 1.4.19.0.
![966_463_1(1)](https://user-images.githubusercontent.com/91239901/138850618-3b324dc3-395a-41f7-8bde-f52b5ecb1295.png)
amfrt32.dll version 1.4.22.0.
![image](https://user-images.githubusercontent.com/91239901/138850465-b7f4d5c8-f56e-4f87-829f-bc94d104ccff.png)

The exe and the YUV file are exactly the same, but the behavior is very different. Could you please check whether there is a bug?
MikhailAMD commented 2 years ago

OK, seems bitrate settings are not applied. I would like to see code snippet with full encoder setup and debug output when you run.

jieytan commented 2 years ago

I just added the following settings to the file shown in the image:

    PushParamsToPropertyStorage(pParams, ParamEncoderStatic, m_pEncoder);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_PROFILE, 66); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_PROFILE) failed ");
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_IDR_PERIOD, 0); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_IDR_PERIOD) failed ");
//    m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_FRAMESIZE, ::AMFConstructSize(scaleWidth, scaleHeight));

    PushParamsToPropertyStorage(pParams, ParamEncoderDynamic, m_pEncoder);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS, 2); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS) failed ");

#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=1
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=0
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0, 2500000 * 7 / 10);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1, 2500000 * 3 / 10);

    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_MIN_QP, 22); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_MIN_QP) failed ");
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_MAX_QP, 35); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_MAX_QP) failed ");
    res = m_pEncoder->Init(amf::AMF_SURFACE_NV12, scaleWidth, scaleHeight);
    CHECK_AMF_ERROR_RETURN(res, L"m_pEncoder->Init() failed");

And use command line:

    TranscodeHW.exe -input C:\Users\hp\Downloads\AMF\yuv\XXX_1280x720_25.yuv -output C:\Users\hp\Downloads\AMF\yuv\avc\XXX_1280x720_25_AMF_CBR_2500000_out.h264 -width 1280 -height 720 -usage webcam -rateControlMethod cbr -targetBitrate 2500000

To get the output analyzed, i use:

    SVCSplitter.exe -n 2 C:\Users\hp\Downloads\AMF\yuv\avc\XXX_1280x720_25_AMF_CBR_2500000_out.h264 C:\Users\hp\Downloads\AMF\yuv\avc\XXX_1280x720_25_AMF_CBR_2500000_outS.h264

Please help!

MikhailAMD commented 2 years ago

Hi. I've reproduced the problem. With the driver update, new encoder code is used, hence the difference in behavior. To work around the issue, you can move the per-layer bitrate settings after the Init() call. This should work with the older driver as well. In any case, bitrate is a dynamic parameter and can be set at any time. I will open an internal ticket.

jieytan commented 2 years ago

Unfortunately, i tried

res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS, 2); 
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"  
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"
res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0, 2500000 * 7 / 10);
res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1, 2500000 * 3 / 10);

But the problem remains.
I even tried without setting the temporal layers to 2, but the bitrate is still high. I also notice that the ROI setting is not working either, although it works fine on the older driver version. Please keep an eye on that as well.

In summary: bitrate, temporal layers, ROI. Please help make sure these features work well. Thank you!

MikhailAMD commented 2 years ago

This is what I mean:

    PushParamsToPropertyStorage(pParams, ParamEncoderStatic, m_pEncoder);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_PROFILE, 66); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_PROFILE) failed ");
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_IDR_PERIOD, 0); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_IDR_PERIOD) failed ");
//    m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_FRAMESIZE, ::AMFConstructSize(scaleWidth, scaleHeight));

    PushParamsToPropertyStorage(pParams, ParamEncoderDynamic, m_pEncoder);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS, 2); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_NUM_TEMPORAL_ENHANCMENT_LAYERS) failed ");

#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1    L"TL1.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=1
#define AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0    L"TL0.QL0.TargetBitrate"            // amf_int64; default = depends on USAGE; Target bit rate in bits for SVC TL=0

    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_MIN_QP, 22); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_MIN_QP) failed ");
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_MAX_QP, 35); // use original name
    LOG_AMF_ERROR(res, L"storage->SetProperty(AMF_VIDEO_ENCODER_MAX_QP) failed ");
    res = m_pEncoder->Init(amf::AMF_SURFACE_NV12, scaleWidth, scaleHeight);
    CHECK_AMF_ERROR_RETURN(res, L"m_pEncoder->Init() failed");
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL0, 2500000 * 7 / 10);
    res = m_pEncoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE_TL1, 2500000 * 3 / 10);

jieytan commented 2 years ago

Thank you. Looking forward to the new build.

rhutsAMD commented 2 years ago

The fix for the capability query with AMF_VIDEO_ENCODER_CAP_MAX_TEMPORAL_LAYERS is now publicly released in the Radeon™ Software Adrenalin 21.11.3 driver and I confirmed it is working with this driver. Similarly the fix for the AMF_VIDEO_ENCODER_ROI_DATA is included and has been verified.

Additionally, the data provided through AMF_VIDEO_ENCODER_ROI_DATA to the encoder is actually an importance map of values ( [0, 10] inclusive ) which have an effect on the QP values of the corresponding Macroblocks. It is not a way of explicitly setting QP values. Furthermore, the individual importance map values are relative to the average importance, which is why the sample sets the outer region importance to 0 and sets the inner region importance to 10.
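A minimal sketch of filling such an importance map in plain host memory, with one 32-bit entry per macroblock and a row pitch given in bytes. `BuildImportanceMap` and its parameters are hypothetical names, and the sketch assumes the pitch covers at least `widthMb` entries per row; copying the result into an AMF surface is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a map with one 32-bit entry per 16x16 macroblock. `pitchBytes` is the
// row stride in bytes and may be larger than widthMb * 4.
std::vector<std::uint32_t> BuildImportanceMap(int widthMb, int heightMb, int pitchBytes,
                                              int roiX, int roiY, int roiW, int roiH,
                                              std::uint32_t importance) {
    const std::size_t stride = pitchBytes / sizeof(std::uint32_t);  // entries per row
    std::vector<std::uint32_t> importanceMap(stride * heightMb, 0); // 0 outside the region
    const std::uint32_t value = std::min<std::uint32_t>(importance, 10);  // keep within [0, 10]

    // Clip the region to the map so an oversized rectangle cannot write out of bounds.
    const int x0 = std::max(roiX, 0);
    const int x1 = std::min(roiX + roiW, widthMb);
    const int y0 = std::max(roiY, 0);
    const int y1 = std::min(roiY + roiH, heightMb);

    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            importanceMap[y * stride + x] = value;
        }
    }
    return importanceMap;
}
```

Indexing by `stride` rather than by `widthMb` matters whenever the surface pitch is padded.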

jieytan commented 2 years ago

Indeed, both of these functions are working now! Thank you for your explanation. BTW, do you know when the AMF_VIDEO_ENCODER_TARGET_BITRATE_TL fix will be ready?

jieytan commented 2 years ago

@MikhailAMD Hi Sorry to bother you. Any updates?

rhutsAMD commented 2 years ago

Could you please try with the latest public release driver? We were not able to reproduce the issue with Radeon™ Software Adrenalin 21.12.1.

jieytan commented 2 years ago

That is strange. I still have this issue with Adrenalin 21.12.1.

jieytan commented 2 years ago

> Hi. I've reproduced the problem. With the driver update, new encoder code is used, hence the difference in behavior. To work around the issue, you can move the per-layer bitrate settings after the Init() call. This should work with the older driver as well. In any case, bitrate is a dynamic parameter and can be set at any time. I will open an internal ticket.

@MikhailAMD Sorry to bother you. Can you still reproduce this issue on your side? Or you've managed to fix it?

rhutsAMD commented 2 years ago

There are two issues, the first is regarding setting the temporal enhancement layer bitrate before vs. after encoder->Init(). This has been fixed and is available in the latest public driver.

The second issue is a mix up of the layers. Please swap the bitrates you are currently setting to have the larger bitrate set for “TL1”. Layer #1 “TL1.QL0.” is the full stream and cannot be requested to have a smaller bitrate than the reduced stream. Layer #0 “TL0.QL0.” is the reduced stream and should have the smaller bitrate requested.

We will update the documentation to clarify this differentiation between the layers. Also, the SVCSplitter sample will be updated to add more tracing and generation of all of the layers.

jieytan commented 2 years ago

On the first issue: yes, I got amfrt32.dll version 1.4.23.0 and the issue is fixed now!

On the second issue, thanks for clarifying. So if I want to encode 2 layers with a total bitrate of 3M and a ratio of 2:1, how should I set it? “TL1.QL0.” to 2M and “TL0.QL0.” to 1M, like this? And what about 3 layers?

Please guide me! Thank you very much!

jieytan commented 2 years ago

@rhutsAMD Hi When will the doc be updated?

rhutsAMD commented 2 years ago

For the 3 layer case, continue setting the per layer parameters following the same pattern i.e. with “TL2.QL0.” having the highest bitrate, and down to the base temporal layer “TL0.QL0.” having the lowest bitrate. This can be expanded to as many temporal layers as your setup supports.
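One way to read this pattern is that each layer's target is cumulative, so the highest layer carries the total bitrate. The sketch below rests on that assumption, which the thread does not confirm for every case; `CumulativeLayerTargets` and `shares` are hypothetical names.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// shares[k] is the relative weight of the frames that belong to temporal layer k.
// Returns one target per layer, where layer k is assumed to include all lower
// layers, so the last entry equals totalBitrate.
std::vector<std::int64_t> CumulativeLayerTargets(std::int64_t totalBitrate,
                                                 const std::vector<int>& shares) {
    std::vector<std::int64_t> targets;
    const std::int64_t shareSum =
        std::accumulate(shares.begin(), shares.end(), std::int64_t{0});
    if (shareSum <= 0) {
        return targets;  // nothing sensible to compute
    }

    std::int64_t running = 0;
    for (int share : shares) {
        running += share;
        targets.push_back(totalBitrate * running / shareSum);
    }
    return targets;
}
```

Under this reading, a 3M total with a 2:1 split would give 2M for “TL0.QL0.” and 3M for “TL1.QL0.”, which keeps the full stream larger than the reduced stream as required above.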

The documentation will be updated in the next AMF release. In the meantime, I will share the updated information below:

2.2.4 SVC Properties

SVC encoding is enabled by setting the number of temporal enhancement layers to a value that is greater than 1 before initializing the encoder. The maximum number of temporal layers supported by the encoder can be queried from the encoder capabilities before initializing the encoder.

To define SVC parameters per layer, the following format must be used:      TL<Temporal_Layer_Number>.QL<Quality_Layer_Number>.<Parameter_name>

As an example with two temporal layers, to configure “Target bitrate” for the base/first temporal layer and first quality layer, the following parameter should be used:      “TL0.QL0.TargetBitrate”

To configure “Target bitrate” for the second temporal layer and first quality layer, the following parameter should be used:      “TL1.QL0.TargetBitrate”

When setting per layer parameters, the equivalent non-SVC layer parameters should not be set for the encoder otherwise the per layer configuration will be overwritten.

Remark: quality layers are not supported on VCE 1.0. “QL0” must be used for quality layers.
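The naming scheme in the examples above can be generated with a small helper instead of one `#define` per layer. `SvcLayerProperty` is a hypothetical name for illustration and is not part of the AMF headers.

```cpp
#include <string>

// Builds "TL<temporalLayer>.QL<qualityLayer>.<parameter>",
// for example "TL1.QL0.TargetBitrate".
std::wstring SvcLayerProperty(int temporalLayer, int qualityLayer,
                              const std::wstring& parameter) {
    return L"TL" + std::to_wstring(temporalLayer) +
           L".QL" + std::to_wstring(qualityLayer) +
           L"." + parameter;
}
```

The resulting string's `c_str()` could then be used wherever the thread passes a literal such as L"TL0.QL0.TargetBitrate" as the property name.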