Closed davidlinbird closed 5 years ago
I would have to ask several questions and provide some general thoughts to help you:
Below is the code showing how I initialize AMF:
res = g_AMFFactory.Init();
::amf_increase_timer_precision();
res = g_AMFFactory.GetFactory()->CreateContext(&a_Context);
res = a_Context->InitDX11(m_Device); // can be DX11 device
// component: encoder
res = g_AMFFactory.GetFactory()->CreateComponent(a_Context, pCodec, &a_Encoder);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_USAGE, AMF_VIDEO_ENCODER_USAGE_LOW_LATENCY);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_B_PIC_PATTERN, 0);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_QUALITY_PRESET, AMF_VIDEO_ENCODER_QUALITY_PRESET_SPEED);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE, bitRateIn);//25m
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMESIZE, ::AMFConstructSize(scrnWidth, scrnHeight));
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMERATE, ::AMFConstructRate(frameRateIn, 1)); // 60fps
res = a_Encoder->Init(formatIn, scrnWidth, scrnHeight); // RGBA, 1920x1080
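Each call above returns an AMF_RESULT that the snippet discards; when chasing intermittent failures like the ones in this thread, it can help to check every result. A minimal sketch (the CHECK_AMF macro name is my own, not part of AMF):

```cpp
#include <cstdio>

// Hypothetical helper macro (not part of the AMF SDK): check every AMF_RESULT
// and stop at the first failing call, logging which call failed.
#define CHECK_AMF(call)                                          \
    do {                                                         \
        AMF_RESULT r_ = (call);                                  \
        if (r_ != AMF_OK) {                                      \
            std::fprintf(stderr, "AMF call failed (%d): %s\n",   \
                         static_cast<int>(r_), #call);           \
            return r_;                                           \
        }                                                        \
    } while (0)

// Usage inside an init function that returns AMF_RESULT:
// CHECK_AMF(g_AMFFactory.GetFactory()->CreateContext(&a_Context));
// CHECK_AMF(a_Context->InitDX11(m_Device));
```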
Below is the code showing how I render and allocate input frames:
D3D11_TEXTURE2D_DESC FrameDesc;
// Note: m_SharedSurf comes from the DXGI Desktop Duplication component.
m_SharedSurf->GetDesc(&FrameDesc);
amf::AMFSurface* amdInputSurface = nullptr;
auto amdres = a_Context->AllocSurface(amf::AMF_MEMORY_DX11, amf::AMF_SURFACE_BGRA, FrameDesc.Width, FrameDesc.Height, &amdInputSurface);
amdInputSurface->SetProperty(START_TIME_PROPERTY, amf_high_precision_clock());
auto amdSurf = reinterpret_cast<ID3D11Texture2D*>(amdInputSurface->GetPlane(amf::AMF_PLANE_PACKED)->GetNative());
m_DeviceContext->CopyResource(amdSurf, m_SharedSurf);
amdres = a_Encoder->SubmitInput(amdInputSurface);
No other game or app was using the GPU at the same time, so nothing should have interfered with the color space converter.
OK:
m_SharedSurf comes from DXGI desktop duplicate component.
OK, here is a potential problem: this texture is filled in by a copy inside the DD API, and you start a texture copy on the device shared with AMF but you do not wait until the copy is complete. So I suggest inserting context->Flush() after CopyResource(), and possibly using D3D11_QUERY_EVENT in a loop,
to ensure that the copy is complete before you call IDXGIOutputDuplication::ReleaseFrame(). Another thing to check: how do you create the DX11 device? Ensure that you do not use the D3D11_CREATE_DEVICE_SINGLETHREADED flag.
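The Flush-plus-query pattern suggested above can be sketched roughly as follows (an illustrative sketch only; the device, context, and texture parameters are placeholders for the objects used in the code earlier in the thread):

```cpp
#include <windows.h>
#include <d3d11.h>

// Sketch: wait on the CPU until the GPU has actually finished the copy before
// releasing the duplicated frame, as suggested above.
void CopyAndWaitForGpu(ID3D11Device* device, ID3D11DeviceContext* deviceContext,
                       ID3D11Texture2D* amdSurf, ID3D11Texture2D* sharedSurf)
{
    deviceContext->CopyResource(amdSurf, sharedSurf);

    D3D11_QUERY_DESC queryDesc = {};
    queryDesc.Query = D3D11_QUERY_EVENT;
    ID3D11Query* query = nullptr;
    if (SUCCEEDED(device->CreateQuery(&queryDesc, &query)))
    {
        deviceContext->End(query);   // mark the point after the copy
        deviceContext->Flush();      // submit the queued work to the GPU

        // GetData() returns S_FALSE while the GPU has not reached the marker.
        while (deviceContext->GetData(query, nullptr, 0, 0) == S_FALSE)
        {
            Sleep(0); // yield; a real implementation might back off here
        }
        query->Release();
    }
    // Only now is it safe to call IDXGIOutputDuplication::ReleaseFrame().
}
```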
What is m_SharedSurf? It's a pointer to ID3D11Texture2D. It stores the constructed mirror image of the desktop being captured.
We do use shared handles, to share access to the surface between multiple threads. But we don't have multiple video cards involved, and we don't use multiprocessing, just multithreading.
We have two threads involved in this program: a duplication thread and a result thread. I. On the duplication thread, we poll "move" and "dirty" changes from IDXGIOutputDuplication, then render those changes (on the same thread) to m_SharedSurf. (This is how the Desktop Duplication API works: it does not return a full picture, only the changes.)
II. Then we use a mutex to notify the "result" thread to take the result. (So if the result thread is blocked for too long, we may miss some frames.)
III. On result thread, while holding the mutex, we:
a. First allocate a surface "amdInputSurface" using AMD API.
b. Then call "->GetPlane(amf::AMF_PLANE_PACKED)->GetNative()" to get an ID3D11Texture2D interface to that surface (called amdSurf).
c. Preprocess the received m_SharedSurf
d. use "ID3D11DeviceContext::CopyResource" to copy it to amdSurf.
e. Do "ID3D11DeviceContext::Flush"
f. Call (amd encoder component) -> SubmitInput(amdInputSurface) // !!! we don't know whether this is synchronous or asynchronous
g. Call amdInputSurface->Release()
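Regarding the question in step f: in AMF, SubmitInput() is asynchronous; it queues the frame and returns, and the compressed output is retrieved separately with QueryOutput(). A rough sketch of the retrieval side (illustrative only; a_Encoder is the encoder from the code above, and what is done with the returned buffer is a placeholder):

```cpp
// Sketch: poll the encoder for compressed output after SubmitInput().
// AMF_REPEAT means "no output ready yet"; AMF_EOF means the encoder is drained.
amf::AMFDataPtr outData;
AMF_RESULT qres = a_Encoder->QueryOutput(&outData);
if (qres == AMF_OK && outData != nullptr)
{
    amf::AMFBufferPtr outBuffer(outData); // compressed bitstream buffer
    // ... write outBuffer->GetNative() / outBuffer->GetSize() to a file
    //     or the network ...
}
else if (qres == AMF_REPEAT)
{
    // Nothing ready yet; try again later (or sleep briefly before polling).
}
```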
The only reason why you would use shared handles is if you use two D3D11 device objects running on different threads and share a D3D11 texture between them. Please confirm. Also please explain why you need two threads. It sounds unusual.
We used the example code from Microsoft for DXGI desktop duplication and have no idea why Microsoft chose such a design. Frankly, there was a lot of poorly designed code in their examples.
After getting your advice, I noticed that there might be some bugs in my streaming software causing these problems. So I developed a small AMF program to rule them out. This program uses the same AMF encoder code as my streaming program, but instead of taking input from DXGI, it always encodes the same pre-recorded video.
Here are some interesting things I found out. Improvement:
Problems:
I have changed my encoder settings based on your advice. Here are the settings I currently use.
formatIn=amf::AMF_SURFACE_BGRA
frameRateIn=60
bitRateIn=3500000
scrnWidth=1920
scrnHeight=1080
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_PROFILE, AMF_VIDEO_ENCODER_PROFILE_HIGH);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_PROFILE_LEVEL, 52);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FULL_RANGE_COLOR, false);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMERATE, ::AMFConstructRate(frameRateIn, 1));
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD, AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_RATE_CONTROL_PREANALYSIS_ENABLE, AMF_VIDEO_ENCODER_PREENCODE_DISABLED);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_RATE_CONTROL_SKIP_FRAME_ENABLE, false);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_MIN_QP, 18);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_MAX_QP, 51);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE, bitRateIn);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_PEAK_BITRATE, bitRateIn);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_DE_BLOCKING_FILTER, true);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FILLER_DATA_ENABLE, false);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_ENFORCE_HRD, false);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_ENABLE_VBAQ, false);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_IDR_PERIOD, 120);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_VBV_BUFFER_SIZE, bitRateIn);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_INITIAL_VBV_BUFFER_FULLNESS, 64);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_MOTION_HALF_PIXEL, true);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_MOTION_QUARTERPIXEL, true);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_MAX_NUM_REFRAMES, 4);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_QUALITY_PRESET, AMF_VIDEO_ENCODER_QUALITY_PRESET_SPEED);
// res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE, bitRateIn);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMESIZE, ::AMFConstructSize(scrnWidth, scrnHeight));
res = a_Encoder->Init(formatIn, scrnWidth, scrnHeight);
OK, we have some progress. Before we switch to the encoder, please note that if you have two D3D11 device objects and use a shared handle to share a texture, it is the application's responsibility to synchronize access to the texture. To make life worse, every call to a D3D11 device context is not executed immediately but marshalled to an internal thread, one per device object. At the same time, the GPU HW queue is serial. If not synchronized, there is no guarantee which device will submit its job first, regardless of your thread mutex. So the only way to synchronize is to wait on the CPU for GPU completion using a D3D11 query.
Now, the encoder: it is hard to believe that RGBA submission is faster than NV12, because RGBA requires color conversion to NV12. Besides the parameters, it is important to properly measure latency and frame rate. If the application submits too fast, it can achieve maximum frame rate, but latency can be large due to the internal HW queue. AMF has a SimpleEncoder sample that accurately measures both parameters. If you want to share some experiments with me, you could just modify the sample by setting the encoder parameters to your needs, check the results, and send the modified CPP to me.
The sample encodes at full speed, transcode style. If your goal is to achieve minimal latency, you should implement a one-in-one-out model. For this you would need an event that signals from the polling thread when a frame is ready, and the submission thread should wait for this signal before submission. This is not the highest FPS, but it is the lowest latency. Lastly, a short ETL from GPUView will give a lot of timing information.
Here are the ETLs. CaptureState&Kernel.zip NoCaptureState.etl.zip Merged.etl.zip
Does the one-in-one-out model significantly lower the FPS? My goal is to reach 1080p at 40 to 60 fps.
I am only interested in Merged.etl. Yes, you can get 60 FPS for 1080p with one-in-one-out. From the ETL: you run the app at 30 fps, but there is plenty of headroom. Please note that encode tasks can run in parallel with the next GFX task. Encode tasks take just 8-10 ms, and rendering + color conversion takes about 2.8 ms. Merged.zip Check the JPG in the zip with comments.
I have implemented the encoder in the one-in-one-out model. Now the encoding latency is around 13 ms including the color converter. However, there is a new problem: when we run a GPU benchmark and the AMF encoder at the same time, the encoding latency increases significantly, to 20-40 ms. I remember you mentioned that the color converter may be impacted by other processes like games, and I think this is the problem. What are your suggestions for this problem, besides avoiding RGBA submission? Or do you know any method to capture the screen in NV12 format?
OK, a few thoughts:
I tested VCE encoding and DEM encoding with an R9 380 in 2016, and I remember the encoding latency didn't increase but remained the same while running a benchmark or games simultaneously. Did this issue start happening with VCE 3.0?
DEM was a HW feature that did not use GFX and behaved differently, but it is retired.
Any update?
Thank you Mikhail for all the help and follow-up on our project. I am very surprised that you still remember me and follow my project. I gave up on solving the GFX issue since it only happens during benchmark tests, which is a rare case. I am very interested in the encoding/decoding performance of the new Vega card, especially H265 performance. Can you share any information about the encoding performance of the Vega card?
Best David
Vega is about twice as powerful as Polaris in H264, and about 2-2.25x as powerful as Polaris in H265/HEVC. Actual numbers will vary depending on installed HW and usage.
Take a look into white paper on Vega. It has Encoder/Decoder section: http://radeon.com/_downloads/vega-whitepaper-11.6.17.pdf
Closed as stale issue
I built streaming software using AMF VCE for the encoding part. Initially, I used Low Latency mode with the quality setting. I got perfect screen quality and 16 ms average encode latency (I am sure about this). However, I tested the program again a week after the initial test and the result changed significantly. Now the screen looks flickery, especially on small letters and detailed images. In addition, the encode latency increased to 25-30 ms. The two tests used the same settings, same program, and same hardware. Right now I cannot reproduce the result I got from the initial test. Now the screen quality of Low Latency mode and Ultra Low Latency mode is barely watchable, and I need to use Transcoding mode. I get perfect screen quality with Transcoding mode and about 29-40 ms latency.
My questions:
I am using XFX RX460 2GB and the resolution is 1920x1080. Now, I got 22ms with Low Latency mode and 29ms with Transcoding mode. These are the lowest latency I got now.
I only changed a few settings and all the others remain at their defaults. Here are the settings I changed:
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_USAGE, AMF_VIDEO_ENCODER_USAGE_TRANSCODING);
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_TARGET_BITRATE, bitRateIn); // 25mbps
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMESIZE, ::AMFConstructSize(scrnWidth, scrnHeight)); // 1920x1080
res = a_Encoder->SetProperty(AMF_VIDEO_ENCODER_FRAMERATE, ::AMFConstructRate(frameRateIn, 1)); // 60fps
res = a_Encoder->Init(formatIn, scrnWidth, scrnHeight); // AMF_SURFACE_BGRA
Here is the settings header file. VideoEncoderVCE.txt