ROCm / rocDecode

rocDecode is a high-performance video decode SDK for AMD hardware
https://rocm.docs.amd.com/projects/rocDecode/en/latest

[Issue]: Conformance #249

Open kiritigowda opened 6 months ago

kiritigowda commented 6 months ago

Problem Description

The conformance test on CI fails for 2 streams (GFX942): http://math-ci.amd.com/blue/organizations/jenkins/gfx942%2Fprecheckin%2FrocDecode/detail/develop/17/pipeline/261/

Conformance test completed on the 135 streams:
     - The number of passing streams is 133
     - The number of failing streams is 2
     - The number of streams that did not finish decoding is 0

This needs to pass before CI can treat conformance errors as hard failures.
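A minimal sketch of how a CI step could turn the summary above into a hard failure. It assumes the conformance run prints its summary with exactly the wording quoted in this issue. `parse_summary` and `ci_exit_code` are hypothetical helpers, not part of rocDecode or its conformance script.

```python
import re

# Regexes for the summary block printed at the end of a conformance run.
# The wording is copied from the CI logs quoted in this issue (an assumption:
# if the script's output changes, these patterns must change with it).
_PATTERNS = {
    "total": re.compile(r"completed on the (\d+) streams"),
    "passing": re.compile(r"number of passing streams is (\d+)"),
    "failing": re.compile(r"number of failing streams is (\d+)"),
    "unfinished": re.compile(r"did not finish decoding is (\d+)"),
}


def parse_summary(log_text: str) -> dict:
    """Extract total/passing/failing/unfinished stream counts from a log."""
    counts = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(log_text)
        if match is None:
            raise ValueError(f"summary line for '{key}' not found")
        counts[key] = int(match.group(1))
    # Every stream should be counted exactly once.
    if counts["passing"] + counts["failing"] + counts["unfinished"] != counts["total"]:
        raise ValueError(f"inconsistent summary: {counts}")
    return counts


def ci_exit_code(log_text: str) -> int:
    """0 only when no stream failed or stalled; anything else is a hard CI failure."""
    counts = parse_summary(log_text)
    return 0 if counts["failing"] == 0 and counts["unfinished"] == 0 else 1


if __name__ == "__main__":
    sample = """Conformance test completed on the 135 streams:
     - The number of passing streams is 133
     - The number of failing streams is 2
     - The number of streams that did not finish decoding is 0"""
    print(parse_summary(sample))  # {'total': 135, 'passing': 133, 'failing': 2, 'unfinished': 0}
    print(ci_exit_code(sample))   # 1
```

Treating streams that "did not finish decoding" as failures is a design choice: a stalled decode is as much a regression as a wrong MD5.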

Operating System

ALL

CPU

ALL

GPU

AMD Instinct MI300A

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

jeffqjiangNew commented 6 months ago

The driver needs VCN FW version 6 or later to pass the last two streams. What is the VCN FW version? I see ROCm 6.0.0 on this page; could it be missing the new driver build? BTW, I can't see the conformance test results at the link http://math-ci.amd.com/blue/organizations/jenkins/gfx942%2Fprecheckin%2FrocDecode/detail/develop/17/pipeline/261/.

kiritigowda commented 6 months ago

We are using the compute-artifactory.amd.com:5000/rocm-plus-docker/compute-rocm-dkms-no-npi-hipclang:13402-ubuntu-22.04-stg1 docker image to test. Do you know which build is required to pass this test? To view the results you need to be on the VPN; they are under the test phase.

jeffqjiangNew commented 6 months ago

The driver team said build 1726526 has the fix for MI300. I'm in the office, but I just realised that I need to click the complete log button to see the conformance test results.

kiritigowda commented 6 months ago

Link to log - http://math-ci.amd.com/blue/rest/organizations/jenkins/pipelines/gfx942/pipelines/precheckin/pipelines/rocDecode/branches/develop/runs/17/nodes/261/steps/264/log/?start=0

jeffqjiangNew commented 6 months ago

Note that MIxxx VCN FW is updated manually by the VAAPI driver team, unlike Radeon GPU VCN FW, which is updated automatically. Currently only MI300 has the FW fix; MI200/250/100 do not have it yet, so 1 or 2 conformance streams are expected to fail on MI2xx/100.

kiritigowda commented 6 months ago

@jeffqjiangNew the current failure of 2 streams is on GFX942/MI300A - http://math-ci.amd.com/blue/rest/organizations/jenkins/pipelines/gfx942/pipelines/precheckin/pipelines/rocDecode/branches/develop/runs/20/nodes/261/steps/264/log/?start=0

info: Input file: WPP_F_ericsson_MAIN_2.bit
info: Using GPU device 0 - AMD Instinct MI300A[gfx942:sramecc-:xnack-] on PCI bus 01:00.0
info: decoding started, please wait!
Input Video Information
    Codec        : H.265/HEVC
    Sequence     : Progressive
    Coded size   : [192, 240]
    Display area : [0, 0, 192, 240]
    Chroma       : YUV 420
    Bit depth    : 8
Video Decoding Params:
    Num Surfaces : 7
    Crop         : [0, 0, 0, 0]
    Resize       : 192x240

info: Total frame decoded: 48
info: avg decoding time per frame: 1.81653 ms
info: avg FPS: 550.499
MD5 message digest: 2aaf16274fe8e799d72fa08a4963850d
MD5 digest matches the reference MD5 digest: 2aaf16274fe8e799d72fa08a4963850d
Conformance test completed on the 135 streams:
     - The number of passing streams is 134
     - The number of failing streams is 1
     - The number of streams that did not finish decoding is 0
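The per-stream verdict in these logs comes down to comparing an MD5 digest of the decoded output with a reference digest. A minimal sketch of that comparison, assuming the decoded frames have been dumped to a file and the reference digest is available as a hex string; exactly which bytes rocDecode's conformance script hashes is not stated in this issue. `md5_of_file` and `matches_reference` are hypothetical names.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large YUV dumps need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(decoded_path: str, reference_md5: str) -> bool:
    """True if the decoded output's MD5 equals the reference digest (case-insensitive)."""
    return md5_of_file(decoded_path) == reference_md5.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_md5.py <decoded_output_file> <reference_md5>
    path, reference = sys.argv[1], sys.argv[2]
    status = "matches" if matches_reference(path, reference) else "does not match"
    print(f"MD5 message digest: {md5_of_file(path)}")
    print(f"MD5 digest {status} the reference MD5 digest: {reference}")
```

Any difference in a single decoded byte changes the digest, which is why a firmware bug affecting only a few frames still shows up as a failing stream.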
jeffqjiangNew commented 6 months ago

@kiritigowda Is the driver build number in the log? According to the VAAPI team, the first build with the fix is 1726526.

kiritigowda commented 5 months ago

CI Failure

info: Input file: RPLM_B_qualcomm_4.bit
info: Using GPU device 0 - AMD Instinct MI300X[gfx942:sramecc+:xnack-] on PCI bus df:00.0
info: decoding started, please wait!
Input Video Information
    Codec        : H.265/HEVC
    Sequence     : Progressive
    Coded size   : [416, 240]
    Display area : [0, 0, 416, 240]
    Chroma       : YUV 420
    Bit depth    : 8
Video Decoding Params:
    Num Surfaces : 7
    Crop         : [0, 0, 416, 240]
    Resize       : 416x240

info: Total frame decoded: 300
info: avg decoding time per frame: 0.912035 ms
info: avg FPS: 1096.45
MD5 message digest: d304e72d4863c6ed7d0e02a61f2da90f
MD5 digest does not match the reference MD5 digest: 653ac7c46fa7b9d7d966e3db317eb938

Conformance test completed on the 135 streams:
     - The number of passing streams is 134
     - The number of failing streams is 1
     - The number of streams that did not finish decoding is 0
kiritigowda commented 5 months ago

AMD Radeon PRO W6800[gfx1030]

info: Using GPU device 0 - AMD Radeon PRO W6800[gfx1030] on PCI bus 0b:00.0
Conformance test completed on the 135 streams:
     - The number of passing streams is 128
     - The number of failing streams is 2
     - The number of streams that did not finish decoding is 5
kiriti@simon:~/develop/rocdecode-kiriti/conformance$ dpkg -l | grep rocm
ii  rocm-core                                  6.1.0.60100-48~20.04                  amd64        Radeon Open Compute (ROCm) Runtime software stack
kiritigowda commented 5 months ago

@LakshmiKumar23 please add the driver/firmware version to close this issue.

jeffqjiangNew commented 4 months ago

On MI300/Navi31/VCN4, the VCN FW version with the fix is DEC 6 and up. On Navi2x/VCN3, the VCN FW version with the fix is DEC 3 and up. MI2xx/VCN2.6 FW is still being worked on.

GPU driver version with the fix is 1709904.
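The firmware thresholds above can be encoded as a small lookup so a CI job can tell expected failures from regressions. A minimal sketch, assuming the caller already knows the VCN generation and the DEC firmware version as an integer (how to read them from the system is not covered here). The table reflects this comment only; `MIN_DEC_FW_WITH_FIX` and `failure_expected` are hypothetical names.

```python
from typing import Dict, Optional

# Minimum VCN decoder ("DEC") firmware version that includes the conformance fix,
# as reported in this thread. None = no fixed firmware available yet.
MIN_DEC_FW_WITH_FIX: Dict[str, Optional[int]] = {
    "VCN4": 6,       # MI300, Navi31
    "VCN3": 3,       # Navi2x
    "VCN2.6": None,  # MI2xx: fix still being worked on
}


def failure_expected(vcn_generation: str, dec_fw_version: int) -> bool:
    """True if conformance failures on the affected streams are expected for this firmware.

    Raises KeyError for VCN generations not covered in this thread.
    """
    minimum = MIN_DEC_FW_WITH_FIX[vcn_generation]
    if minimum is None:
        return True  # no firmware with the fix exists yet for this generation
    return dec_fw_version < minimum


if __name__ == "__main__":
    print(failure_expected("VCN4", 5))    # True: DEC 5 predates the fix
    print(failure_expected("VCN4", 6))    # False
    print(failure_expected("VCN2.6", 9))  # True: no fixed MI2xx firmware yet
```

Raising on an unknown generation, rather than guessing, keeps a new GPU from silently having its failures waved through.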

kiritigowda commented 4 months ago

@eidenyoshida Some of our tests are failing on CI because of the driver versions on the base machines. Could we please upgrade the CI machines to the drivers mentioned above? Thanks!

kiritigowda commented 3 months ago

Latest results on CI:

Conformance test completed on the 135 streams:
     - The number of passing streams is 134
     - The number of failing streams is 1
     - The number of streams that did not finish decoding is 0
kiritigowda commented 1 month ago

Current top-of-tree (TOT) test results: http://math-ci.amd.com/blue/organizations/jenkins/mainline%2Fprecheckin%2FrocDecode/detail/develop/97/pipeline/539

Conformance test completed on the 135 streams:
     - The number of passing streams is 133
     - The number of failing streams is 2
     - The number of streams that did not finish decoding is 0
jeffqjiangNew commented 3 weeks ago

The HEVC conformance test passed on MI250 node x1000c4s5b1n0 with ROCm 6.3 2008307:

Conformance test completed on the 135 streams: