ProjectPhysX / OpenCL-Benchmark

A small OpenCL benchmark program to measure peak GPU/CPU performance.

suspicious results for AMD Carrizo - Linux vs Windows #12

Closed: fanoush closed this issue 5 months ago

fanoush commented 5 months ago

Hello, here are results from Windows 10 and Ubuntu MATE 22.04 on an HP T630 thin client with an integrated Carrizo GPU.

Windows 10
|----------------.------------------------------------------------------------|
| Device ID    0 | Carrizo                                                    |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Carrizo                                                    |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3240.6 (Windows)                                           |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 6 at 626 MHz (384 cores, 0.481 TFLOPs/s)                   |
| Memory, Cache  | 6577 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4720 MB global, 4833984 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.137 TFLOPs/s (1/4 ) |
| FP32  compute                                         0.224 TFLOPs/s (1/2 ) |
| FP16  compute                                         0.278 TFLOPs/s (1/2 ) |
| INT64 compute                                         0.067  TIOPs/s (1/8 ) |
| INT32 compute                                         0.090  TIOPs/s (1/4 ) |
| INT16 compute                                         0.179  TIOPs/s (1/3 ) |
| INT8  compute                                         0.179  TIOPs/s (1/3 ) |
| Memory Bandwidth ( coalesced read      )                         17.60 GB/s |
| Memory Bandwidth ( coalesced      write)                         11.90 GB/s |
| Memory Bandwidth (misaligned read      )                         14.24 GB/s |
| Memory Bandwidth (misaligned      write)                         10.12 GB/s |
| PCIe   Bandwidth (send                 )                          4.85 GB/s |
| PCIe   Bandwidth (   receive           )                          4.95 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen3 x16)    4.96 GB/s |
|-----------------------------------------------------------------------------|

Linux

|----------------.------------------------------------------------------------|
| Device ID    0 | CARRIZO (carrizo, LLVM 15.0.7, DRM 3.54, 6.5.0-28-generic) |
| Device ID    1 | pthread-AMD Embedded G-Series GX-420GI Radeon R7E          |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | CARRIZO (carrizo, LLVM 15.0.7, DRM 3.54, 6.5.0-28-generic) |
| Device Vendor  | AMD                                                        |
| Device Driver  | 23.2.1-1ubuntu3.1~22.04.2 (Linux)                          |
| OpenCL Version | OpenCL C 1.1                                               |
| Compute Units  | 6 at 626 MHz (384 cores, 0.481 TFLOPs/s)                   |
| Memory, Cache  | 7469 MB, 0 KB global / 64 KB local                         |
| Buffer Limits  | 1867 MB global, 65536 KB constant                          |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                        21.380 TFLOPs/s ( 32x) |
| FP32  compute                                        71.363 TFLOPs/s ( 64x) |
| FP16  compute                                          not supported        |
| INT64 compute                                         1.553  TIOPs/s ( 4x ) |
| INT32 compute                                        76.344  TIOPs/s ( 64x) |
| INT16 compute                                        39.724  TIOPs/s ( 64x) |
| INT8  compute                                        40.381  TIOPs/s ( 64x) |
| Memory Bandwidth ( coalesced read      )                       3521.36 GB/s |
| Memory Bandwidth ( coalesced      write)                       2365.60 GB/s |
| Memory Bandwidth (misaligned read      )                       3950.44 GB/s |
| Memory Bandwidth (misaligned      write)                       2344.13 GB/s |
| PCIe   Bandwidth (send                 )                          3.80 GB/s |
| PCIe   Bandwidth (   receive           )                          1.60 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen2 x16)    2.23 GB/s |
|-----------------------------------------------------------------------------|

I am not sure what numbers to expect, but aren't the Ubuntu results too optimistic (see also the memory bandwidth)? This is the same device, dual-booting between Windows and Linux. The T630 has an AMD GX-420GI quad-core APU from 2016: https://en.wikipedia.org/wiki/List_of_AMD_processors_with_3D_graphics#I-Family:_%22Brown_Falcon%22_(2016,_SoC)
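The doubt can be put into numbers with a small plausibility check. The sketch below recomputes the theoretical FP32 peak from the values both reports print (384 cores at 626 MHz) and compares the measured figures against it. The factor of 2 FLOPs per core per cycle (one fused multiply-add) is an assumption that happens to reproduce the reported 0.481 TFLOPs/s; the function names and the 10% tolerance are illustrative and not part of the benchmark.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPs/s, assuming one FMA (2 FLOPs) per core per cycle."""
    return cores * clock_mhz * 1e6 * flops_per_core_per_cycle / 1e12


def is_plausible(measured_tflops: float, peak: float, tolerance: float = 1.1) -> bool:
    """A measurement well above the theoretical peak cannot be real work."""
    return measured_tflops <= peak * tolerance


# Values copied from the two reports above.
peak = peak_tflops(cores=384, clock_mhz=626)
measurements = {
    "Windows FP32": 0.224,
    "Linux FP32": 71.363,
}

print(f"theoretical peak: {peak:.3f} TFLOPs/s")
for label, measured in measurements.items():
    verdict = "plausible" if is_plausible(measured, peak) else "implausible"
    print(f"{label}: {measured:.3f} TFLOPs/s, {measured / peak:.1f}x peak, {verdict}")
```

By this measure the Windows FP32 result sits at roughly half of the peak, while the Linux FP32 result is about 148 times the peak. The Linux coalesced-read bandwidth (3521.36 GB/s) is likewise about 200 times the Windows figure (17.60 GB/s) on the same hardware.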

I used the precompiled Linux and Windows binaries from https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/ (I guess it was the latest, 1.3, but maybe the previous one, 1.2; not actually sure. EDIT: tried both versions, no difference).

ProjectPhysX commented 5 months ago

Hi @fanoush,

thanks for reporting this! On Linux, the kernels are not properly executed on Carrizo for some reason: the runtime is close to zero and wrong results are reported. It looks like AMD's old Linux driver is broken and neither compiles correctly nor throws an error. The benchmark uses a 1 GB buffer allocation, which is within the reported limit; I'm not sure why AMD reports different buffer allocation limits on Windows and Linux. There is nothing I can do about broken legacy drivers except maybe find a workaround. But the benchmark kernels are super simple and I doubt there even is a workaround. AMD won't ever fix their legacy drivers.

Kind regards, Moritz

fanoush commented 5 months ago

Thanks for the reply. I am not sure what "AMD's old Linux driver" is. For Carrizo, Ubuntu 22.04 loads the open-source amdgpu kernel driver; older generations (Kabini, Kaveri) load the radeon driver, as per the table here: https://www.x.org/wiki/RadeonFeature/#featurematrixforfreeradeondrivers (Carrizo is Volcanic Islands).

So I was thinking everything is open source (amdgpu + Mesa) except maybe some AMD firmware. But I am not sure where the "OpenCL C 1.1" comes from.

fanoush commented 5 months ago

> There is nothing I can do about broken legacy drivers except maybe find a workaround. But the benchmark kernels are super simple and I doubt there even is a workaround.

Maybe the benchmark could check the result of the computation done in the OpenCL kernel? If the result is not as expected, the kernel did not work. That would test not only speed but also the accuracy/correctness of the OpenCL implementation.
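A minimal sketch of that idea, in plain Python so it runs without an OpenCL device: the host recomputes the arithmetic the kernel is supposed to perform and compares it with the buffer read back from the device. Everything here is hypothetical. The arithmetic in `reference_chain`, the `run_kernel` callable and the two stand-in devices are illustrations only; the actual benchmark kernels are not shown in this thread and may look different.

```python
import math
from typing import Callable, List


def reference_chain(x0: float, iterations: int) -> float:
    """Host-side reference for a hypothetical kernel that repeats x = x * 0.5 + 1.0."""
    x = x0
    for _ in range(iterations):
        x = x * 0.5 + 1.0  # converges towards 2.0, so it never overflows
    return x


def validate(run_kernel: Callable[[int, int], List[float]],
             n: int, iterations: int, rel_tol: float = 1e-5) -> bool:
    """Run the kernel, then compare every output element with the host reference."""
    output = run_kernel(n, iterations)
    if len(output) != n:
        return False
    return all(
        math.isclose(output[i], reference_chain(float(i), iterations), rel_tol=rel_tol)
        for i in range(n)
    )


def working_device(n: int, iterations: int) -> List[float]:
    """Stand-in for a device that really executes the kernel."""
    return [reference_chain(float(i), iterations) for i in range(n)]


def broken_device(n: int, iterations: int) -> List[float]:
    """Stand-in for a driver that returns immediately and leaves the buffer untouched."""
    return [0.0] * n


for name, device in (("working", working_device), ("broken", broken_device)):
    status = "results verified" if validate(device, n=8, iterations=16) else "results INVALID"
    print(f"{name} device: {status}")
```

With a check like this, a run whose kernels never execute would be reported as invalid instead of producing a throughput number. The tolerance is relative because floating-point results can differ slightly between devices.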