dartraiden / NVIDIA-patcher

Adds 3D acceleration support for P106-090 / P106-100 / P104-100 / P104-101 / P102-100 / CMP 30HX / CMP 40HX / CMP 50HX mining cards.

170HX (any CMP HX card) can reach much higher FP32 FLOPS than before #73

Open jetcat8848 opened 6 months ago

jetcat8848 commented 6 months ago

I tried modifying an OpenCL benchmark to disable FMA so that the GPU cannot use it. The change looks like this:

diff --git a/src/lbm.cpp b/src/lbm.cpp
index d99202f..28aeb25 100644
--- a/src/lbm.cpp
+++ b/src/lbm.cpp
@@ -286,6 +286,8 @@ void LBM_Domain::enqueue_unvoxelize_mesh_on_device(const Mesh* mesh, const uchar
 }
 
 string LBM_Domain::device_defines() const { return
+    "\n #pragma OPENCL FP_CONTRACT OFF" // prevents implicit FMA optimizations
+    "\n #define fma(a, b, c) ((a) * (b) + (c))" // shadows OpenCL explicit function fma()
     "\n #define def_Nx "+to_string(Nx)+"u"
     "\n #define def_Ny "+to_string(Ny)+"u"
     "\n #define def_Nz "+to_string(Nz)+"u"

With this change the modified OpenCL benchmark runs, and the 170HX's FP32 throughput increases to 6.285 TFLOPS, up from only 0.395 TFLOPS originally. 6.285 / 0.395 ≈ 16, so I think the NVIDIA driver prevents the GPU from running FMA at full speed!

[3 image attachments]
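The two added lines make the kernel compute a separate multiply and add instead of one fused multiply-add. A fused operation rounds once, a separate multiply and add rounds twice, so the patched benchmark can also produce slightly different numbers. The following is only a minimal sketch of that difference: it emulates both variants in Python with double precision (not float32 OpenCL), and the helper names `fused_multiply_add` and `separate_multiply_add` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate fma(a, b, c): compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to the nearest double


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """What the macro ((a) * (b) + (c)) does: round after the multiply,
    then round again after the add."""
    return a * b + c


# Inputs chosen so that the intermediate product needs more than 53 bits.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fused_multiply_add(a, b, c)
separate = separate_multiply_add(a, b, c)

print("fused:   ", fused)
print("separate:", separate)
```

Here the exact product is 1 - 2^-60, which rounds to 1.0 as a double, so the separate version loses the small term that the fused version keeps.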

jetcat8848 commented 6 months ago

The device ID 10DE 20C2 is a CMP 170HX mining card. I installed an NVIDIA GRID A100-20C driver to run it!

jetcat8848 commented 6 months ago

[5 image attachments]

bah86 commented 6 months ago

Do you have any idea how to disable fma in the driver?

astronautduckpc commented 6 months ago

Is this the same problem as the reduced performance on the CMP 70HX and CMP 90HX? It is described here in open sources.

jetcat8848 commented 5 months ago

> Do you have any idea how to disable FMA in the driver?

Sorry, I have no idea.

jetcat8848 commented 5 months ago

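> Is this the same problem as the reduced performance on the CMP 70HX and CMP 90HX? It is described here in open sources.

Yes, it is the same! NVIDIA uses eFuses to tag the FMA speed (reduced to 1/8, 1/16, 1/32, ... 1/2^n, n = 1, 2, ... 5), and the NVIDIA driver knows how to handle it!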
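As a rough check of the 1/2^n claim against the figures reported earlier in this thread (6.285 TFLOPS without FMA, 0.395 TFLOPS with FMA), the arithmetic can be sketched as below. This assumes, purely for illustration, that the non-FMA result is the unthrottled baseline; the real full-speed FMA rate of the card is not given in the thread, and the function names are hypothetical.

```python
# Figures reported earlier in this thread, in TFLOPS.
FP32_WITHOUT_FMA = 6.285
FP32_WITH_FMA = 0.395


def throttle_ladder(baseline_tflops: float, max_n: int = 5) -> dict:
    """Return {n: baseline / 2**n} for n = 1..max_n."""
    return {n: baseline_tflops / 2 ** n for n in range(1, max_n + 1)}


def closest_step(measured_tflops: float, ladder: dict) -> int:
    """Return the n whose throttled rate is nearest the measured rate."""
    return min(ladder, key=lambda n: abs(ladder[n] - measured_tflops))


# Assumption: the non-FMA figure stands in for the unthrottled rate.
ratio = FP32_WITHOUT_FMA / FP32_WITH_FMA
ladder = throttle_ladder(FP32_WITHOUT_FMA)
best_n = closest_step(FP32_WITH_FMA, ladder)

print(f"speed-up without FMA: {ratio:.2f}x")
for n, tflops in ladder.items():
    print(f"n={n}: 1/{2 ** n} of baseline = {tflops:.4f} TFLOPS")
print(f"closest step to the measured FMA rate: n={best_n}")
```

Under that assumption the measured FMA rate sits closest to the 1/16 step, which matches the 6.285 / 0.395 ≈ 16 estimate from the first post.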

astronautduckpc commented 5 months ago

> Is this the same as with the CMP 70HX and CMP 90HX.

> Yes, it is the same! NVIDIA uses eFuses to tag the FMA speed (reduced to 1/8, 1/16, 1/32, ... 1/2^n, n = 1, 2, ... 5), and the NVIDIA driver knows how to handle it!

And how can this be fixed or worked around?

Skylord4321 commented 4 months ago

This is incredible information! So they used an eFuse within the driver to hinder the mining card's performance!