jetcat8848 opened this issue 6 months ago
10DE:20C2: this device ID is a CMP 170HX mining card. I installed an NVIDIA GRID A100-20C driver to run it!
Do you have any idea how to disable FMA in the driver?
Is this the same problem as the reduced performance of the CMP 70HX and CMP 90HX, described here in open sources?
> Do you have any idea how to disable FMA in the driver?
Sorry! I have no idea...
> Is this the same problem as the reduced performance of the CMP 70HX and CMP 90HX, described here in open sources?
Yes! It is the same! NVIDIA uses eFuses to tag the FMA speed (reduced to 1/2^n of the full rate, with n = 1, 2, ..., 5, i.e. one of 1/2, 1/4, 1/8, 1/16, 1/32), and the NVIDIA driver knows how to run the card at that rate!
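The 1/2^n throttle model described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the real eFuse encoding is not public, all function names are hypothetical, and `infer_throttle_exponent` assumes both measurements come from the same workload, so their ratio is the slowdown factor 2^n.

```cpp
#include <cmath>
#include <vector>

// Hypothetical model: an eFuse value n (assumed 1..5) limits FMA throughput
// to 1/2^n of the full rate. This only restates the ratios from the thread.
double fma_rate_fraction(int n) {
    return std::ldexp(1.0, -n);  // 1.0 * 2^-n, e.g. n = 4 -> 0.0625 (1/16)
}

// Every throttle fraction the model allows: 1/2, 1/4, 1/8, 1/16, 1/32.
std::vector<double> possible_fractions() {
    std::vector<double> out;
    for (int n = 1; n <= 5; ++n) {
        out.push_back(fma_rate_fraction(n));
    }
    return out;
}

// Given an (assumed) unthrottled and a throttled FLOPS figure for the same
// workload, return the n whose 2^n is closest to the observed slowdown.
int infer_throttle_exponent(double unthrottled, double throttled) {
    double slowdown = unthrottled / throttled;
    return static_cast<int>(std::lround(std::log2(slowdown)));
}
```

For example, the two FP32 figures reported later in this thread give 6.285 / 0.395 ≈ 15.9, and log2(15.9) ≈ 3.99 rounds to n = 4, i.e. a 1/16 FMA rate. Treat this as an approximation: the modified benchmark replaces FMA with a separate multiply and add, so it is not a true full-rate FMA measurement.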
And how can this be fixed or worked around?
This is incredible information! So they used eFuses together with the driver to cripple the mining cards' performance!
I tried modifying an OpenCL benchmark to disable FMA so the GPU cannot use it. The change looks like this:

    diff --git a/src/lbm.cpp b/src/lbm.cpp
    index d99202f..28aeb25 100644
    --- a/src/lbm.cpp
    +++ b/src/lbm.cpp
    @@ -286,6 +286,8 @@ void LBM_Domain::enqueue_unvoxelize_mesh_on_device(const Mesh* mesh, const uchar
     }
     
     string LBM_Domain::device_defines() const { return
    +    "\n #pragma OPENCL FP_CONTRACT OFF" // prevents implicit FMA optimizations
    +    "\n #define fma(a, b, c) ((a) * (b) + (c))" // shadows OpenCL explicit function fma()
         "\n #define def_Nx "+to_string(Nx)+"u"
         "\n #define def_Ny "+to_string(Ny)+"u"
         "\n #define def_Nz "+to_string(Nz)+"u"

OK, the modded OpenCL benchmark runs, and the 170HX FP32 throughput rose to 6.285 TFLOPS, while the original FP32 throughput was only 0.395 TFLOPS. Since 6.285 / 0.395 ≈ 16, I think the NVIDIA driver prevents the GPU from running FMA at full speed!
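The two added lines work purely at the source level: the `#define` turns every explicit `fma(a, b, c)` call that appears after it into a separate multiply and add, and `#pragma OPENCL FP_CONTRACT OFF` forbids the OpenCL compiler from contracting `a * b + c` expressions (including that expansion) back into a fused multiply-add. A minimal sketch of applying the same idea to an arbitrary kernel source string, assuming a hypothetical helper `disable_fma` that runs before the string is handed to the OpenCL compiler (for example via `clCreateProgramWithSource`):

```cpp
#include <string>

// Hypothetical helper (not part of the benchmark): prepend the two lines
// from the diff so they precede all kernel code that could use FMA.
std::string disable_fma(const std::string& kernel_source) {
    return
        "#pragma OPENCL FP_CONTRACT OFF\n"          // no implicit a*b+c -> FMA fusion
        "#define fma(a, b, c) ((a) * (b) + (c))\n"  // explicit fma() becomes mul + add
        + kernel_source;
}
```

The lines must come first because a macro only affects text that follows it. Two side effects are worth keeping in mind: a separate multiply and add rounds twice, so results can differ in the last bits from a true FMA, and OpenCL's `mad()` built-in is not covered by this macro.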