Tests fail on Debian stretch with beignet #231

Closed. vi closed this issue 6 years ago

vi commented 6 years ago

With Beignet 1.3.2-1 and CLBlast v1.2.0, multiple tests fail:

Total Test time (real) = 397.13 sec

The following tests FAILED:
      5 - clblast_test_xdot (Failed)
      6 - clblast_test_xdotu (Failed)
      7 - clblast_test_xdotc (Failed)
      8 - clblast_test_xnrm2 (Failed)
      9 - clblast_test_xasum (Failed)
     12 - clblast_test_xgbmv (OTHER_FAULT)
     34 - clblast_test_xgemm (OTHER_FAULT)
     37 - clblast_test_xsyrk (Failed)
     38 - clblast_test_xherk (Failed)
     39 - clblast_test_xsyr2k (Failed)
     40 - clblast_test_xher2k (Failed)
     46 - clblast_test_xgemmbatched (OTHER_FAULT)

Additionally, matmul built with NETLIB CLBlast fails the multiplication if the matrix is big enough:

$ ./matmul_cl -n 191 -a 6
...
Central cell: 42.8968
$ ./matmul_cl -n 192 -a 6
...
Central cell: 0

It also fails on the master branch.

CNugteren commented 6 years ago

OK, that is good news, so Beignet 1.2.1 works quite well. It seems one test is failing; shall we try to solve that? It is a bit of a special thing though, not really needed in all cases. But perhaps you can give me the output of running ./clblast_test_preprocessor? And perhaps the gdb result as well?

vi commented 6 years ago

$ ./clblast_test_preprocessor

* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]

* Testing simple OpenCL pre-processor for 'XaxpyFastest'
* Testing simple OpenCL pre-processor for 'Xger'
* Testing simple OpenCL pre-processor for 'XgemvFast'
* Testing simple OpenCL pre-processor for 'CopyMatrixFast'
* Testing simple OpenCL pre-processor for 'CopyPadMatrix'
* Testing simple OpenCL pre-processor for 'TransposeMatrixFast'
* Testing simple OpenCL pre-processor for 'TransposePadMatrix'
* Testing simple OpenCL pre-processor for 'Xgemm'
Warning unknown condition: 1
Warning unknown condition: (0
Warning unknown condition: 0)
Warning unknown condition: 2 != SUBGROUP_SIZE
Warning unknown condition: 8 < SUBGROUP_SIZE
* Testing simple OpenCL pre-processor for 'XgemmDirectTN'

    11 test(s) passed
    0 test(s) failed

* Testing simple OpenCL pre-processor for 'XaxpyFastest'
ASSERTION FAILED: 0
  at file /home/vi/src/git/beignet/backend/src/backend/gen_encoder.cpp, function virtual void gbe::GenEncoder::handleDouble(gbe::GenEncoder*, uint32_t, gbe::GenRegister, gbe::GenRegister, gbe::GenRegister), line 648
Trace/breakpoint trap
$ gdb -args ./clblast_test_preprocessor
...
Reading symbols from ./clblast_test_preprocessor...(no debugging symbols found)...done.
(gdb) r
Starting program: /mnt/src/git/CLBlast/build/clblast_test_preprocessor 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x3fff2513700 (LWP 21776)]
[New Thread 0x3ffefd12700 (LWP 21777)]
[New Thread 0x3ffed511700 (LWP 21778)]

* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]

* Testing simple OpenCL pre-processor for 'XaxpyFastest'
* Testing simple OpenCL pre-processor for 'Xger'
* Testing simple OpenCL pre-processor for 'XgemvFast'
* Testing simple OpenCL pre-processor for 'CopyMatrixFast'
* Testing simple OpenCL pre-processor for 'CopyPadMatrix'
* Testing simple OpenCL pre-processor for 'TransposeMatrixFast'
* Testing simple OpenCL pre-processor for 'TransposePadMatrix'
* Testing simple OpenCL pre-processor for 'Xgemm'
Warning unknown condition: 1
Warning unknown condition: (0
Warning unknown condition: 0)
Warning unknown condition: 2 != SUBGROUP_SIZE
Warning unknown condition: 8 < SUBGROUP_SIZE
* Testing simple OpenCL pre-processor for 'XgemmDirectTN'

    11 test(s) passed
    0 test(s) failed

* Testing simple OpenCL pre-processor for 'XaxpyFastest'
ASSERTION FAILED: 0
  at file /home/vi/src/git/beignet/backend/src/backend/gen_encoder.cpp, function virtual void gbe::GenEncoder::handleDouble(gbe::GenEncoder*, uint32_t, gbe::GenRegister, gbe::GenRegister, gbe::GenRegister), line 648

Thread 1 "clblast_test_pr" received signal SIGTRAP, Trace/breakpoint trap.
gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>)
    at /home/vi/src/git/beignet/backend/src/sys/assert.cpp:76
76      _exit(-1);
(gdb) bt
#0  gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>)
    at /home/vi/src/git/beignet/backend/src/sys/assert.cpp:76
#1  0x000003ffe57d738e in gbe::GenEncoder::MUL (this=<optimized out>, dest=..., src0=..., src1=...)
    at /home/vi/src/git/beignet/backend/src/backend/gen_encoder.cpp:860
#2  0x000003ffe5790f18 in gbe::GenContext::emitBinaryInstruction (this=0x2aaab401780, insn=...)
    at /home/vi/src/git/beignet/backend/src/backend/gen_context.cpp:767
#3  0x000003ffe57b9a97 in gbe::GenContext::emitInstructionStream (this=this@entry=0x2aaab401780)
    at /home/vi/src/git/beignet/backend/src/./backend/gen_insn_selection.hxx:36
#4  0x000003ffe57b9f5a in gbe::GenContext::emitCode (this=0x2aaab401780) at /home/vi/src/git/beignet/backend/src/backend/gen_context.cpp:3858
#5  0x000003ffe568cd22 in gbe::Context::compileKernel (this=this@entry=0x2aaab401780) at /home/vi/src/git/beignet/backend/src/backend/context.cpp:389
#6  0x000003ffe57cc2d5 in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name="Xaxpy", relaxMath=<optimized out>, 
    profiling=<optimized out>) at /home/vi/src/git/beignet/backend/src/backend/gen_program.cpp:212
#7  0x000003ffe56902b6 in gbe::Program::buildFromUnit (this=this@entry=0x2aaaaf00990, unit=..., error="")
    at /home/vi/src/git/beignet/backend/src/backend/program.cpp:188
#8  0x000003ffe5690930 in gbe::Program::buildFromLLVMFile (this=this@entry=0x2aaaaf00990, fileName=fileName@entry=0x0, 
    module=module@entry=0x2aaaaf17bb0, error="", optLevel=optLevel@entry=1) at /home/vi/src/git/beignet/backend/src/backend/program.cpp:163
#9  0x000003ffe57cc985 in gbe::genProgramNewFromLLVM (deviceID=358, fileName=0x0, module=0x2aaaaf17bb0, llvm_ctx=0x2aaab3f8130, 
    asm_file_name=<optimized out>, stringSize=1048576, err=0x2aaaaf20950 "", errSize=0x2aaaaeb8800, optLevel=1, 
    options=0x3ffffffd9b0 " -cl-std=CL1.1") at /home/vi/src/git/beignet/backend/src/backend/gen_program.cpp:456
#10 0x000003ffe569f863 in gbe::programNewFromSource (deviceID=358, source=<optimized out>, stringSize=1048576, 
    options=0x3ffffffd9b0 " -cl-std=CL1.1", err=0x2aaaaf20950 "", errSize=0x2aaaaeb8800)
    at /home/vi/src/git/beignet/backend/src/backend/program.cpp:1027
#11 0x000003ffeaab4f48 in cl_program_build (p=p@entry=0x2aaaaeb8770, options=0x3ffffffd9b0 " -cl-std=CL1.1")
    at /home/vi/src/git/beignet/src/cl_program.c:589
#12 0x000003ffeaaac426 in clBuildProgram (program=0x2aaaaeb8770, num_devices=<optimized out>, device_list=<optimized out>, options=<optimized out>, 
    pfn_notify=0x0, user_data=0x0) at /home/vi/src/git/beignet/src/cl_api.c:957
#13 0x000003fff78befee in clblast::CompileFromSource(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Precision, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Device const&, clblast::Context const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, unsigned long, bool) () from /mnt/src/git/CLBlast/build/libclblast.so.1
#14 0x000002aaaab23d96 in clblast::TestKernel(clblast::Device const&, clblast::Context const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Precision) ()
#15 0x000002aaaab24538 in clblast::RunPreprocessor(int, char**, bool, clblast::Precision) ()
#16 0x000002aaaaacbaac in main ()

(gdb) bt full
#0  gbe::onFailedAssertion (msg=<optimized out>, file=<optimized out>, fn=<optimized out>, line=<optimized out>)
    at /home/vi/src/git/beignet/backend/src/sys/assert.cpp:76
        __PRETTY_FUNCTION__ = "void gbe::onFailedAssertion(const char*, const char*, const char*, int32_t)"
#1  0x000003ffe57d738e in gbe::GenEncoder::MUL (this=<optimized out>, dest=..., src0=..., src1=...)
    at /home/vi/src/git/beignet/backend/src/backend/gen_encoder.cpp:860
        __PRETTY_FUNCTION__ = "void gbe::GenEncoder::MUL(gbe::GenRegister, gbe::GenRegister, gbe::GenRegister)"
#2  0x000003ffe5790f18 in gbe::GenContext::emitBinaryInstruction (this=0x2aaab401780, insn=...)
    at /home/vi/src/git/beignet/backend/src/backend/gen_context.cpp:767
        dst = <optimized out>
        __PRETTY_FUNCTION__ = "virtual void gbe::GenContext::emitBinaryInstruction(const gbe::SelectionInstruction&)"
#3  0x000003ffe57b9a97 in gbe::GenContext::emitInstructionStream (this=this@entry=0x2aaab401780)
    at /home/vi/src/git/beignet/backend/src/./backend/gen_insn_selection.hxx:36
        opcode = <optimized out>
        insn = @0x2aaab7fa550: {<NonCopyable> = {<No data fields>}, <gbe::intrusive_list_node> = {next = 0x2aaab7fa710, prev = 0x2aaab7fa780}, 
          parent = 0x2aaab7e23a0, state = {physicalFlag = 1, flag = 0, subFlag = 0, grfFlag = 1, externFlag = 0, modFlag = 0, flagGen = 0, 
            execWidth = 16, quarterControl = 0, nibControl = 0, accWrEnable = 0, noMask = 0, predicate = 0, inversePredicate = 0, saturate = 0, 
            flagIndex = 0}, extra = {{function = 0, elem = 0}, {width = 0, vstride = 0, hstride = 0, offset = 0}, {scratchOffset = 0, 
              scratchMsgHeader = 0}, {bti = 0, msglen = 0, is3DWrite = 0}, {rdbti = 0, sampler = 0, rdmsglen = 0, isLD = false, isUniform = false}, {
              vme_bti = 0, msg_type = 0, vme_search_path_lut = 0, lut_sub = 0}, barrierType = 0, waitType = 0, longjmp = false, indirect_offset = 0, 
            {pointNum = 0, timestampType = 0}, {profilingType = 0, profilingBTI = 0}, {printfNum = 0, printfBTI = 0, continueFlag = 0, 
              printfSize = 0}, workgroupOp = 0}, opcode = 35 '#', dstNum = 1 '\001', srcNum = 2 '\002', index = 0, index1 = 0, ID = 44, DBGInfo = {
            line = 2877236880, col = 682}, regs = 0x2aaab7fa590}
        __for_range = @0x2aaab7e23b0: {<gbe::intrusive_list_base> = {m_root = {next = 0x2aaab7df7b0, prev = 0x2aaab687ca0}}, <No data fields>}
        block = @0x2aaab7e23a0: {<NonCopyable> = {<No data fields>}, <gbe::intrusive_list_node> = {next = 0x2aaab696800, prev = 0x2aaab7e2340}, 
          insnList = {<gbe::intrusive_list_base> = {m_root = {next = 0x2aaab7df7b0, prev = 0x2aaab687ca0}}, <No data fields>}, 
          vectorList = {<gbe::intrusive_list_base> = {m_root = {next = 0x2aaab299b90, prev = 0x2aaab7f25b0}}, <No data fields>}, 
          tmp = {<std::vector<gbe::ir::Register, gbe::Allocator<gbe::ir::Register> >> = std::vector of length 3, capacity 4 = {{unsafe = 83}, {
                unsafe = 84}, {unsafe = 86}}, <No data fields>}, bb = 0x2aaab802620, isLargeBlock = false, endifLabel = {unsafe = 6}, 
          endifOffset = -1, hasBarrier = false, hasBranch = false, removeSimpleIfEndif = false}
        __for_range = @0x2aaab74d480: {<gbe::intrusive_list_base> = {m_root = {next = 0x2aaab74d600, prev = 0x2aaab696860}}, <No data fields>}
        __PRETTY_FUNCTION__ = "void gbe::GenContext::emitInstructionStream()"
#4  0x000003ffe57b9f5a in gbe::GenContext::emitCode (this=0x2aaab401780) at /home/vi/src/git/beignet/backend/src/backend/gen_context.cpp:3858
        genKernel = 0x2aaab7e2500
#5  0x000003ffe568cd22 in gbe::Context::compileKernel (this=this@entry=0x2aaab401780) at /home/vi/src/git/beignet/backend/src/backend/context.cpp:389
No locals.
#6  0x000003ffe57cc2d5 in gbe::GenProgram::compileKernel (this=<optimized out>, unit=..., name="Xaxpy", relaxMath=<optimized out>, 
    profiling=<optimized out>) at /home/vi/src/git/beignet/backend/src/backend/gen_program.cpp:212
---Type <return> to continue, or q <return> to quit---
        simdWidth = 16
        limitRegisterPressure = false
        reservedSpillRegs = 0
        simdFn = 0x2aaab7f73d0
        fn = <optimized out>
        __PRETTY_FUNCTION__ = "virtual gbe::Kernel* gbe::GenProgram::compileKernel(const gbe::ir::Unit&, const string&, bool, int)"
        codeGenNum = 4
        codeGen = 0
        ctx = <optimized out>
        kernel = 0x0
#7  0x000003ffe56902b6 in gbe::Program::buildFromUnit (this=this@entry=0x2aaaaf00990, unit=..., error="")
    at /home/vi/src/git/beignet/backend/src/backend/program.cpp:188
        name = "Xaxpy"
        kernel = <optimized out>
        __for_range = @0x2aaab40ba38: {<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, gbe::ir::Function*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, gbe::Allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, gbe::ir::Function*> > >> = std::map with 4 elements = {["Xaxpy"] = 0x2aaab7f73d0, 
            ["XaxpyBatched"] = 0x2aaab2cbe90, ["XaxpyFaster"] = 0x2aaaaee9db0, 
            ["XaxpyFastest"] = 0x2aaab5d3830}, <NonCopyable> = {<No data fields>}, <No data fields>}
        kernelNum = <optimized out>
        strictMath = <optimized out>
#8  0x000003ffe5690930 in gbe::Program::buildFromLLVMFile (this=this@entry=0x2aaaaf00990, fileName=fileName@entry=0x0, 
    module=module@entry=0x2aaaaf17bb0, error="", optLevel=optLevel@entry=1) at /home/vi/src/git/beignet/backend/src/backend/program.cpp:163
        error2 = ""
        unit = 0x2aaab40ba00
        cloned_module = 0x2aaab0385b0
        ret = false
        strictMath = <optimized out>
#9  0x000003ffe57cc985 in gbe::genProgramNewFromLLVM (deviceID=358, fileName=0x0, module=0x2aaaaf17bb0, llvm_ctx=0x2aaab3f8130, 
    asm_file_name=<optimized out>, stringSize=1048576, err=0x2aaaaf20950 "", errSize=0x2aaaaeb8800, optLevel=1, 
    options=0x3ffffffd9b0 " -cl-std=CL1.1") at /home/vi/src/git/beignet/backend/src/backend/gen_program.cpp:456
        fast_relaxed_math = <optimized out>
        error = ""
#10 0x000003ffe569f863 in gbe::programNewFromSource (deviceID=358, source=<optimized out>, stringSize=1048576, 
    options=0x3ffffffd9b0 " -cl-std=CL1.1", err=0x2aaaaf20950 "", errSize=0x2aaaaeb8800)
    at /home/vi/src/git/beignet/backend/src/backend/program.cpp:1027
        clangErrSize = 0
---Type <return> to continue, or q <return> to quit---
        optLevel = 1
        clOpt = std::vector of length 5, capacity 8 = {"-I/mnt/src/git/beignet/build/backend/src/libocl//usr/local/lib/beignet//include/", 
          "-D__OPENCL_C_VERSION__=110", "-cl-std=CL1.1", "-include", "ocl.h"}
        dumpLLVMFileName = ""
        dumpASMFileName = ""
        dumpSPIRBinaryName = ""
        p = <optimized out>
        out_module = 0x2aaaaf17bb0
        llvm_ctx = 0x2aaab3f8130
        llvm_mutex = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
                __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, <No data fields>}
#11 0x000003ffeaab4f48 in cl_program_build (p=p@entry=0x2aaaaeb8770, options=0x3ffffffd9b0 " -cl-std=CL1.1")
    at /home/vi/src/git/beignet/src/cl_program.c:589
        err = 0
        i = 0
        copyed = 0
#12 0x000003ffeaaac426 in clBuildProgram (program=0x2aaaaeb8770, num_devices=<optimized out>, device_list=<optimized out>, options=<optimized out>, 
    pfn_notify=0x0, user_data=0x0) at /home/vi/src/git/beignet/src/cl_api.c:957
        err = 0
        __PRETTY_FUNCTION__ = "clBuildProgram"
#13 0x000003fff78befee in clblast::CompileFromSource(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Precision, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Device const&, clblast::Context const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, unsigned long, bool) () from /mnt/src/git/CLBlast/build/libclblast.so.1
No symbol table info available.
#14 0x000002aaaab23d96 in clblast::TestKernel(clblast::Device const&, clblast::Context const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, clblast::Precision) ()
No symbol table info available.
#15 0x000002aaaab24538 in clblast::RunPreprocessor(int, char**, bool, clblast::Precision) ()
No symbol table info available.
#16 0x000002aaaaacbaac in main ()
No symbol table info available.
(gdb) quit
A debugging session is active.

    Inferior 1 [process 21753] will be killed.

Quit anyway? (y or n) y

vi commented 6 years ago

The first phase of tuning succeeded (there are 40 JSONs), but database.py failed:

$ python ../scripts/database/database.py . ..
[database] Loading database from '../scripts/database/database.json'
[database] Processing './clblast_transpose_3232.json' with 44 new items
[database] Processing './clblast_xgemm_12_3232.json' with 97 new items
[database] Processing './clblast_xgemv_fast_rot_3232.json' with 68 new items
[database] Processing './clblast_copy_32.json' with 128 new items
[database] Processing './clblast_xgemv_fast_3232.json' with 30 new items
[database] Processing './clblast_xdot_2_3232.json' with 5 new items
[database] Processing './clblast_xgemm_direct_1_3232.json' with 45 new items
[database] Processing './clblast_xger_32.json' with 108 new items
[database] Processing './clblast_padtranspose_3232.json' with 14 new items
[database] Processing './clblast_xdot_2_32.json' with 5 new items
[database] Processing './clblast_gemm_routine_32.json' with 31 new items
[database] Processing './clblast_padtranspose_32.json' with 16 new items
[database] Processing './clblast_pad_3232.json' with 72 new items
[database] Processing './clblast_xgemm_11_3232.json' with 374 new items
[database] Processing './clblast_xgemm_direct_2_32.json' with 125 new items
[database] Processing './clblast_copy_3232.json' with 128 new items
[database] Processing './clblast_xdot_1_3232.json' with 5 new items
[database] Processing './clblast_xgemm_12_32.json' with 93 new items
[database] Processing './clblast_xgemv_3232.json' with 12 new items
[database] Processing './clblast_xaxpy_32.json' with 64 new items
[database] Processing './clblast_xgemm_2_32.json' with 229 new items
[database] Processing './clblast_invert_3232.json' with 2 new items
[database] Processing './clblast_xgemv_fast_32.json' with 30 new items
[database] Processing './clblast_xdot_1_32.json' with 5 new items
[database] Processing './clblast_xgemv_fast_rot_32.json' with 68 new items
[database] Processing './clblast_xaxpy_3232.json' with 64 new items
[database] Processing './clblast_routine_xtrsv_32.json' with 4 new items
[database] Processing './clblast_xgemm_2_3232.json' with 155 new items
[database] Processing './clblast_routine_xtrsv_3232.json' with 4 new items
[database] Processing './clblast_xgemm_direct_1_32.json' with 45 new items
[database] Processing './clblast_xgemv_32.json' with 12 new items
[database] Processing './clblast_pad_32.json' with 72 new items
[database] Processing './clblast_xgemm_11_32.json' with 386 new items
[database] Processing './clblast_xger_3232.json' with 108 new items
[database] Processing './clblast_invert_32.json' with 2 new items
[database] Processing './clblast_xgemm_direct_2_3232.json' with 59 new items
[database] Processing './clblast_gemm_routine_3232.json' with 31 new items
[database] Processing './clblast_transpose_32.json' with 52 new items
[database] Processing './clblast_xgemm_1_32.json' with 558 new items
[database] Processing './clblast_xgemm_1_3232.json' with 490 new items
[database] Saving database to '../scripts/database/database.json'
[database] Calculating the best results per device/kernel...
[database] Calculating the default values...
[database] Producing a C++ database in '../src/database/kernels'...
[database] No results found for invert:16, retrieving defaults from invert:32
[database] No results found for invert:64, retrieving defaults from invert:32
[database] No results found for invert:6464, retrieving defaults from invert:32
[database] No results found for trsv_routine:16, retrieving defaults from trsv_routine:32
[database] No results found for trsv_routine:64, retrieving defaults from trsv_routine:32
[database] No results found for trsv_routine:6464, retrieving defaults from trsv_routine:32
Traceback (most recent call last):
  File "../scripts/database/database.py", line 185, in <module>
    main(sys.argv[1:])
  File "../scripts/database/database.py", line 179, in main
    clblast.print_cpp_database(database_best_results, cpp_database_path)
  File "/mnt/src/git/CLBlast/scripts/database/database/clblast.py", line 231, in print_cpp_database
    assert parameter_name == parameter_names[parameter_index]
AssertionError

vi commented 6 years ago

Results and output of the first phase of tuning: https://vi-server.org/pub/clblast_beignet_gen3_tuning.7z

CNugteren commented 6 years ago

OK, thanks for the feedback.

ASSERTION FAILED: 0
  at file /home/vi/src/git/beignet/backend/src/backend/gen_encoder.cpp, function virtual void gbe::GenEncoder::handleDouble(gbe::GenEncoder*, uint32_t, gbe::GenRegister, gbe::GenRegister, gbe::GenRegister), line 648
Trace/breakpoint trap

So that's definitely a Beignet bug; let's forget about it. This 'preprocessor' is not enabled for your GPU anyway, so a failed test won't harm you.

Good to see that the tuning also works! About the Python script: I tried to reproduce the failure with your results but could not. Perhaps you have an old database on disk? You could try removing scripts/database/database.json and then retrying (it will download the latest version).
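A minimal sketch of that reset step, for anyone who would rather keep the old file than delete it. The helper name `set_aside_stale_database` and the `.bak` suffix are made up for illustration and are not part of CLBlast's scripts; the only assumption taken from this thread is that database.py fetches a fresh copy when the file is missing.

```python
from pathlib import Path
import tempfile


def set_aside_stale_database(path):
    """Rename an existing database file to '<name>.bak'.

    Returns the backup path, or None when there is nothing to set aside.
    Hypothetical helper, not part of CLBlast.
    """
    db = Path(path)
    if not db.exists():
        return None
    backup = db.with_name(db.name + ".bak")
    db.replace(backup)  # overwrites an older backup, if one exists
    return backup


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "database.json"
        db.write_text("{}")
        print(set_aside_stale_database(db))
```

In a real checkout the argument would be scripts/database/database.json, followed by re-running database.py as before.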

vi commented 6 years ago

After rm ../scripts/database/database.json it worked.

If the database format changes without the download URL changing, does that mean old CLBlast versions can no longer be tuned? Maybe it should download not from master but from the current commit?
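A rough sketch of what "download from the current commit" could look like, purely as an illustration. It assumes the database file is served through GitHub's raw-content URL scheme; this thread does not say where database.py actually downloads from, so the template, `pinned_url`, `current_commit`, and the placeholder path are all hypothetical.

```python
import subprocess

# Assumed URL scheme; the real download location used by database.py may differ.
RAW_URL_TEMPLATE = "https://raw.githubusercontent.com/{repo}/{ref}/{path}"


def pinned_url(repo, ref, path):
    """Build a download URL tied to one specific git ref instead of 'master'."""
    return RAW_URL_TEMPLATE.format(repo=repo, ref=ref, path=path)


def current_commit(checkout_dir="."):
    """Return the commit hash that is checked out in checkout_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=checkout_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # 'path/to/database.json' is a placeholder, not the real location.
    print(pinned_url("CNugteren/CLBlast", "abc123", "path/to/database.json"))
```

With a URL built this way, an old checkout would keep fetching a database in the format it was written for, at the cost of never seeing newer tuning results.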

CNugteren commented 6 years ago

Yes, you are right. Ideally it should be a git submodule or something similar. But I guess it is not super urgent, because it is mostly power users who do this, and tuning once and then again a few months later is not a common use case.

So, what I'll do now is add your new results to the latest master and also make a note that Beignet 1.2.1 is the one to go for with your device. And then we can close this issue, am I right?

vi commented 6 years ago

After the tuning, the tests seem to run longer:

Total Test time (real) = 709.90 sec
Total Test time (real) = 671.09 sec

What about the connection leak? It seems that CLBlast (or Beignet, or at least the tests and tuners) opens something and does not close it properly.
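One way to put a number on such a leak, sketched under the assumption that the leaked resource shows up as an open file descriptor (sockets and device handles do on Linux; the thread does not say what exactly is leaking). The helper `open_fd_count` is hypothetical and relies on the Linux /proc filesystem.

```python
import os


def open_fd_count(pid="self"):
    """Count the file descriptors currently open in a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    before = open_fd_count()
    handle = open(os.devnull)  # stands in for a resource that is never closed
    after = open_fd_count()
    print(f"descriptors before: {before}, after: {after}")
    handle.close()
```

Passing the PID of a running test or tuner instead of "self" and sampling the count repeatedly would show whether it keeps growing.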

vi commented 6 years ago

So, what I'll do now is add your new results to the latest master and also make a note that Beignet 1.2.1 is the one to go for with your device. And then we can close this issue, am I right?

Seems OK. This issue is already a bit long and takes some browser resources to load and render. New issues can be opened for other problems, such as the connection leak.

CNugteren commented 6 years ago

After the tuning, the tests seem to run longer:

Could very well be. The tests typically cover corner cases and very small matrices, so the time is actually mostly taken by the CPU reference code, CPU-GPU copies, and a bit by (perhaps slower) kernels.

Since the main issue is solved, I'll indeed close this.