Closed: akshayc11 closed this issue 8 years ago
@akshayc11 Can you check whether this happens in the develop branch?
This does not happen with the develop branch.
Thanks
Akshay
There is an additional issue when I move to develop. Some of the gemm calls return matrices containing NaNs, which do not appear if I revert to the commit I had been using before:
commit 9731ea2a270509211a47bf6cf9df4de2069ccc52
Merge: a6b3f9d 3f032e7
Author: David Tanner <david.tanner@amd.com>
Date: Wed Jul 1 15:00:31 2015 -0500
@akshayc11 Is this for complex gemm? If so, check out my PR. It has a fix for that.
This is the PR I am referring to: https://github.com/clMathLibraries/clBLAS/pull/202
It's for gemm with real numbers only. I am pretty sure the code I am running never deals with complex numbers.
@akshayc11 Any chance you can link a standalone snippet that reproduces the problem? I am investigating other issues with clBLAS that we hit when building it with our library. I'll look into fixing this alongside the other issues.
@pavanky Sorry for the delayed response. Unfortunately, I do not have a standalone snippet at this point. The code base where I use this is quite convoluted and has multiple nested function calls before reaching the gemm call. For now, I have reverted to a version of master that worked before.
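For anyone trying to narrow down the NaN report without the full code base, a host-side check along the following lines might help. This is only a sketch and not code from this thread: `count_nans` and `ref_sgemm` are hypothetical helper names, row-major storage with no transposes is assumed, and the idea is to read the device result back to the host, count NaNs, and compare against a naive CPU product. The reference skips reading C when beta is zero, following the usual BLAS convention that C need not be initialized in that case.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: count NaN entries in a row-major m x n matrix
 * whose rows are ldc floats apart. */
size_t count_nans(const float *c, size_t m, size_t n, size_t ldc)
{
    size_t bad = 0;
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            if (isnan(c[i * ldc + j]))
                ++bad;
    return bad;
}

/* Hypothetical naive reference: C = alpha * A * B + beta * C,
 * row-major, no transposes. A is m x k, B is k x n, C is m x n. */
void ref_sgemm(size_t m, size_t n, size_t k, float alpha,
               const float *a, size_t lda,
               const float *b, size_t ldb,
               float beta, float *c, size_t ldc)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];
            /* When beta is zero, do not read C at all, so stale or
             * uninitialized values (including NaNs) cannot leak in. */
            if (beta == 0.0f)
                c[i * ldc + j] = alpha * acc;
            else
                c[i * ldc + j] = alpha * acc + beta * c[i * ldc + j];
        }
    }
}
```

A reproducer could fill A and B with known values, run the library gemm, copy the result back, and call `count_nans` on it; running `ref_sgemm` on the same inputs then shows whether the NaNs originate in the inputs or in the library call.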
The following is with the clBLAS library built from the develop branch.
When I compile and run the sample code, I get the following:
$ gcc -I/usr/local/cuda/include -I/data-local/akshayc/Workspace/Software/asr-lge-embedded/tools/clBLAS-dynamic/build-linux-dynamic/package/include example_sgemm.c -o gemm -L/data-local/akshayc/Workspace/Software/asr-lge-embedded/tools/clBLAS-dynamic/build-linux-dynamic/package/lib64 -lclBLAS -L/usr/local/cuda/lib64 -lOpenCL
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:`pwd`/package/lib64:$LD_LIBRARY_PATH
$ ./gemm
Segmentation fault (core dumped)
On running the command with valgrind, I get:
$ valgrind ./gemm
==13673== Memcheck, a memory error detector
==13673== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==13673== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==13673== Command: ./gemm
==13673==
==13673== Invalid read of size 4
==13673== at 0x67B79A9: ??? (in /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0)
==13673== by 0x67B7F28: ??? (in /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0)
==13673== by 0x4010139: call_init.part.0 (dl-init.c:78)
==13673== by 0x4010222: call_init (dl-init.c:36)
==13673== by 0x4010222: _dl_init (dl-init.c:126)
==13673== by 0x4001309: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==13673== Address 0x77ae404 is 20 bytes inside a block of size 23 alloc'd
==13673== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13673== by 0x67B796A: ??? (in /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0)
==13673== by 0x67B7F28: ??? (in /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0)
==13673== by 0x4010139: call_init.part.0 (dl-init.c:78)
==13673== by 0x4010222: call_init (dl-init.c:36)
==13673== by 0x4010222: _dl_init (dl-init.c:126)
==13673== by 0x4001309: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==13673==
==13673== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==13673== This could cause spurious value errors to appear.
==13673== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==13673== Warning: set address range perms: large range [0x200000000, 0x700000000) (noaccess)
==13673== Warning: set address range perms: large range [0x900000000, 0xc00000000) (noaccess)
==13673== Warning: set address range perms: large range [0xc00000000, 0xf00000000) (noaccess)
==13673== Use of uninitialised value of size 8
==13673== at 0x400FDF2: _dl_signal_error (dl-error.c:94)
==13673== by 0x400FF7D: _dl_signal_cerror (dl-error.c:155)
==13673== by 0x400B267: _dl_lookup_symbol_x (dl-lookup.c:779)
==13673== by 0x400F556: _dl_fixup (dl-runtime.c:111)
==13673== by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==13673== by 0x50D2B82: rwlockInit (rwlock.c:110)
==13673== by 0x5104D65: clblasFunctorCache<clblasSscalFunctorGeneric, _clblasXscalFunctorGenericData, std::less<_clblasXscalFunctorGenericData> >::clblasFunctorCache() (functor.h:280)
==13673== by 0x5104B2B: __static_initialization_and_destruction_0(int, int) (functor_xscal_generic.cc:194)
==13673== by 0x5104C4B: _GLOBAL__sub_I_functor_xscal_generic.cc (functor_xscal_generic.cc:439)
==13673== by 0x4010139: call_init.part.0 (dl-init.c:78)
==13673== by 0x4010222: call_init (dl-init.c:36)
==13673== by 0x4010222: _dl_init (dl-init.c:126)
==13673== by 0x4001309: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==13673==
==13673==
==13673== Process terminating with default action of signal 11 (SIGSEGV)
==13673== Access not within mapped region at address 0xB
==13673== at 0x400FDF2: _dl_signal_error (dl-error.c:94)
==13673== by 0x400FF7D: _dl_signal_cerror (dl-error.c:155)
==13673== by 0x400B267: _dl_lookup_symbol_x (dl-lookup.c:779)
==13673== by 0x400F556: _dl_fixup (dl-runtime.c:111)
==13673== by 0x4016514: _dl_runtime_resolve (dl-trampoline.S:45)
==13673== by 0x50D2B82: rwlockInit (rwlock.c:110)
==13673== by 0x5104D65: clblasFunctorCache<clblasSscalFunctorGeneric, _clblasXscalFunctorGenericData, std::less<_clblasXscalFunctorGenericData> >::clblasFunctorCache() (functor.h:280)
==13673== by 0x5104B2B: __static_initialization_and_destruction_0(int, int) (functor_xscal_generic.cc:194)
==13673== by 0x5104C4B: _GLOBAL__sub_I_functor_xscal_generic.cc (functor_xscal_generic.cc:439)
==13673== by 0x4010139: call_init.part.0 (dl-init.c:78)
==13673== by 0x4010222: call_init (dl-init.c:36)
==13673== by 0x4010222: _dl_init (dl-init.c:126)
==13673== by 0x4001309: ??? (in /lib/x86_64-linux-gnu/ld-2.19.so)
==13673== If you believe this happened as a result of a stack
==13673== overflow in your program's main thread (unlikely but
==13673== possible), you can try to increase the size of the
==13673== main thread stack using the --main-stacksize= flag.
==13673== The main thread stack size used in this run was 8388608.
==13673==
==13673== HEAP SUMMARY:
==13673== in use at exit: 209,565 bytes in 128 blocks
==13673== total heap usage: 234 allocs, 106 frees, 273,577 bytes allocated
==13673==
==13673== LEAK SUMMARY:
==13673== definitely lost: 32,816 bytes in 1 blocks
==13673== indirectly lost: 0 bytes in 0 blocks
==13673== possibly lost: 2,312 bytes in 17 blocks
==13673== still reachable: 174,437 bytes in 110 blocks
==13673== suppressed: 0 bytes in 0 blocks
==13673== Rerun with --leak-check=full to see details of leaked memory
==13673==
==13673== For counts of detected and suppressed errors, rerun with: -v
==13673== Use --track-origins=yes to see where uninitialised values come from
==13673== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
Similar to https://github.com/clMathLibraries/clBLAS/issues/172
I have been facing this issue on an Ubuntu 14.04 machine with an NVIDIA GeForce Titan and CUDA 7.5.
This code used to work for me before the AutoGemm overhaul, but the latest iteration does not. The crash happens during the very first call to the function.
Please let me know if you need any additional information.
I have been unsuccessful in compiling the test cases, so I cannot verify that the error is reproducible.
Thanks
Akshay