OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

LAPACKE_sgesdd_work on armv7 android float-abi=hard never returns for some cases #894

Open Mikaso opened 8 years ago

Mikaso commented 8 years ago

Hi!

I'm using OpenBLAS on an Android device. So far everything has worked fine, but now I'm facing a strange error. In my function, I call LAPACKE_sgesdd_work directly with preallocated memory.

In some cases, it never returns.

I compiled LAPACKE_sgesdd_work with debug flags and stepped through the disassembly: it runs until it reaches the call to the Fortran routine sgesdd, and it never returns from there.
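With LAPACK_ROW_MAJOR, LAPACKE_sgesdd_work has to hand column-major data to the Fortran sgesdd, so it copies the input into a transposed buffer before that call (LAPACKE uses its internal helper LAPACKE_sge_trans for this). A minimal sketch of such a row-major to column-major copy follows; it illustrates the idea and is not the LAPACKE code, and the name row_to_col_major is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: copy an m x n row-major matrix `in` (leading
 * dimension ld_in >= n) into column-major storage `out` (leading
 * dimension ld_out >= m), the layout Fortran LAPACK expects. */
void row_to_col_major(int m, int n, const float *in, int ld_in,
                      float *out, int ld_out)
{
    assert(m >= 0 && n >= 0);
    assert(ld_in >= n && ld_out >= m);
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            /* element (i, j): row-major index i*ld_in + j,
               column-major index i + j*ld_out */
            out[i + j * ld_out] = in[i * ld_in + j];
}
```

Applied to the 3 x 3 matrix of the reproducer, the output buffer holds the columns (1, -2, -3), (0, 1, 4) and (-1, 4, 5) one after another.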

This code should give you the same behavior on an armv7 device:

float A[ 9 ] = {  1.f,  0.f, -1.f,
                 -2.f,  1.f,  4.f,
                 -3.f,  4.f,  5.f };

float U[ 9 ]; float S[ 3 ]; float V_t[ 9 ];

/* number of rows of A */
const int m = 3;
/* number of cols of A */
const int n = 3;
/* leading dimension of A */
const int ld_a = n;
/* leading dimension of U */
const int ld_u = m;
/* leading dimension of V_t */
const int ld_v_t = n;
/* SVD workspace array */
int iwork[ 8 * 4 ];
/* SVD workspace array length */
int lwork = 180;
/* SVD workspace array */
float work[ 180 * 2 ];

int result = 1;

result = LAPACKE_sgesdd_work(LAPACK_ROW_MAJOR, 'A', m, n, A, ld_a, S, U, ld_u,
                                                     V_t, ld_v_t, work, lwork, iwork);

Crucially, if I compile exactly the same code for x86, it works.

I allocate more memory than needed, just to be sure that there is no access violation inside.
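To compare "more memory than needed" with the documented minimum, the sizes can be computed directly. This sketch assumes the bound stated for JOBZ = 'S' or 'A' in the sgesdd.f header of LAPACK releases from that time, LWORK >= 3*mn + max(mx, 4*mn*mn + 4*mn) with mn = min(M,N) and mx = max(M,N), plus the IWORK dimension 8*min(M,N). The documented bounds may differ between LAPACK versions, so compare with the sgesdd.f you actually build. The function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: minimum LWORK for sgesdd with JOBZ = 'S' or 'A',
 * assuming LWORK >= 3*mn + max(mx, 4*mn*mn + 4*mn) and LWORK >= 1. */
int sgesdd_min_lwork_sa(int m, int n)
{
    assert(m >= 0 && n >= 0);
    int mn = m < n ? m : n;
    int mx = m > n ? m : n;
    int inner = 4 * mn * mn + 4 * mn;
    int lwork = 3 * mn + (mx > inner ? mx : inner);
    return lwork < 1 ? 1 : lwork;
}

/* Hypothetical helper: IWORK must hold 8*min(M,N) integers. */
int sgesdd_min_liwork(int m, int n)
{
    assert(m >= 0 && n >= 0);
    return 8 * (m < n ? m : n);
}
```

For m = n = 3 this gives LWORK >= 57 and IWORK >= 24, so the 180 floats and 32 ints in the reproducer are well above the documented minimum.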

On the other hand, the following does work:

float A[ 4 ] = {  1.f,  0.f,
                 -2.f,  1.f };

float U[ 4 ]; float S[ 2 ]; float V_t[ 4 ];

/* number of rows of A */
const int m = 2;
/* number of cols of A */
const int n = 2;
/* leading dimension of A */
const int ld_a = n;
/* leading dimension of U */
const int ld_u = m;
/* leading dimension of V_t */
const int ld_v_t = n;
/* SVD workspace array */
int iwork[ 8 * 3 ];
/* SVD workspace array length */
int lwork = LAPACK_GESDD_LWORK( 3, 3 );
/* SVD workspace array */
float work[ LAPACK_GESDD_LWORK( 3, 3 ) ];

int result = 1;

result = LAPACKE_sgesdd_work(LAPACK_ROW_MAJOR, 'A', m, n, A, ld_a, S, U, ld_u,
                                                     V_t, ld_v_t, work, lwork, iwork);

I can now also confirm that the behavior is the same when calling sgesdd_ directly:

float A[ 9 ] = {  1.f,  0.f, -1.f,
                 -2.f,  1.f,  4.f,
                 -3.f,  4.f,  5.f };

float U[ 9 ]; float S[ 3 ]; float V_t[ 9 ];

/* number of rows of A */
int m = 3;
/* number of cols of A */
int n = 3;
/* leading dimension of A */
int ld_a = n;
/* leading dimension of U */
int ld_u = m;
/* leading dimension of V_t */
int ld_v_t = n;
/* SVD workspace array */
int iwork[ 8 * 4 ];
/* SVD workspace array length */
int lwork = 180;
/* SVD workspace array */
float work[ 180 * 2 ];

char jobz = 'A';
int info = 1;

sgesdd_(&jobz, &m, &n, A, &ld_a, S, U, &ld_u, V_t,
        &ld_v_t, work, &lwork, iwork, &info );

I also tested the last example with LAPACK 3.6.0 and the reference BLAS from netlib, and there it works.

brada4 commented 8 years ago

Are you sure that the workspace query (lwork = -1) returns the same workspace size on x86 and ARM?

Please dump the threads and share the log from the frozen process (attach, thread apply all bt, detach, quit):

gdb 2>&1 | tee gdb.log
gdb> atta
gdb> t a a bt
gdb> deta
gdb> q

Mikaso commented 8 years ago

No, I'm not sure whether the workspace query returns the same size on ARM and x86, but I'm neither running any tests on x86 nor using the workspace query. I read the documentation and preallocated a workspace larger than necessary.

I'm not sure how to run these commands on my phone with a remote debugger. I'm using QtCreator to compile and install a test app with this code. If I try to pause the application in QtCreator, after a while it asks whether I want to stop gdb because it is not responding, or give it more time.

wernsaar commented 8 years ago

Hi,

Every LAPACK function that needs a local workspace calls ILAENV to get a block size. If you need a different local workspace size, you have to modify ilaenv.f in lapack-netlib/SRC. Look at the source of sgesdd.f to find the calls to ILAENV.

Best regards,
Werner

(Sent by email in reply to Mikaso's comment of 05/30/2016, 11:05 AM.)

Mikaso commented 8 years ago

OK, thanks for the information, but what am I supposed to do about it, and how do I know whether it's right? Shouldn't both calls of sgesdd use the same amount of memory?

brada4 commented 8 years ago

If you had a machine with dozens of terabytes of RAM, you would rewrite ilaenv. Can you dump the threads?

Mikaso commented 8 years ago

I found out how to run the gdb commands in QtCreator. I cannot run them at the moment it freezes, but here is the dump right after the call to sgesdd_:

>&"t a a bt\n"
>~"\nThread 22 (Thread 30114.30179):\n"
>~"#0  0xb6d3440c in __ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d3c4d4 in ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6f46d88 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 21 (Thread 30114.30154):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d32b24 in __pthread_cond_timedwait_relative(pthread_cond_internal_t*, pthread_mutex_t*, timespec const*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xa87fb00e in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 20 (Thread 30114.30153):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d32b24 in __pthread_cond_timedwait_relative(pthread_cond_internal_t*, pthread_mutex_t*, timespec const*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0x9f0ad252 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 19 (Thread 30114.30152):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d32b24 in __pthread_cond_timedwait_relative(pthread_cond_internal_t*, pthread_mutex_t*, timespec const*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xa099660e in QWaitConditionPrivate::wait(unsigned long) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#3  0xa0996be2 in QWaitCondition::wait(QMutex*, unsigned long) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#4  0xaeca7378 in QSGRenderThreadEventQueue::takeEvent(bool) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Quick.so\n"
>~"#5  0xaeca74d2 in QSGRenderThread::processEventsAndWaitForMore() () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Quick.so\n"
>~"#6  0xaeca76d0 in QSGRenderThread::run() () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Quick.so\n"
>~"#7  0xa0995d16 in QThreadPrivate::start(void*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#8  0xb6d32ea6 in __pthread_start(void*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#9  0xb6d04c36 in __start_thread () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#10 0x00000000 in ?? ()\n"
>~"\nThread 18 (Thread 30114.30150):\n"
>~"#0  0xb6d344e0 in __pselect6 () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d07398 in select () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xa0adc388 in qt_safe_select(int, __kernel_fd_set*, __kernel_fd_set*, __kernel_fd_set*, timespec const*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#3  0xa0add230 in QEventDispatcherUNIXPrivate::doSelect(QFlags, timespec*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#4  0xa0add4f8 in QEventDispatcherUNIX::processEvents(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#5  0xa0aa6dc4 in QEventLoop::processEvents(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#6  0xa0aa73de in QEventLoop::exec(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#7  0xa0993758 in QThread::exec() () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#8  0x9d43b88a in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 17 (Thread 30114.30149):\n"
>~"#0  0xb6d344e0 in __pselect6 () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d07398 in select () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xa0adc388 in qt_safe_select(int, __kernel_fd_set*, __kernel_fd_set*, __kernel_fd_set*, timespec const*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#3  0xa0add230 in QEventDispatcherUNIXPrivate::doSelect(QFlags, timespec*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#4  0xa0add4f8 in QEventDispatcherUNIX::processEvents(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#5  0xa0aa6dc4 in QEventLoop::processEvents(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#6  0xa0aa73de in QEventLoop::exec(QFlags) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#7  0xa0993758 in QThread::exec() () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#8  0xaefa18e0 in QQmlThreadPrivate::run() () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Qml.so\n"
>~"#9  0xa0995d16 in QThreadPrivate::start(void*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\lib\\libQt5Core.so\n"
>~"#10 0xb6d32ea6 in __pthread_start(void*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#11 0xb6d04c36 in __start_thread () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#12 0x00000000 in ?? ()\n"
>~"\nThread 15 (Thread 30114.30144):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d32b24 in __pthread_cond_timedwait_relative(pthread_cond_internal_t*, pthread_mutex_t*, timespec const*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0x9f0ad252 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 14 (Thread 30114.30142):\n"
>~"#0  0xb6d342cc in __epoll_pwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d04fbc in epoll_pwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6d04fd6 in epoll_wait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#3  0xb6f7c356 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 13 (Thread 30114.30143):\n"
>~"#0  sgesdd (jobz=..., m=3, n=3, a=..., lda=3, s=..., u=..., ldu=3, vt=..., ldvt=3, work=..., lwork=180, iwork=..., info=0, _jobz=0) at sgesdd.f:216\n"
>~"#1  0xa033a268 in LAPACKE_sgesdd_work (matrix_layout=, jobz=0 '\\000', m=0, n=-1629229824, a=0x9ee3ee04, lda=3, s=0x9ee3edf8, u=0x9ee3ee28, ldu=3, vt=0x9ee3ee4c, ldvt=1065353216, work=0x9ee3eef0, lwork=180, iwork=0x9ee3ee70) at lapacke_sgesdd_work.c:84\n"
>~"#2  0xa0332f00 in LAPACKE_sgesdd_work_wrapper (layout=102, jobz=65 'A', m=3, n=3, A=0x9ee3ee04, lda=3, S=0x9ee3edf8, U=0x9ee3ee28, ldu=3, vt=0x9ee3ee4c, ldvt=3, work=0x9ee3eef0, lwork=180, iwork=0x9ee3ee70) at ..\\openblas_wrapper/lapacke_wrapper.h:270\n"
>~"#3  0xa0333200 in cblas_issue_894 () at ..\\openblas_wrapper\\main.cpp:107\n"
>~"#4  0xa0333000 in main (argc=1, argv=0x9ee3f4f8) at ..\\openblas_wrapper\\main.cpp:53\n"
>~"#5  0xb36cc5e0 in startMainMethod(void*) () from C:\\Qt\\Qt5.6.0\\5.6\\android_armv7\\plugins\\platforms\\android\\libqtforandroid.so\n"
>~"#6  0xb6d32ea6 in __pthread_start(void*) () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#7  0xb6d04c36 in __start_thread () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#8  0x00000000 in ?? ()\n"
>~"\nThread 11 (Thread 30114.30131):\n"
>~"#0  0xb6d3440c in __ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d3c4d4 in ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6f46d88 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 10 (Thread 30114.30129):\n"
>~"#0  0xb6d35da8 in wait4 () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb066a686 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 9 (Thread 30114.30126):\n"
>~"#0  0xb6d3440c in __ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d3c4d4 in ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6f46d88 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 8 (Thread 30114.30125):\n"
>~"#0  0xb6d3440c in __ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d3c4d4 in ioctl () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6f46d88 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 7 (Thread 30114.30124):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb494fa0c in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 6 (Thread 30114.30123):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb494fa0c in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 5 (Thread 30114.30122):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb494fa0c in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 4 (Thread 30114.30121):\n"
>~"#0  0xb6d02610 in syscall () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb494fa0c in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 3 (Thread 30114.30120):\n"
>~"#0  0xb6d354f4 in recvmsg () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb4c5a022 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 2 (Thread 30114.30119):\n"
>~"#0  0xb6d345e0 in __rt_sigtimedwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d0844a in sigwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb4b935e8 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>~"\nThread 1 (Thread 30114.30114):\n"
>~"#0  0xb6d342cc in __epoll_pwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#1  0xb6d04fbc in epoll_pwait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#2  0xb6d04fd6 in epoll_wait () from D:\\MobileApp\\build-openblas_wrapper-Android_f_r_armeabi_v7a_GCC_4_9_Qt_5_6_0-Debug\\libc.so\n"
>~"#3  0xb6f7c356 in ?? ()\n"
>~"Backtrace stopped: previous frame identical to this frame (corrupt stack?)\n"
>2879^done

Sorry for the long block; I don't know how to make it collapsible.

brada4 commented 8 years ago

Does the C kernel (ARMV5) work? This looks like a wrong calling convention or something similar.

~"#1 0xa033a268 in LAPACKE_sgesdd_work (matrix_layout=, jobz=0 '\000', m=0, n=-1629229824, a=0x9ee3ee04, lda=3, s=0x9ee3edf8, u=0x9ee3ee28, ldu=3, vt=0x9ee3ee4c, ldvt=1065353216, work=0x9ee3eef0, lwork=180, iwork=0x9ee3ee70) at lapacke_sgesdd_work.c:84\n"

WORK - IWORK = 0x9ee3eef0 - 0x9ee3ee70 = 128 (decimal), which is much less than 180.
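The 128 above is the distance between the work and iwork addresses shown in frame #1. A minimal sketch of that arithmetic, using only the two addresses from the log; the helper names byte_gap and elements_in_gap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes from the lower to the upper
 * of two 32-bit addresses as printed by gdb. */
uint32_t byte_gap(uint32_t lower, uint32_t upper)
{
    assert(upper >= lower);
    return upper - lower;
}

/* Hypothetical helper: how many elements of elem_size bytes fit
 * between the two addresses. */
uint32_t elements_in_gap(uint32_t lower, uint32_t upper, uint32_t elem_size)
{
    assert(elem_size > 0);
    return byte_gap(lower, upper) / elem_size;
}
```

With iwork = 0x9ee3ee70 and work = 0x9ee3eef0, byte_gap returns 128, room for 32 four-byte ints, which is exactly the size of `int iwork[ 8 * 4 ]` in the reproducer, so work starts directly after iwork on the stack.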

Mikaso commented 8 years ago

The calling convention might be the right place to look. I'm using QtCreator to build an executable for my phone, and its prebuilt binaries are built with the soft or softfp float ABI. Because, as far as I know, OpenBLAS currently supports only the hard-float ABI on ARMV7, I decided to try erlv's approach of wrapping the OpenBLAS function calls (see #853). For this specific function, the wrapper is:

// LAPACKE_sgesdd_work ------
#include <stdint.h>
#include <lapacke.h>

int LAPACKE_sgesdd_work_wrapper( int layout, char jobz, int m, int n,
        float* A, int lda, float* S, float* U, int ldu, float* vt, int ldvt,
        float* work, int lwork, int* iwork) {
#if defined(__ARM_PCS_VFP)
  // if compiled by -mfloat-abi=hard, directly call LAPACKE_sgesdd_work
  return LAPACKE_sgesdd_work(layout, jobz, m, n, A, lda, S, U, ldu, vt, ldvt, work, lwork, iwork);
#elif defined(__SOFTFP__)
  // GCC defines __SOFTFP__ for -mfloat-abi=soft
#error "ERROR: please build with the softfp or hard float ABI"
#else
  // if compiled by -mfloat-abi=softfp, run the assembly to prepare for the hardfp ABI call.
  // Arguments 5-14 are passed on the stack. Collect them in a local array first,
  // so the asm does not read sp-relative memory operands after it has moved sp.
  uint32_t stack_args[10] = {
    (uint32_t)(uintptr_t)A,    (uint32_t)lda,  (uint32_t)(uintptr_t)S,
    (uint32_t)(uintptr_t)U,    (uint32_t)ldu,  (uint32_t)(uintptr_t)vt,
    (uint32_t)ldvt,            (uint32_t)(uintptr_t)work,
    (uint32_t)lwork,           (uint32_t)(uintptr_t)iwork };
  int val;
  __asm__ __volatile__("sub sp, sp, #40 \n\t"          // 10 words; keeps sp 8-byte aligned
                       "ldr r12, [%5] \n\t"            // A
                       "str r12, [sp] \n\t"
                       "ldr r12, [%5, #4] \n\t"        // lda
                       "str r12, [sp, #4] \n\t"
                       "ldr r12, [%5, #8] \n\t"        // S
                       "str r12, [sp, #8] \n\t"
                       "ldr r12, [%5, #12] \n\t"       // U
                       "str r12, [sp, #12] \n\t"
                       "ldr r12, [%5, #16] \n\t"       // ldu
                       "str r12, [sp, #16] \n\t"
                       "ldr r12, [%5, #20] \n\t"       // vt
                       "str r12, [sp, #20] \n\t"
                       "ldr r12, [%5, #24] \n\t"       // ldvt
                       "str r12, [sp, #24] \n\t"
                       "ldr r12, [%5, #28] \n\t"       // work
                       "str r12, [sp, #28] \n\t"
                       "ldr r12, [%5, #32] \n\t"       // lwork
                       "str r12, [sp, #32] \n\t"
                       "ldr r12, [%5, #36] \n\t"       // iwork
                       "str r12, [sp, #36] \n\t"
                       "mov r0, %1 \n\t"               // layout
                       "mov r1, %2 \n\t"               // jobz
                       "mov r2, %3 \n\t"               // m
                       "mov r3, %4 \n\t"               // n
                       "bl LAPACKE_sgesdd_work(PLT) \n\t"
                       "mov %0, r0 \n\t"               // return value
                       "add sp, sp, #40 \n\t"
                       : "=r"(val)
                       : "r"(layout), "r"((int)jobz), "r"(m), "r"(n), "r"(stack_args)
                       // bl overwrites lr; the callee may clobber r0-r3 and r12
                       : "cc", "memory", "r0", "r1", "r2", "r3", "r12", "lr");
  return val;
#endif
}
// LAPACKE_sgesdd_work ------
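The wrapper selects its branch from compiler-predefined macros. The sketch below shows that kind of check in isolation, assuming GCC or Clang conventions: __arm__ for 32-bit ARM, __ARM_PCS_VFP for -mfloat-abi=hard, and __SOFTFP__ for -mfloat-abi=soft, leaving softfp as the remaining case. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical: float ABI this translation unit was compiled for. */
enum float_abi { ABI_NOT_ARM32, ABI_SOFT, ABI_SOFTFP, ABI_HARD };

enum float_abi detect_float_abi(void)
{
#if !defined(__arm__)
    return ABI_NOT_ARM32;  /* e.g. x86 or AArch64: the question does not apply */
#elif defined(__ARM_PCS_VFP)
    return ABI_HARD;       /* -mfloat-abi=hard: floats passed in VFP registers */
#elif defined(__SOFTFP__)
    return ABI_SOFT;       /* -mfloat-abi=soft: no VFP instructions at all */
#else
    return ABI_SOFTFP;     /* -mfloat-abi=softfp: VFP code, floats passed in core registers */
#endif
}

/* Human-readable name for an ABI value. */
const char *float_abi_name(enum float_abi abi)
{
    static const char *const names[] = { "not 32-bit ARM", "soft", "softfp", "hard" };
    assert((unsigned)abi < sizeof names / sizeof names[0]);
    return names[abi];
}
```

Compiling this with the same flags as the application, and again with the flags used for the library, is a quick way to see which ABI each side was built for.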

Once my wrappers are working, I want to share them with the community.

Could it be the wrong Fortran runtime library?

Since there are no plain float arguments (only pointers and integers), it should work even without a wrapper, shouldn't it?

Apart from the problems during development, are there any downsides to such wrappers?

brada4 commented 8 years ago

Actually, it is about function prologues that change depending on #ifdefs, like this one:
https://github.com/xianyi/OpenBLAS/blob/develop/kernel/arm/dgemm_kernel_4x4_vfpv3.S

For the performance considerations, see https://wiki.debian.org/ArmHardFloatPort/VfpComparison#A.22softfp.22 (count the dgemm parameters and multiply by 20; for small matrices, as in DSP code, inline C will be faster).