OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Fix build on FreeBSD/powerpc64 #1895

Closed pkubaj closed 5 years ago

pkubaj commented 5 years ago

FreeBSD currently can't build OpenBLAS due to the following error:

../../common_power.h:816:23: error: 'START_ADDRESS' undeclared here (not in a function); did you mean 'BASE_ADDRESS'?
 #define BASE_ADDRESS (START_ADDRESS - BUFFER_SIZE * MAX_CPU_NUMBER)
                       ^

I can see that START_ADDRESS is already defined for Linux and AIX. What is this value used for? Do you know what it should be set to for FreeBSD?

brada4 commented 5 years ago

FreeBSD could build OpenBLAS 1 week ago.

pkubaj commented 5 years ago

> FreeBSD could build OpenBLAS 1 week ago.

Did you build on powerpc64?

brada4 commented 5 years ago

Nope, but the text of the issue claims a general problem, not a problem porting to their Tier 2 platform.

pkubaj commented 5 years ago

> Nope, but the text of the issue claims a general problem, not a problem porting to their Tier 2 platform.

Please, read the title.

brada4 commented 5 years ago

Is #1894 a complete fix for this porting issue? There was (or still is?) a problem with 12-prerelease where the port builder sets ARCH=amd64, while the LAPACK that is included with OpenBLAS expects ARCH to be the ar command.

pkubaj commented 5 years ago

It's not complete. Once #1894 is applied, that's the problem I face.

brada4 commented 5 years ago

What problem? Do you have a build log or something to show?

martin-frbg commented 5 years ago

@pkubaj I have updated your PR with ararslan's suggestion, that should do the trick.

pkubaj commented 5 years ago

My problem is mentioned in the 1st message. Here's the full build log: https://talos.anongoth.pl/data/powerpc64-default/2018-12-03_09h26m07s/logs/openblas-0.3.3,1.log

brada4 commented 5 years ago

It is a problem introduced by the poudriere build system and should be addressed in #1899 (targeted for 0.3.5), also for architectures not yet mentioned (it was fixed for amd64/x86_64 some time ago). OpenBLAS builds on a clean FreeBSD 12 outside poudriere, e.g. by typing make at the top of the source tree.

pkubaj commented 5 years ago

I already applied that fix. It makes the build system use the proper Makefile. It doesn't fix the build, either in Poudriere or straight from the source tree (the build fails later).

The reason is the missing START_ADDRESS macro.

martin-frbg commented 5 years ago

START_ADDRESS (or alternatively SEEK_ADDRESS) gets defined in common_power.h, but currently only for LINUX and AIX. Could you try changing the "ifdef OS_AIX" in line 789 of the file to #if defined(OS_AIX) || defined(OS_FREEBSD) to see if this is the only remaining problem?
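
For illustration, the suggested change has roughly the following shape. This is a minimal sketch, not the actual contents of common_power.h: EXAMPLE_START_ADDRESS, its value, and the helper function are placeholders invented here, and OS_FREEBSD is defined by hand only so that the snippet compiles on its own (in a real build it is assumed to come from the build system).

```c
/* Defined by hand for this sketch; assumed to be supplied by the build
 * system in a real OpenBLAS build. */
#define OS_FREEBSD 1

/* Same guard shape as suggested above. The macro name and its value are
 * hypothetical placeholders, not the real START_ADDRESS definition. */
#if defined(OS_AIX) || defined(OS_FREEBSD)
#define EXAMPLE_START_ADDRESS 0x1000UL
#endif

/* Returns 1 when the guard selected a definition, 0 otherwise. */
int example_start_address_defined(void) {
#ifdef EXAMPLE_START_ADDRESS
    return 1;
#else
    return 0;
#endif
}
```

Whether the value used for AIX is also appropriate for FreeBSD is a separate question that this sketch does not answer.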

pkubaj commented 5 years ago

EDIT: After adding FreeBSD to the ifdef in kernel/power/axpy.S it works, but fails at another assembly file. I'll try to add FreeBSD to the ifdefs and see whether it compiles.

It fixes this specific error. The reason I created this issue was because I wasn't sure whether this value is correct for FreeBSD (I'm not sure what it's responsible for).

After adding it, I have a lot of errors like:

Assembler messages:
../kernel/power/min.S:52: Error: unrecognized opcode: `prologue'
../kernel/power/min.S:53: Error: unrecognized opcode: `profcode'
../kernel/power/min.S:445: Error: unrecognized opcode: `epilogue'
../kernel/power/max.S: Assembler messages:
../kernel/power/max.S:52: Error: unrecognized opcode: `prologue'
../kernel/power/max.S:53: Error: unrecognized opcode: `profcode'
../kernel/power/max.S:445: Error: unrecognized opcode: `epilogue'
../kernel/power/axpy.S: Assembler messages:
../kernel/power/axpy.S:88: Error: unrecognized opcode: `prologue'
../kernel/power/axpy.S:89: Error: unrecognized opcode: `profcode'
../kernel/power/axpy.S:549: Error: unrecognized opcode: `epilogue'
../kernel/power/axpy.S:113: Error: unsupported relocation against INCX
../kernel/power/axpy.S:113: Error: unsupported relocation against INCX
../kernel/power/axpy.S:114: Error: unsupported relocation against INCY
../kernel/power/axpy.S:114: Error: unsupported relocation against INCY
../kernel/power/axpy.S:117: Error: unsupported relocation against PREA
../kernel/power/axpy.S:122: Error: unsupported relocation against N
../kernel/power/axpy.S:125: Error: unsupported relocation against INCX
../kernel/power/axpy.S:127: Error: unsupported relocation against INCY
../kernel/power/axpy.S:130: Error: unsupported relocation against N
../kernel/power/axpy.S:135: Error: unsupported relocation against X
../kernel/power/axpy.S:136: Error: unsupported relocation against X
../kernel/power/axpy.S:137: Error: unsupported relocation against X
../kernel/power/axpy.S:138: Error: unsupported relocation against X
../kernel/power/axpy.S:140: Error: unsupported relocation against Y
../kernel/power/axpy.S:141: Error: unsupported relocation against Y
../kernel/power/axpy.S:142: Error: unsupported relocation against Y
../kernel/power/axpy.S:143: Error: unsupported relocation against Y
../kernel/power/axpy.S:145: Error: unsupported relocation against X
../kernel/power/axpy.S:146: Error: unsupported relocation against X
../kernel/power/axpy.S:147: Error: unsupported relocation against X
../kernel/power/axpy.S:148: Error: unsupported relocation against X
../kernel/power/axpy.S:150: Error: unsupported relocation against Y
../kernel/power/axpy.S:151: Error: unsupported relocation against Y
../kernel/power/axpy.S:152: Error: unsupported relocation against Y
../kernel/power/axpy.S:153: Error: unsupported relocation against Y
../kernel/power/axpy.S:163: Error: unsupported relocation against X
../kernel/power/axpy.S:164: Error: unsupported relocation against X
../kernel/power/axpy.S:165: Error: unsupported relocation against X
../kernel/power/axpy.S:166: Error: unsupported relocation against X
../kernel/power/axpy.S:168: Error: unsupported relocation against Y
../kernel/power/axpy.S:169: Error: unsupported relocation against Y
../kernel/power/axpy.S:170: Error: unsupported relocation against Y
../kernel/power/axpy.S:171: Error: unsupported relocation against Y
../kernel/power/axpy.S:173: Error: unsupported relocation against Y
../kernel/power/axpy.S:174: Error: unsupported relocation against Y
../kernel/power/axpy.S:175: Error: unsupported relocation against Y
../kernel/power/axpy.S:176: Error: unsupported relocation against Y
../kernel/power/axpy.S:183: Error: unsupported relocation against X
../kernel/power/axpy.S:184: Error: unsupported relocation against X
../kernel/power/axpy.S:185: Error: unsupported relocation against X
../kernel/power/axpy.S:186: Error: unsupported relocation against X
../kernel/power/axpy.S:188: Error: unsupported relocation against Y
../kernel/power/axpy.S:189: Error: unsupported relocation against Y
../kernel/power/axpy.S:190: Error: unsupported relocation against Y
../kernel/power/axpy.S:191: Error: unsupported relocation against Y
../kernel/power/axpy.S:193: Error: unsupported relocation against Y
../kernel/power/axpy.S:194: Error: unsupported relocation against Y
../kernel/power/axpy.S:195: Error: unsupported relocation against Y
../kernel/power/axpy.S:196: Error: unsupported relocation against Y
../kernel/power/axpy.S:203: Error: unsupported relocation against X
../kernel/power/axpy.S:204: Error: unsupported relocation against X
../kernel/power/axpy.S:205: Error: unsupported relocation against X
../kernel/power/axpy.S:206: Error: unsupported relocation against X
../kernel/power/axpy.S:208: Error: unsupported relocation against Y
../kernel/power/axpy.S:209: Error: unsupported relocation against Y
../kernel/power/axpy.S:210: Error: unsupported relocation against Y
../kernel/power/axpy.S:211: Error: unsupported relocation against Y
../kernel/power/axpy.S:213: Error: unsupported relocation against Y
../kernel/power/axpy.S:214: Error: unsupported relocation against Y
../kernel/power/axpy.S:215: Error: unsupported relocation against Y
../kernel/power/axpy.S:216: Error: unsupported relocation against Y
../kernel/power/axpy.S:223: Error: unsupported relocation against X
../kernel/power/axpy.S:224: Error: unsupported relocation against X
../kernel/power/axpy.S:225: Error: unsupported relocation against X
../kernel/power/axpy.S:226: Error: unsupported relocation against X
../kernel/power/axpy.S:228: Error: unsupported relocation against Y
../kernel/power/axpy.S:229: Error: unsupported relocation against Y
../kernel/power/axpy.S:230: Error: unsupported relocation against Y
../kernel/power/axpy.S:231: Error: unsupported relocation against Y
../kernel/power/axpy.S:233: Error: unsupported relocation against Y
../kernel/power/axpy.S:234: Error: unsupported relocation against Y
../kernel/power/axpy.S:235: Error: unsupported relocation against Y
../kernel/power/axpy.S:236: Error: unsupported relocation against Y
../kernel/power/axpy.S:239: Error: unsupported relocation against Y
../kernel/power/axpy.S:239: Error: unsupported relocation against PREA
../kernel/power/axpy.S:241: Error: unsupported relocation against X
../kernel/power/axpy.S:241: Error: unsupported relocation against PREA
../kernel/power/axpy.S:244: Error: unsupported relocation against X
../kernel/power/axpy.S:244: Error: unsupported relocation against X
../kernel/power/axpy.S:245: Error: unsupported relocation against Y
../kernel/power/axpy.S:245: Error: unsupported relocation against Y
../kernel/power/axpy.S:261: Error: unsupported relocation against X
../kernel/power/axpy.S:262: Error: unsupported relocation against X
../kernel/power/axpy.S:263: Error: unsupported relocation against X
../kernel/power/axpy.S:264: Error: unsupported relocation against X
../kernel/power/axpy.S:266: Error: unsupported relocation against Y
../kernel/power/axpy.S:267: Error: unsupported relocation against Y
../kernel/power/axpy.S:268: Error: unsupported relocation against Y
../kernel/power/axpy.S:269: Error: unsupported relocation against Y
../kernel/power/axpy.S:276: Error: unsupported relocation against X
../kernel/power/axpy.S:277: Error: unsupported relocation against X
../kernel/power/axpy.S:278: Error: unsupported relocation against X
../kernel/power/axpy.S:279: Error: unsupported relocation against X
../kernel/power/axpy.S:281: Error: unsupported relocation against Y
../kernel/power/axpy.S:282: Error: unsupported relocation against Y
../kernel/power/axpy.S:283: Error: unsupported relocation against Y
../kernel/power/axpy.S:284: Error: unsupported relocation against Y
../kernel/power/axpy.S:286: Error: unsupported relocation against Y
../kernel/power/axpy.S:287: Error: unsupported relocation against Y
../kernel/power/axpy.S:288: Error: unsupported relocation against Y
../kernel/power/axpy.S:289: Error: unsupported relocation against Y
../kernel/power/axpy.S:296: Error: unsupported relocation against Y
../kernel/power/axpy.S:297: Error: unsupported relocation against Y
../kernel/power/axpy.S:298: Error: unsupported relocation against Y
../kernel/power/axpy.S:299: Error: unsupported relocation against Y
../kernel/power/axpy.S:306: Error: unsupported relocation against Y
../kernel/power/axpy.S:307: Error: unsupported relocation against Y
../kernel/power/axpy.S:308: Error: unsupported relocation against Y
../kernel/power/axpy.S:309: Error: unsupported relocation against Y
../kernel/power/axpy.S:311: Error: unsupported relocation against Y
../kernel/power/axpy.S:312: Error: unsupported relocation against Y
../kernel/power/axpy.S:313: Error: unsupported relocation against Y
../kernel/power/axpy.S:314: Error: unsupported relocation against Y
../kernel/power/axpy.S:316: Error: unsupported relocation against X
../kernel/power/axpy.S:316: Error: unsupported relocation against X
../kernel/power/axpy.S:317: Error: unsupported relocation against Y
../kernel/power/axpy.S:317: Error: unsupported relocation against Y
../kernel/power/axpy.S:321: Error: unsupported relocation against N
../kernel/power/axpy.S:327: Error: unsupported relocation against X
../kernel/power/axpy.S:328: Error: unsupported relocation against Y
../kernel/power/axpy.S:332: Error: unsupported relocation against Y
../kernel/power/axpy.S:333: Error: unsupported relocation against X
../kernel/power/axpy.S:333: Error: unsupported relocation against X
../kernel/power/axpy.S:334: Error: unsupported relocation against Y
../kernel/power/axpy.S:334: Error: unsupported relocation against Y
../kernel/power/axpy.S:340: Error: unsupported relocation against X
../kernel/power/axpy.S:340: Error: unsupported relocation against X
../kernel/power/axpy.S:340: Error: unsupported relocation against INCX
../kernel/power/axpy.S:341: Error: unsupported relocation against Y
../kernel/power/axpy.S:341: Error: unsupported relocation against Y
../kernel/power/axpy.S:341: Error: unsupported relocation against INCY
../kernel/power/axpy.S:342: Error: unsupported relocation against YY
../kernel/power/axpy.S:342: Error: unsupported relocation against Y
../kernel/power/axpy.S:344: Error: unsupported relocation against N
../kernel/power/axpy.S:349: Error: invalid register operand when updating
../kernel/power/axpy.S:349: Error: unsupported relocation against X
../kernel/power/axpy.S:349: Error: unsupported relocation against INCX
../kernel/power/axpy.S:350: Error: invalid register operand when updating
../kernel/power/axpy.S:350: Error: unsupported relocation against X
../kernel/power/axpy.S:350: Error: unsupported relocation against INCX
../kernel/power/axpy.S:351: Error: invalid register operand when updating
../kernel/power/axpy.S:351: Error: unsupported relocation against X
../kernel/power/axpy.S:351: Error: unsupported relocation against INCX
../kernel/power/axpy.S:352: Error: invalid register operand when updating
../kernel/power/axpy.S:352: Error: unsupported relocation against X
../kernel/power/axpy.S:352: Error: unsupported relocation against INCX
../kernel/power/axpy.S:354: Error: invalid register operand when updating
../kernel/power/axpy.S:354: Error: unsupported relocation against Y
../kernel/power/axpy.S:354: Error: unsupported relocation against INCY
../kernel/power/axpy.S:355: Error: invalid register operand when updating
../kernel/power/axpy.S:355: Error: unsupported relocation against Y
../kernel/power/axpy.S:355: Error: unsupported relocation against INCY
../kernel/power/axpy.S:356: Error: invalid register operand when updating
../kernel/power/axpy.S:356: Error: unsupported relocation against Y
../kernel/power/axpy.S:356: Error: unsupported relocation against INCY
../kernel/power/axpy.S:357: Error: invalid register operand when updating
../kernel/power/axpy.S:357: Error: unsupported relocation against Y
../kernel/power/axpy.S:357: Error: unsupported relocation against INCY
../kernel/power/axpy.S:359: Error: invalid register operand when updating
../kernel/power/axpy.S:359: Error: unsupported relocation against X
../kernel/power/axpy.S:359: Error: unsupported relocation against INCX
../kernel/power/axpy.S:360: Error: invalid register operand when updating
../kernel/power/axpy.S:360: Error: unsupported relocation against X
../kernel/power/axpy.S:360: Error: unsupported relocation against INCX
../kernel/power/axpy.S:361: Error: invalid register operand when updating
../kernel/power/axpy.S:361: Error: unsupported relocation against X
../kernel/power/axpy.S:361: Error: unsupported relocation against INCX
../kernel/power/axpy.S:362: Error: invalid register operand when updating
../kernel/power/axpy.S:362: Error: unsupported relocation against X
../kernel/power/axpy.S:362: Error: unsupported relocation against INCX
../kernel/power/axpy.S:364: Error: invalid register operand when updating
../kernel/power/axpy.S:364: Error: unsupported relocation against Y
../kernel/power/axpy.S:364: Error: unsupported relocation against INCY
../kernel/power/axpy.S:365: Error: invalid register operand when updating
../kernel/power/axpy.S:365: Error: unsupported relocation against Y
../kernel/power/axpy.S:365: Error: unsupported relocation against INCY
../kernel/power/axpy.S:366: Error: invalid register operand when updating
../kernel/power/axpy.S:366: Error: unsupported relocation against Y
../kernel/power/axpy.S:366: Error: unsupported relocation against INCY
../kernel/power/axpy.S:367: Error: invalid register operand when updating
../kernel/power/axpy.S:367: Error: unsupported relocation against Y
../kernel/power/axpy.S:367: Error: unsupported relocation against INCY
../kernel/power/axpy.S:377: Error: invalid register operand when updating
../kernel/power/axpy.S:377: Error: unsupported relocation against X
../kernel/power/axpy.S:377: Error: unsupported relocation against INCX
../kernel/power/axpy.S:378: Error: invalid register operand when updating
../kernel/power/axpy.S:378: Error: unsupported relocation against X
../kernel/power/axpy.S:378: Error: unsupported relocation against INCX
../kernel/power/axpy.S:379: Error: invalid register operand when updating
../kernel/power/axpy.S:379: Error: unsupported relocation against X
../kernel/power/axpy.S:379: Error: unsupported relocation against INCX
../kernel/power/axpy.S:380: Error: invalid register operand when updating
../kernel/power/axpy.S:380: Error: unsupported relocation against X
../kernel/power/axpy.S:380: Error: unsupported relocation against INCX
../kernel/power/axpy.S:382: Error: invalid register operand when updating
../kernel/power/axpy.S:382: Error: unsupported relocation against Y
../kernel/power/axpy.S:382: Error: unsupported relocation against INCY
../kernel/power/axpy.S:383: Error: invalid register operand when updating
../kernel/power/axpy.S:383: Error: unsupported relocation against Y
../kernel/power/axpy.S:383: Error: unsupported relocation against INCY
../kernel/power/axpy.S:384: Error: invalid register operand when updating
../kernel/power/axpy.S:384: Error: unsupported relocation against Y
../kernel/power/axpy.S:384: Error: unsupported relocation against INCY
../kernel/power/axpy.S:385: Error: invalid register operand when updating
../kernel/power/axpy.S:385: Error: unsupported relocation against Y
../kernel/power/axpy.S:385: Error: unsupported relocation against INCY
../kernel/power/axpy.S:392: Error: invalid register operand when updating
../kernel/power/axpy.S:392: Error: unsupported relocation against X
../kernel/power/axpy.S:392: Error: unsupported relocation against INCX
../kernel/power/axpy.S:393: Error: invalid register operand when updating
../kernel/power/axpy.S:393: Error: unsupported relocation against X
../kernel/power/axpy.S:393: Error: unsupported relocation against INCX
../kernel/power/axpy.S:394: Error: invalid register operand when updating
../kernel/power/axpy.S:394: Error: unsupported relocation against X
../kernel/power/axpy.S:394: Error: unsupported relocation against INCX
../kernel/power/axpy.S:395: Error: invalid register operand when updating
../kernel/power/axpy.S:395: Error: unsupported relocation against X
../kernel/power/axpy.S:395: Error: unsupported relocation against INCX
../kernel/power/axpy.S:397: Error: invalid register operand when updating
../kernel/power/axpy.S:397: Error: unsupported relocation against Y
../kernel/power/axpy.S:397: Error: unsupported relocation against INCY
../kernel/power/axpy.S:398: Error: invalid register operand when updating

But I don't know whether it's because START_ADDRESS is bad, or it's another issue.

Since this is assembly code, my hardware may matter. I use a Talos II board with a POWER9 CPU.

martin-frbg commented 5 years ago

I almost expected that - if you look in common_power.h, there are also Linux- and AIX-specific sections that define PROLOGUE and EPILOGUE. Luckily I see now that there are already entries for OS_DARWIN as well (both for the SEEK_ADDRESS business and the PROLOGUE/EPILOGUE thing). As OSX by all accounts derives from BSD, perhaps it will be sufficient to tack on an || defined(OS_FREEBSD) to each and every ifdef mentioning OS_DARWIN ?

pkubaj commented 5 years ago

Ok, I'll do that and see whether it compiles.

pkubaj commented 5 years ago

I get errors with PROFCODE, PROLOGUE and EPILOGUE:

../kernel/power/iamax.S: Assembler messages:
../kernel/power/iamax.S:55: Error: unrecognized opcode: `prologue'
../kernel/power/iamax.S:56: Error: unrecognized opcode: `profcode'
<command-line>:0:0: note: this is the location of the previous definition
../kernel/power/max.S: Assembler messages:
../kernel/power/max.S:52: Error: unrecognized opcode: `prologue'
../kernel/power/max.S:53: Error: unrecognized opcode: `profcode'
<command-line>:0:0: warning: "CHAR_CNAME" redefined
<command-line>:0:0: note: this is the location of the previous definition
../kernel/power/max.S:445: Error: unrecognized opcode: `epilogue'
../kernel/power/iamax.S:802: Error: unrecognized opcode: `epilogue'

Unfortunately, after adding OS_FREEBSD to ifdef OS_AIX for the code block that defines those in common_power.h, I get:

../kernel/power/axpy.S: Assembler messages:
../kernel/power/axpy.S:88: Error: unknown pseudo-op: `.csect'
../kernel/power/axpy.S:549: Error: unknown pseudo-op: `.csect'
gmake[3]: *** [Makefile.L1:581: isamin_k.o] Error 1
../kernel/power/gemv_t.S: Assembler messages:
../kernel/power/amax.S: ../kernel/power/gemv_t.S:197: Error: unknown pseudo-op: `.csect'
Assembler messages:
../kernel/power/amax.S:52: Error: unknown pseudo-op: `.csect'
../kernel/power/gemv_t.S:287: Error: junk at end of line: `+288(1)'
../kernel/power/gemv_t.S:288: Error: junk at end of line: `+288(1)'
../kernel/power/gemv_t.S:289: Error: junk at end of line: `+288(1)'
../kernel/power/amax.S:523: Error: unknown pseudo-op: `.csect'
../kernel/power/gemv_t.S:2967: Error: unknown pseudo-op: `.csect'
gmake[3]: *** [Makefile.L1:561: isamax_k.o] Error 1
../kernel/power/iamin.S: Assembler messages:
../kernel/power/iamin.S:55: Error: unknown pseudo-op: `.csect'
../kernel/power/iamin.S:803: Error: unknown pseudo-op: `.csect'
gmake[3]: *** [Makefile.L1:498: samax_k.o] Error 1
gmake[3]: *** [Makefile.L2:226: sgemv_t.o] Error 1
gmake[3]: *** [Makefile.L1:640: saxpy_k.o] Error 1
gmake[3]: *** [Makefile.L1:601: ismax_k.o] Error 1
gmake[3]: *** [Makefile.L1:612: ismin_k.o] Error 1
../kernel/power/gemv_n.S: Assembler messages:
../kernel/power/gemv_n.S:195: Error: unknown pseudo-op: `.csect'
../kernel/power/gemv_n.S:279: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_n.S:280: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_n.S:281: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_n.S:3095: Error: unknown pseudo-op: `.csect'

martin-frbg commented 5 years ago

Err, what happens when you add OS_FREEBSD to the sections that mention OS_DARWIN instead (and remove it from all the OS_AIX sections that we tried at first)?

pkubaj commented 5 years ago

../common_power.h: Assembler messages:
../common_power.h:640: Error: unexpected end of file in macro `prologue' definition

martin-frbg commented 5 years ago

Could be a mis-edit? Line 640 would put it after the .endmacro of the PROLOGUE though... Unfortunately I have no idea what this looks like for FreeBSD on Power. Do you happen to know some other package that uses PPC assembly and already builds on FreeBSD?

brada4 commented 5 years ago

Which `as` are you using? It should be the one from GNU binutils to make gcc happy.

brada4 commented 5 years ago

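@pkubaj can you try building with other compilers (with NO_LAPACK=1 added, since the problem is in the BLAS part):

* system compiler (clang)
* same with `CC="clang -fintegrated-as"` and `CC="clang -fno-integrated-as"`
* ensuring `as` or `gas` from the GNU binutils port precedes any other `as` commands in $PATH, maybe symlinking gas to as

It looks like the wrong `as` (one not compatible with the one used with gcc on AIX and Linux) is getting called. The best outcome would be to find one combination that works.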

pkubaj commented 5 years ago

> Could be a mis-edit? Line 640 would put it after the .endmacro of the PROLOGUE though... Unfortunately I have no idea what this looks like for FreeBSD on Power. Do you happen to know some other package that uses PPC assembly and already builds on FreeBSD?

Line 640 is the beginning of .macro PROLOGUE. I'm unfortunately not sure if there's anything that uses POWER assembly.

> Which `as` are you using? It should be the one from GNU binutils to make gcc happy.

root@talos:$/usr/ports/math/openblas$ make -V AS
/usr/local/bin/as
root@talos:$/usr/ports/math/openblas$ /usr/local/bin/as --version
GNU assembler (GNU Binutils) 2.30
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `powerpc64-portbld-freebsd12.0'.

> @pkubaj can you try building with other compilers (with NO_LAPACK=1 added, since the problem is in the BLAS part):
>
> * system compiler (clang)
> * same with `CC="clang -fintegrated-as"` and `CC="clang -fno-integrated-as"`
> * ensuring `as` or `gas` from the GNU binutils port precedes any other `as` commands in $PATH, maybe symlinking gas to as
>
> It looks like the wrong `as` (one not compatible with the one used with gcc on AIX and Linux) is getting called. The best outcome would be to find one combination that works.

The system compiler for powerpc64 is GCC 4.2; it's probably not supported by OpenBLAS.

GNU as is definitely used (unless you override the AS variable).

martin-frbg commented 5 years ago

I'm obviously out of my depth here. Guess that would leave trying the OS_LINUX version of the prologue/epilogue definitions.

brada4 commented 5 years ago

gcc 4.2 should be fine (think gcc 4.1 in CentOS 5), though in the absence of a matching old gfortran it will not be able to build the complete library. I suspect it will have the identical problem with `as`, so it is best to try Martin's suggestion with a compiler aligned with the existing x86(_64) builds.

martin-frbg commented 5 years ago

@pkubaj any luck with the OS_LINUX version of the PROLOGUE ?

pkubaj commented 5 years ago

Trying the same compiler as the one used for x86 is out of the question, because of LLVM's inability to generate correct code on the FreeBSD/powerpc64 platform.

When using OS_LINUX version, I get:

../kernel/power/strmm_kernel_16x8_power8.S: Assembler messages:
../kernel/power/strmm_kernel_16x8_power8.S:271: Error: unsupported relocation against LDC
../kernel/power/strmm_kernel_16x8_power8.S:271: Error: unsupported relocation against LDC
../kernel/power/strmm_kernel_16x8_power8.S:291: Error: unsupported relocation against OFFSET
../kernel/power/strmm_logic_16x8_power8.S:41: Error: unsupported relocation against C
../kernel/power/strmm_logic_16x8_power8.S:42: Error: unsupported relocation against A
../kernel/power/strmm_logic_16x8_power8.S:43: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:44: Error: unsupported relocation against C
../kernel/power/strmm_logic_16x8_power8.S:44: Error: unsupported relocation against C
../kernel/power/strmm_logic_16x8_power8.S:47: Error: unsupported relocation against OFFSET
../kernel/power/strmm_logic_16x8_power8.S:59: Error: unsupported relocation against B
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC
../kernel/power/strmm_logic_16x8_power8.S:197: Error: unsupported relocation against LDC

martin-frbg commented 5 years ago

So the good news seems to be that the Linux versions of PROLOGUE/EPILOGUE in common_power.h are acceptable for FreeBSD ? Are the "unsupported relocation" errors only a sample of the compile failures you are now facing, or is it just those two strmmkernel... files that generate errors ? (From what I can find out about this assembler message, "unsupported relocation against" is basically "I do not know a register named..." - and again the affected files have various "if defined(linux) #define LDC r7" etc. at the top.)
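
A small C sketch of that reading: when the guard around a register alias does not match the platform, the preprocessor leaves the name untouched, so the assembler receives a bare symbol such as LDC instead of a register. EXAMPLE_OS_LINUX, STR, XSTR and the helper function are hypothetical names used only for this illustration, and the guard is deliberately left undefined to mimic a platform that the kernel file does not list.

```c
#include <string.h>

#define STR(x)  #x
#define XSTR(x) STR(x)   /* expand x first, then turn the result into a string */

/* Hypothetical guard, intentionally NOT defined in this sketch. */
#ifdef EXAMPLE_OS_LINUX
#define LDC r10
#endif

/* Returns 1 if the operand text is still the bare name "LDC", i.e. the
 * alias was never replaced by a register name. */
int ldc_left_unexpanded(void) {
    return strcmp(XSTR(LDC), "LDC") == 0;
}
```

With the guard unmatched, the function returns 1: the operand stays a plain symbol, which would be consistent with the assembler treating it as something to relocate rather than as a register.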

pkubaj commented 5 years ago

This is only a sample, but all errors are about unsupported relocations in ../kernel/power/sgemm_logic_16x8_power8.S, ../kernel/power/sgemm_kernel_16x8_power8.S, ../kernel/power/strmm_logic_16x8_power8.S and ../kernel/power/strmm_kernel_16x8_power8.S.

I added defined(FreeBSD) to ifdef linux in kernel/power/sgemm_kernel_16x8_power8.S for:

    #ifndef __64BIT__
    #define A       r6
    #define B       r7
    #define C       r8
    #define LDC     r9
    #define OFFSET  r10
    #else
    #define A       r7
    #define B       r8
    #define C       r9
    #define LDC     r10
    #define OFFSET  r6
    #endif
    #endif  /* closes the enclosing "ifdef linux" guard, which starts above this excerpt */

And the same in kernel/power/strmm_kernel_16x8_power8.S (I know it may not sound clear, but once everything builds, I'll send a PR).

It seems to help with building, but then the build fails on other assembly files. Adding FreeBSD to ifdef linux there seems to help as well.

I'll look more into it tomorrow.
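
The guard change described above can be sketched as follows. This is only an illustration under stated assumptions: EXAMPLE_LINUX, EXAMPLE_FREEBSD and EXAMPLE_64BIT are stand-ins defined or tested here so the snippet compiles anywhere (a real build would rely on macros predefined by the compiler, such as `__FreeBSD__`, or set by the build system), and STR, XSTR and the helper function are hypothetical.

```c
#include <string.h>

#define STR(x)  #x
#define XSTR(x) STR(x)   /* expand x first, then turn the result into a string */

/* Stand-ins for the platform macros, defined by hand for this sketch. */
#define EXAMPLE_FREEBSD 1
#define EXAMPLE_64BIT   1

/* Guard extended so that FreeBSD takes the same branch as Linux. */
#if defined(EXAMPLE_LINUX) || defined(EXAMPLE_FREEBSD)
#ifndef EXAMPLE_64BIT
#define LDC     r9
#define OFFSET  r10
#else
#define LDC     r10
#define OFFSET  r6
#endif
#endif

/* Returns 1 if LDC now expands to the 64-bit register name from the excerpt. */
int ldc_maps_to_r10(void) {
    return strcmp(XSTR(LDC), "r10") == 0;
}
```

Once the guard matches, the alias expands to a register name and the assembler no longer sees an unknown symbol. Whether the Linux register assignment is also the right one for the FreeBSD ABI is not something this sketch can confirm.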

martin-frbg commented 5 years ago

Great, thanks for the feedback.

pkubaj commented 5 years ago

After patching all assembly files, I'm getting a segfault in test/sblat2.

#0  .sgemv_n () at ../kernel/power/gemv_n.S:317
317             STFD    f0, 0 * SIZE(Y1)

brada4 commented 5 years ago

@pkubaj The prologue/epilogue is meant to save and restore the registers that the ABI considers immutable across a normal C library call. The AIX and Linux wrappers did not work out. @martin-frbg what do you think about making generic/KERNEL.CC out of arm/KERNEL.ARMv5?

martin-frbg commented 5 years ago

@brada4 I am not sure that "Linux wrappers did not work out", at least it seems it did not crash in the very first test. Surely ABI differences must be documented somewhere, so if e.g. f0 needs to be saved in the prologue it should be possible to adjust it accordingly.

brada4 commented 5 years ago

This was still being experimented with fairly recently: https://lists.freebsd.org/pipermail/freebsd-ppc/2017-May/008858.html

pkubaj commented 5 years ago

I ran sblat2 manually, I got:

root@talos:$/usr/ports/math/openblas/work/OpenBLAS-0.3.4/test$ ./sblat2 < sblat2.dat

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Could not print backtrace: libbacktrace could not find executable to open

> This was still being experimented with fairly recently: https://lists.freebsd.org/pipermail/freebsd-ppc/2017-May/008858.html

It was done using LLVM 4.0. LLVM supports FreeBSD/powerpc64 properly only since 8.0 (current devel branch).

martin-frbg commented 5 years ago

Is sblat2 the first (and only) test you ran, or the first one that failed while running make test ?

pkubaj commented 5 years ago

No, some other tests run:

OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./sblat1
zblat2.f:2057:0:

      $                   NARGS, NC, NS

Warning: 'nargs' may be used uninitialized in this function [-Wmaybe-uninitialized]
 Real BLAS Test Program Results

 Test of subprogram number  1             SDOT
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY
                                    ----- PASS -----

 Test of subprogram number  3            SROTG
                                    ----- PASS -----

 Test of subprogram number  4             SROT
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2
                                    ----- PASS -----

 Test of subprogram number  8            SASUM
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----

 Test of subprogram number 11            SROTMG
                                    ----- PASS -----

 Test of subprogram number 12            SROTM
                                    ----- PASS -----

 Test of subprogram number 13            SDSDOT
                                    ----- PASS -----
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./dblat1
 Real BLAS Test Program Results

 Test of subprogram number  1             DDOT
                                       FAIL

 CASE  N INCX INCY  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2    1    1  1                      0.00000000D+00                      0.21000000D+00 -0.2100D+00  0.1600D+01
    1  4    1    1  1                      0.56000000D+00                      0.62000000D+00 -0.6000D-01  0.3200D+01
    1  1    2   -2  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2    2   -2  1                      0.00000000D+00                     -0.70000000D-01  0.7000D-01  0.1600D+01
    1  4    2   -2  1                      0.57000000D+00                      0.85000000D+00 -0.2800D+00  0.3200D+01
    1  1   -2    1  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2   -2    1  1                      0.00000000D+00                     -0.79000000D+00  0.7900D+00  0.1600D+01
    1  4   -2    1  1                     -0.96000000D+00                     -0.74000000D+00 -0.2200D+00  0.3200D+01
    1  1   -1   -2  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2   -1   -2  1                      0.30000000D-01                      0.33000000D+00 -0.3000D+00  0.1600D+01
    1  4   -1   -2  1                      0.97000000D+00                      0.12700000D+01 -0.3000D+00  0.3200D+01

 Test of subprogram number  2            DAXPY
                                    ----- PASS -----

 Test of subprogram number  3            DROTG
                                    ----- PASS -----

 Test of subprogram number  4             DROT
                                    ----- PASS -----

 Test of subprogram number  5            DCOPY
                                    ----- PASS -----

 Test of subprogram number  6            DSWAP
                                    ----- PASS -----

 Test of subprogram number  7            DNRM2
                                    ----- PASS -----

 Test of subprogram number  8            DASUM
                                    ----- PASS -----

 Test of subprogram number  9            DSCAL
                                    ----- PASS -----

 Test of subprogram number 10            IDAMAX
                                    ----- PASS -----

 Test of subprogram number 11            DROTMG
                                    ----- PASS -----

 Test of subprogram number 12            DROTM
                                    ----- PASS -----

 Test of subprogram number 13            DSDOT
                                    ----- PASS -----
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./cblat1
 Complex BLAS Test Program Results

 Test of subprogram number  1            CDOTC
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1    1    1 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2    1    1 9999  1                      0.00000000E+00                      0.91000003E+00 -0.9100E+00  0.1630E+01
    1  2    1    1 9999  2                      0.00000000E+00                     -0.76999998E+00  0.7700E+00  0.1730E+01
    1  4    1    1 9999  1                      0.42000002E+00                      0.18000000E+01 -0.1380E+01  0.2900E+01
    1  4    1    1 9999  2                     -0.19999996E-01                     -0.10000000E+00  0.8000E-01  0.2780E+01
    1  1    2   -2 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1    2   -2 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2    2   -2 9999  1                      0.00000000E+00                      0.14500000E+01 -0.1450E+01  0.1630E+01
    1  2    2   -2 9999  2                      0.00000000E+00                      0.74000001E+00 -0.7400E+00  0.1730E+01
    1  4    2   -2 9999  1                     -0.38999999E+00                      0.20000000E+00 -0.5900E+00  0.2900E+01
    1  4    2   -2 9999  2                      0.82000005E+00                      0.89999998E+00 -0.8000E-01  0.2780E+01
    1  1   -2    1 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1   -2    1 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2   -2    1 9999  1                      0.00000000E+00                     -0.55000001E+00  0.5500E+00  0.1630E+01
    1  2   -2    1 9999  2                      0.00000000E+00                      0.23000000E+00 -0.2300E+00  0.1730E+01
    1  4   -2    1 9999  1                      0.00000000E+00                      0.82999998E+00 -0.8300E+00  0.2900E+01
    1  4   -2    1 9999  2                      0.00000000E+00                     -0.38999999E+00  0.3900E+00  0.2780E+01
    1  1   -1   -2 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1   -1   -2 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2   -1   -2 9999  1                      0.00000000E+00                      0.10400000E+01 -0.1040E+01  0.1630E+01
    1  2   -1   -2 9999  2                      0.00000000E+00                      0.79000002E+00 -0.7900E+00  0.1730E+01
    1  4   -1   -2 9999  1                      0.72000003E+00                      0.19500000E+01 -0.1230E+01  0.2900E+01
    1  4   -1   -2 9999  2                      0.50000006E+00                      0.12200000E+01 -0.7200E+00  0.2780E+01

 Test of subprogram number  2            CDOTU
                                    ----- PASS -----

 Test of subprogram number  3            CAXPY
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    3  2    1    1 9999  1                     -0.33000001E+00                      0.31999999E+00 -0.6500E+00  0.1540E+01
    3  2    1    1 9999  3                     -0.89999998E+00                     -0.15500000E+01  0.6500E+00  0.1540E+01
    3  4    1    1 9999  1                     -0.14800000E+01                      0.31999999E+00 -0.1800E+01  0.1540E+01
    3  4    1    1 9999  2                     -0.21600001E+01                     -0.14100000E+01 -0.7500E+00  0.1540E+01
    3  4    1    1 9999  3                     -0.89999998E+00                     -0.15500000E+01  0.6500E+00  0.1540E+01
    3  4    1    1 9999  5                      0.69999999E+00                      0.29999999E-01  0.6700E+00  0.1540E+01
    3  4    1    1 9999  6                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  4    1    1 9999  7                      0.10000000E+00                     -0.38000000E+00  0.4800E+00  0.1540E+01
    3  4    1    1 9999  8                     -0.50000000E+00                     -0.95999998E+00  0.4600E+00  0.1540E+01
    3  2    2   -2 9999  1                      0.60000002E+00                     -0.70000000E-01  0.6700E+00  0.1540E+01
    3  2    2   -2 9999  2                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  2    2   -2 9999  5                     -0.25000000E+00                      0.41999999E+00 -0.6700E+00  0.1540E+01
    3  2    2   -2 9999  6                     -0.17000000E+01                     -0.14100000E+01 -0.2900E+00  0.1540E+01
    3  4    2   -2 9999  1                      0.60000002E+00                      0.77999997E+00 -0.1800E+00  0.1540E+01
    3  4    2   -2 9999  2                     -0.60000002E+00                      0.59999999E-01 -0.6600E+00  0.1540E+01
    3  4    2   -2 9999  5                      0.69999999E+00                      0.59999999E-01  0.6400E+00  0.1540E+01
    3  4    2   -2 9999  6                     -0.60000002E+00                     -0.13000000E+00 -0.4700E+00  0.1540E+01
    3  4    2   -2 9999  9                     -0.10000000E+00                     -0.76999998E+00  0.6700E+00  0.1540E+01
    3  4    2   -2 9999 10                     -0.20000000E+00                     -0.49000001E+00  0.2900E+00  0.1540E+01
    3  4    2   -2 9999 13                     -0.60999995E+00                      0.51999998E+00 -0.1130E+01  0.1540E+01
    3  4    2   -2 9999 14                     -0.66999990E+00                     -0.15100000E+01  0.8400E+00  0.1540E+01
    3  2   -2    1 9999  1                     -0.34999996E+00                     -0.70000000E-01 -0.2800E+00  0.1540E+01
    3  2   -2    1 9999  2                     -0.17000000E+01                     -0.88999999E+00 -0.8100E+00  0.1540E+01
    3  2   -2    1 9999  3                     -0.89999998E+00                     -0.11799999E+01  0.2800E+00  0.1540E+01
    3  2   -2    1 9999  4                      0.50000000E+00                     -0.31000000E+00  0.8100E+00  0.1540E+01
    3  4   -2    1 9999  1                     -0.80999988E+00                      0.77999997E+00 -0.1590E+01  0.1540E+01
    3  4   -2    1 9999  2                     -0.57000005E+00                      0.59999999E-01 -0.6300E+00  0.1540E+01
    3  4   -2    1 9999  3                     -0.89999998E+00                     -0.15400000E+01  0.6400E+00  0.1540E+01
    3  4   -2    1 9999  4                      0.50000000E+00                      0.97000003E+00 -0.4700E+00  0.1540E+01
    3  4   -2    1 9999  5                      0.69999999E+00                      0.29999999E-01  0.6700E+00  0.1540E+01
    3  4   -2    1 9999  6                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  4   -2    1 9999  7                      0.10000000E+00                     -0.18000001E+00  0.2800E+00  0.1540E+01
    3  4   -2    1 9999  8                     -0.50000000E+00                     -0.13099999E+01  0.8100E+00  0.1540E+01
    3  2   -1   -2 9999  1                      0.60000002E+00                      0.31999999E+00  0.2800E+00  0.1540E+01
    3  2   -1   -2 9999  2                     -0.60000002E+00                     -0.14100000E+01  0.8100E+00  0.1540E+01
    3  2   -1   -2 9999  5                     -0.23000002E+00                      0.50000001E-01 -0.2800E+00  0.1540E+01
    3  2   -1   -2 9999  6                     -0.14100001E+01                     -0.60000002E+00 -0.8100E+00  0.1540E+01
    3  4   -1   -2 9999  1                      0.60000002E+00                      0.31999999E+00  0.2800E+00  0.1540E+01
    3  4   -1   -2 9999  2                     -0.60000002E+00                     -0.14100000E+01  0.8100E+00  0.1540E+01
    3  4   -1   -2 9999  5                      0.69999999E+00                      0.50000001E-01  0.6500E+00  0.1540E+01
    3  4   -1   -2 9999  9                     -0.10000000E+00                     -0.76999998E+00  0.6700E+00  0.1540E+01
    3  4   -1   -2 9999 10                     -0.20000000E+00                     -0.49000001E+00  0.2900E+00  0.1540E+01
    3  4   -1   -2 9999 13                     -0.12800000E+01                      0.31999999E+00 -0.1600E+01  0.1540E+01
    3  4   -1   -2 9999 14                     -0.22600000E+01                     -0.11600000E+01 -0.1100E+01  0.1540E+01

 Test of subprogram number  4            CCOPY
                                    ----- PASS -----

 Test of subprogram number  5            CSWAP
                                    ----- PASS -----

 Test of subprogram number  6            SCNRM2
                                    ----- PASS -----

 Test of subprogram number  7            SCASUM
                                    ----- PASS -----

 Test of subprogram number  8            CSCAL
                                    ----- PASS -----

 Test of subprogram number  9            CSSCAL
                                    ----- PASS -----

 Test of subprogram number 10            ICAMAX
                                    ----- PASS -----
OMP_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat1
cblat2.f:2050:0:

      $                   NARGS, NC, NS

Warning: 'nargs' may be used uninitialized in this function [-Wmaybe-uninitialized]
 Complex BLAS Test Program Results

 Test of subprogram number  1            ZDOTC
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1    1    1 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2    1    1 9999  1                      0.10000000D-01                      0.91000000D+00 -0.9000D+00  0.1630D+01
    1  2    1    1 9999  2                     -0.83000000D+00                     -0.77000000D+00 -0.6000D-01  0.1730D+01
    1  4    1    1 9999  1                      0.90000000D+00                      0.18000000D+01 -0.9000D+00  0.2900D+01
    1  4    1    1 9999  2                     -0.16000000D+00                     -0.10000000D+00 -0.6000D-01  0.2780D+01
    1  1    2   -2 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1    2   -2 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2    2   -2 9999  1                      0.00000000D+00                      0.14500000D+01 -0.1450D+01  0.1630D+01
    1  2    2   -2 9999  2                      0.00000000D+00                      0.74000000D+00 -0.7400D+00  0.1730D+01
    1  4    2   -2 9999  1                     -0.20000000D+00                      0.20000000D+00 -0.4000D+00  0.2900D+01
    1  4    2   -2 9999  2                      0.75000000D+00                      0.90000000D+00 -0.1500D+00  0.2780D+01
    1  1   -2    1 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1   -2    1 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2   -2    1 9999  1                      0.00000000D+00                     -0.55000000D+00  0.5500D+00  0.1630D+01
    1  2   -2    1 9999  2                      0.00000000D+00                      0.23000000D+00 -0.2300D+00  0.1730D+01
    1  4   -2    1 9999  1                      0.10800000D+01                      0.83000000D+00  0.2500D+00  0.2900D+01
    1  4   -2    1 9999  2                     -0.12000000D+00                     -0.39000000D+00  0.2700D+00  0.2780D+01
    1  1   -1   -2 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1   -1   -2 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2   -1   -2 9999  1                      0.14000000D+00                      0.10400000D+01 -0.9000D+00  0.1630D+01
    1  2   -1   -2 9999  2                      0.73000000D+00                      0.79000000D+00 -0.6000D-01  0.1730D+01
    1  4   -1   -2 9999  1                      0.10500000D+01                      0.19500000D+01 -0.9000D+00  0.2900D+01
    1  4   -1   -2 9999  2                      0.11600000D+01                      0.12200000D+01 -0.6000D-01  0.2780D+01

 Test of subprogram number  2            ZDOTU
                                    ----- PASS -----

 Test of subprogram number  3            ZAXPY
                                    ----- PASS -----

 Test of subprogram number  4            ZCOPY
                                    ----- PASS -----

 Test of subprogram number  5            ZSWAP
                                    ----- PASS -----

 Test of subprogram number  6            DZNRM2
                                    ----- PASS -----

 Test of subprogram number  7            DZASUM
                                    ----- PASS -----

 Test of subprogram number  8            ZSCAL
                                    ----- PASS -----

 Test of subprogram number  9            ZDSCAL
                                    ----- PASS -----

 Test of subprogram number 10            IZAMAX
                                    ----- PASS -----
OMP_NUM_THREADS=2 ./sblat1
 Real BLAS Test Program Results

 Test of subprogram number  1             SDOT
                                    ----- PASS -----

 Test of subprogram number  2            SAXPY
                                    ----- PASS -----

 Test of subprogram number  3            SROTG
                                    ----- PASS -----

 Test of subprogram number  4             SROT
                                    ----- PASS -----

 Test of subprogram number  5            SCOPY
                                    ----- PASS -----

 Test of subprogram number  6            SSWAP
                                    ----- PASS -----

 Test of subprogram number  7            SNRM2
                                    ----- PASS -----

 Test of subprogram number  8            SASUM
                                    ----- PASS -----

 Test of subprogram number  9            SSCAL
                                    ----- PASS -----

 Test of subprogram number 10            ISAMAX
                                    ----- PASS -----

 Test of subprogram number 11            SROTMG
                                    ----- PASS -----

 Test of subprogram number 12            SROTM
                                    ----- PASS -----

 Test of subprogram number 13            SDSDOT
                                    ----- PASS -----
OMP_NUM_THREADS=2 ./dblat1
 Real BLAS Test Program Results

 Test of subprogram number  1             DDOT
                                       FAIL

 CASE  N INCX INCY  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2    1    1  1                      0.00000000D+00                      0.21000000D+00 -0.2100D+00  0.1600D+01
    1  4    1    1  1                      0.56000000D+00                      0.62000000D+00 -0.6000D-01  0.3200D+01
    1  1    2   -2  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2    2   -2  1                      0.00000000D+00                     -0.70000000D-01  0.7000D-01  0.1600D+01
    1  4    2   -2  1                      0.57000000D+00                      0.85000000D+00 -0.2800D+00  0.3200D+01
    1  1   -2    1  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2   -2    1  1                      0.00000000D+00                     -0.79000000D+00  0.7900D+00  0.1600D+01
    1  4   -2    1  1                     -0.96000000D+00                     -0.74000000D+00 -0.2200D+00  0.3200D+01
    1  1   -1   -2  1                      0.00000000D+00                      0.30000000D+00 -0.3000D+00  0.3000D+00
    1  2   -1   -2  1                      0.30000000D-01                      0.33000000D+00 -0.3000D+00  0.1600D+01
    1  4   -1   -2  1                      0.97000000D+00                      0.12700000D+01 -0.3000D+00  0.3200D+01

 Test of subprogram number  2            DAXPY
                                    ----- PASS -----

 Test of subprogram number  3            DROTG
                                    ----- PASS -----

 Test of subprogram number  4             DROT
                                    ----- PASS -----

 Test of subprogram number  5            DCOPY
                                    ----- PASS -----

 Test of subprogram number  6            DSWAP
                                    ----- PASS -----

 Test of subprogram number  7            DNRM2
                                    ----- PASS -----

 Test of subprogram number  8            DASUM
                                    ----- PASS -----

 Test of subprogram number  9            DSCAL
                                    ----- PASS -----

 Test of subprogram number 10            IDAMAX
                                    ----- PASS -----

 Test of subprogram number 11            DROTMG
                                    ----- PASS -----

 Test of subprogram number 12            DROTM
                                    ----- PASS -----

 Test of subprogram number 13            DSDOT
                                    ----- PASS -----
OMP_NUM_THREADS=2 ./cblat1
dblat2.f:1724:0:

      $                   LDA, LDAS, LJ, LX, N, NARGS, NC, NS

Warning: 'nargs' may be used uninitialized in this function [-Wmaybe-uninitialized]
 Complex BLAS Test Program Results

 Test of subprogram number  1            CDOTC
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1    1    1 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2    1    1 9999  1                      0.00000000E+00                      0.91000003E+00 -0.9100E+00  0.1630E+01
    1  2    1    1 9999  2                      0.00000000E+00                     -0.76999998E+00  0.7700E+00  0.1730E+01
    1  4    1    1 9999  1                      0.42000002E+00                      0.18000000E+01 -0.1380E+01  0.2900E+01
    1  4    1    1 9999  2                     -0.19999996E-01                     -0.10000000E+00  0.8000E-01  0.2780E+01
    1  1    2   -2 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1    2   -2 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2    2   -2 9999  1                      0.00000000E+00                      0.14500000E+01 -0.1450E+01  0.1630E+01
    1  2    2   -2 9999  2                      0.00000000E+00                      0.74000001E+00 -0.7400E+00  0.1730E+01
    1  4    2   -2 9999  1                     -0.38999999E+00                      0.20000000E+00 -0.5900E+00  0.2900E+01
    1  4    2   -2 9999  2                      0.82000005E+00                      0.89999998E+00 -0.8000E-01  0.2780E+01
    1  1   -2    1 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1   -2    1 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2   -2    1 9999  1                      0.00000000E+00                     -0.55000001E+00  0.5500E+00  0.1630E+01
    1  2   -2    1 9999  2                      0.00000000E+00                      0.23000000E+00 -0.2300E+00  0.1730E+01
    1  4   -2    1 9999  1                      0.00000000E+00                      0.82999998E+00 -0.8300E+00  0.2900E+01
    1  4   -2    1 9999  2                      0.00000000E+00                     -0.38999999E+00  0.3900E+00  0.2780E+01
    1  1   -1   -2 9999  1                      0.00000000E+00                      0.89999998E+00 -0.9000E+00  0.9000E+00
    1  1   -1   -2 9999  2                      0.00000000E+00                      0.59999999E-01 -0.6000E-01  0.9000E+00
    1  2   -1   -2 9999  1                      0.00000000E+00                      0.10400000E+01 -0.1040E+01  0.1630E+01
    1  2   -1   -2 9999  2                      0.00000000E+00                      0.79000002E+00 -0.7900E+00  0.1730E+01
    1  4   -1   -2 9999  1                      0.72000003E+00                      0.19500000E+01 -0.1230E+01  0.2900E+01
    1  4   -1   -2 9999  2                      0.50000006E+00                      0.12200000E+01 -0.7200E+00  0.2780E+01

 Test of subprogram number  2            CDOTU
                                    ----- PASS -----

 Test of subprogram number  3            CAXPY
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    3  2    1    1 9999  1                     -0.33000001E+00                      0.31999999E+00 -0.6500E+00  0.1540E+01
    3  2    1    1 9999  3                     -0.89999998E+00                     -0.15500000E+01  0.6500E+00  0.1540E+01
    3  4    1    1 9999  1                     -0.14800000E+01                      0.31999999E+00 -0.1800E+01  0.1540E+01
    3  4    1    1 9999  2                     -0.21600001E+01                     -0.14100000E+01 -0.7500E+00  0.1540E+01
    3  4    1    1 9999  3                     -0.89999998E+00                     -0.15500000E+01  0.6500E+00  0.1540E+01
    3  4    1    1 9999  5                      0.69999999E+00                      0.29999999E-01  0.6700E+00  0.1540E+01
    3  4    1    1 9999  6                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  4    1    1 9999  7                      0.10000000E+00                     -0.38000000E+00  0.4800E+00  0.1540E+01
    3  4    1    1 9999  8                     -0.50000000E+00                     -0.95999998E+00  0.4600E+00  0.1540E+01
    3  2    2   -2 9999  1                      0.60000002E+00                     -0.70000000E-01  0.6700E+00  0.1540E+01
    3  2    2   -2 9999  2                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  2    2   -2 9999  5                     -0.25000000E+00                      0.41999999E+00 -0.6700E+00  0.1540E+01
    3  2    2   -2 9999  6                     -0.17000000E+01                     -0.14100000E+01 -0.2900E+00  0.1540E+01
    3  4    2   -2 9999  1                      0.60000002E+00                      0.77999997E+00 -0.1800E+00  0.1540E+01
    3  4    2   -2 9999  2                     -0.60000002E+00                      0.59999999E-01 -0.6600E+00  0.1540E+01
    3  4    2   -2 9999  5                      0.69999999E+00                      0.59999999E-01  0.6400E+00  0.1540E+01
    3  4    2   -2 9999  6                     -0.60000002E+00                     -0.13000000E+00 -0.4700E+00  0.1540E+01
    3  4    2   -2 9999  9                     -0.10000000E+00                     -0.76999998E+00  0.6700E+00  0.1540E+01
    3  4    2   -2 9999 10                     -0.20000000E+00                     -0.49000001E+00  0.2900E+00  0.1540E+01
    3  4    2   -2 9999 13                     -0.60999995E+00                      0.51999998E+00 -0.1130E+01  0.1540E+01
    3  4    2   -2 9999 14                     -0.66999990E+00                     -0.15100000E+01  0.8400E+00  0.1540E+01
    3  2   -2    1 9999  1                     -0.34999996E+00                     -0.70000000E-01 -0.2800E+00  0.1540E+01
    3  2   -2    1 9999  2                     -0.17000000E+01                     -0.88999999E+00 -0.8100E+00  0.1540E+01
    3  2   -2    1 9999  3                     -0.89999998E+00                     -0.11799999E+01  0.2800E+00  0.1540E+01
    3  2   -2    1 9999  4                      0.50000000E+00                     -0.31000000E+00  0.8100E+00  0.1540E+01
    3  4   -2    1 9999  1                     -0.80999988E+00                      0.77999997E+00 -0.1590E+01  0.1540E+01
    3  4   -2    1 9999  2                     -0.57000005E+00                      0.59999999E-01 -0.6300E+00  0.1540E+01
    3  4   -2    1 9999  3                     -0.89999998E+00                     -0.15400000E+01  0.6400E+00  0.1540E+01
    3  4   -2    1 9999  4                      0.50000000E+00                      0.97000003E+00 -0.4700E+00  0.1540E+01
    3  4   -2    1 9999  5                      0.69999999E+00                      0.29999999E-01  0.6700E+00  0.1540E+01
    3  4   -2    1 9999  6                     -0.60000002E+00                     -0.88999999E+00  0.2900E+00  0.1540E+01
    3  4   -2    1 9999  7                      0.10000000E+00                     -0.18000001E+00  0.2800E+00  0.1540E+01
    3  4   -2    1 9999  8                     -0.50000000E+00                     -0.13099999E+01  0.8100E+00  0.1540E+01
    3  2   -1   -2 9999  1                      0.60000002E+00                      0.31999999E+00  0.2800E+00  0.1540E+01
    3  2   -1   -2 9999  2                     -0.60000002E+00                     -0.14100000E+01  0.8100E+00  0.1540E+01
    3  2   -1   -2 9999  5                     -0.23000002E+00                      0.50000001E-01 -0.2800E+00  0.1540E+01
    3  2   -1   -2 9999  6                     -0.14100001E+01                     -0.60000002E+00 -0.8100E+00  0.1540E+01
    3  4   -1   -2 9999  1                      0.60000002E+00                      0.31999999E+00  0.2800E+00  0.1540E+01
    3  4   -1   -2 9999  2                     -0.60000002E+00                     -0.14100000E+01  0.8100E+00  0.1540E+01
    3  4   -1   -2 9999  5                      0.69999999E+00                      0.50000001E-01  0.6500E+00  0.1540E+01
    3  4   -1   -2 9999  9                     -0.10000000E+00                     -0.76999998E+00  0.6700E+00  0.1540E+01
    3  4   -1   -2 9999 10                     -0.20000000E+00                     -0.49000001E+00  0.2900E+00  0.1540E+01
    3  4   -1   -2 9999 13                     -0.12800000E+01                      0.31999999E+00 -0.1600E+01  0.1540E+01
    3  4   -1   -2 9999 14                     -0.22600000E+01                     -0.11600000E+01 -0.1100E+01  0.1540E+01

 Test of subprogram number  4            CCOPY
                                    ----- PASS -----

 Test of subprogram number  5            CSWAP
                                    ----- PASS -----

 Test of subprogram number  6            SCNRM2
                                    ----- PASS -----

 Test of subprogram number  7            SCASUM
                                    ----- PASS -----

 Test of subprogram number  8            CSCAL
                                    ----- PASS -----

 Test of subprogram number  9            CSSCAL
                                    ----- PASS -----

 Test of subprogram number 10            ICAMAX
                                    ----- PASS -----
OMP_NUM_THREADS=2 ./zblat1
 Complex BLAS Test Program Results

 Test of subprogram number  1            ZDOTC
                                       FAIL

 CASE  N INCX INCY MODE  I                             COMP(I)                             TRUE(I)  DIFFERENCE     SIZE(I)

    1  1    1    1 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1    1    1 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2    1    1 9999  1                      0.10000000D-01                      0.91000000D+00 -0.9000D+00  0.1630D+01
    1  2    1    1 9999  2                     -0.83000000D+00                     -0.77000000D+00 -0.6000D-01  0.1730D+01
    1  4    1    1 9999  1                      0.90000000D+00                      0.18000000D+01 -0.9000D+00  0.2900D+01
    1  4    1    1 9999  2                     -0.16000000D+00                     -0.10000000D+00 -0.6000D-01  0.2780D+01
    1  1    2   -2 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1    2   -2 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2    2   -2 9999  1                      0.00000000D+00                      0.14500000D+01 -0.1450D+01  0.1630D+01
    1  2    2   -2 9999  2                      0.00000000D+00                      0.74000000D+00 -0.7400D+00  0.1730D+01
    1  4    2   -2 9999  1                     -0.20000000D+00                      0.20000000D+00 -0.4000D+00  0.2900D+01
    1  4    2   -2 9999  2                      0.75000000D+00                      0.90000000D+00 -0.1500D+00  0.2780D+01
    1  1   -2    1 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1   -2    1 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2   -2    1 9999  1                      0.00000000D+00                     -0.55000000D+00  0.5500D+00  0.1630D+01
    1  2   -2    1 9999  2                      0.00000000D+00                      0.23000000D+00 -0.2300D+00  0.1730D+01
    1  4   -2    1 9999  1                      0.10800000D+01                      0.83000000D+00  0.2500D+00  0.2900D+01
    1  4   -2    1 9999  2                     -0.12000000D+00                     -0.39000000D+00  0.2700D+00  0.2780D+01
    1  1   -1   -2 9999  1                      0.00000000D+00                      0.90000000D+00 -0.9000D+00  0.9000D+00
    1  1   -1   -2 9999  2                      0.00000000D+00                      0.60000000D-01 -0.6000D-01  0.9000D+00
    1  2   -1   -2 9999  1                      0.14000000D+00                      0.10400000D+01 -0.9000D+00  0.1630D+01
    1  2   -1   -2 9999  2                      0.73000000D+00                      0.79000000D+00 -0.6000D-01  0.1730D+01
    1  4   -1   -2 9999  1                      0.10500000D+01                      0.19500000D+01 -0.9000D+00  0.2900D+01
    1  4   -1   -2 9999  2                      0.11600000D+01                      0.12200000D+01 -0.6000D-01  0.2780D+01

 Test of subprogram number  2            ZDOTU
                                    ----- PASS -----

 Test of subprogram number  3            ZAXPY
                                    ----- PASS -----

 Test of subprogram number  4            ZCOPY
                                    ----- PASS -----

 Test of subprogram number  5            ZSWAP
                                    ----- PASS -----

 Test of subprogram number  6            DZNRM2
                                    ----- PASS -----

 Test of subprogram number  7            DZASUM
                                    ----- PASS -----

 Test of subprogram number  8            ZSCAL
                                    ----- PASS -----

 Test of subprogram number  9            ZDSCAL
                                    ----- PASS -----

 Test of subprogram number 10            IZAMAX
                                    ----- PASS -----

pkubaj commented 5 years ago

Note that I only patched those fragments of assembly sources that reported bad relocations.

Should I just patch them all ('s/ifdef linux/if defined(linux) || defined(FreeBSD)/g' kernel/power/*.S)?

martin-frbg commented 5 years ago

It probably makes sense to try and patch them all. (gemv_n.S, where you are currently getting the segmentation fault, has a second "ifdef linux" section at line 255; I wonder if you already changed both of them?) This may still not fix all problems, e.g. the wrong result from DDOT (assuming you are building for POWER8, where neither ddot.c nor ddot_microk_power.c has any OS-specific definitions outside what gets imported from common_power.h).

pkubaj commented 5 years ago

After doing that, I get:

../kernel/power/gemv_n.S: Assembler messages:
../kernel/power/gemv_n.S:260: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_n.S:261: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_n.S:262: Error: junk at end of line: `+280(1)'
../kernel/power/gemv_t.S: Assembler messages:
../kernel/power/gemv_t.S:268: Error: junk at end of line: `+288(1)'
../kernel/power/gemv_t.S:269: Error: junk at end of line: `+288(1)'
../kernel/power/gemv_t.S:270: Error: junk at end of line: `+288(1)'

pkubaj commented 5 years ago

Something looks wrong in https://github.com/xianyi/OpenBLAS/blob/develop/kernel/power/strmm_kernel_16x8_power8.S, lines 85 and 86. Both contain macros defining STACKSIZE to different values.

martin-frbg commented 5 years ago

The new error in gemv_n.S looks as if it is evaluating the (SP) as a literal "1", which seems weird. Similar expressions are used on all supported platforms, so I have no idea what to make of this error unless you have some hard-to-spot typo, special character or the like there. strmm_kernel_16x8_power8.S shows my fumbling to increase the stack to make room for saving the vector registers; I left the old value in to sort of document the change in case it might be incorrect.

pkubaj commented 5 years ago

OK, then, something else.

Should 64BIT be defined? It's not defined by default, should I define it manually in CFLAGS?

EDIT: I found it defined in config.h, never mind.

brada4 commented 5 years ago

@pkubaj no, a 64-bit CPU is detected automatically. You do not want the int64 interface (INTERFACE64=1 in Makefile.rule or on the command line): it makes the library incompatible with Netlib LAPACK but permits single dimensions of arguments to exceed 4 billion, or 16GB, subject to recompiling all client code.

pkubaj commented 5 years ago

I'd like to revive this issue.

I got OpenBLAS (0.3.6) to build with the following patches: sed -e 's/defined(linux)/(defined(linux) || defined(__FreeBSD__))/g' -e 's/ifdef linux/if defined(linux) || defined(__FreeBSD__)/g' kernel/power/*.S

--- common_power.h.orig 2019-06-24 17:16:36 UTC
+++ common_power.h
@@ -499,7 +499,7 @@ static inline int blas_quickdivide(blasint x, blasint

 #if defined(ASSEMBLER) && !defined(NEEDPARAM)

-#ifdef OS_LINUX
+#if defined(OS_LINUX) || defined(OS_FREEBSD)
 #ifndef __64BIT__
 #define PROLOGUE \
        .section .text;\
@@ -784,7 +784,7 @@ Lmcount$lazy_ptr:

 #define HALT           mfspr   r0, 1023

-#ifdef OS_LINUX
+#if defined(OS_LINUX) || defined(OS_FREEBSD)
 #if defined(PPC440) || defined(PPC440FP2)
 #undef  MAX_CPU_NUMBER
 #define MAX_CPU_NUMBER 1
@@ -829,7 +829,7 @@ Lmcount$lazy_ptr:
 #define MAP_ANONYMOUS MAP_ANON
 #endif

-#ifdef OS_LINUX
+#if defined(OS_LINUX) || defined(OS_FREEBSD)
 #ifndef __64BIT__
 #define FRAMESLOT(X) (((X) * 4) + 8)
 #else
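
For readers who want to see what the two sed expressions above do to a preprocessor guard, here is a minimal Python sketch that applies the same two literal substitutions in the same order. The function name `patch_guards` and the sample input lines are hypothetical and only for illustration; they are not taken from the OpenBLAS sources.

```python
def patch_guards(source: str) -> str:
    """Apply the two literal substitutions from the sed command, in order."""
    # First expression: widen explicit defined(linux) tests.
    source = source.replace(
        "defined(linux)", "(defined(linux) || defined(__FreeBSD__))"
    )
    # Second expression: turn "#ifdef linux" into an "#if defined(...)" test.
    source = source.replace(
        "ifdef linux", "if defined(linux) || defined(__FreeBSD__)"
    )
    return source


# Hypothetical sample lines, not copied from kernel/power/*.S.
sample = "#ifdef linux\n#if defined(linux) && defined(__64BIT__)\n"
patched = patch_guards(sample)
print(patched)
```

Note that, like the sed command, this is not idempotent: running it a second time over already patched text would wrap the `defined(linux)` test again, so it should be applied once to pristine sources.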

Now, the problem is that the sblat2 and sblat3 tests freeze. I left them running overnight (just in case). The computer didn't run anything big apart from those tests.

root@talos:~ # ps auxwwf | head -n 1 ; ps auxwwf | grep sblat
USER         PID   %CPU %MEM   VSZ   RSS TT  STAT STARTED         TIME COMMAND
root       28586  100.0  0.0 81036  5788  1  R+   23:29      598:35.74 ./sblat2
root       32971  100.0  0.0 81664  6288  1  R+   23:29      598:35.23 ./sblat3
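
A hang like the one in the ps output above can be caught automatically rather than by waiting overnight. The following is a minimal Python sketch of running a command under a wall-clock limit; the helper name `run_with_timeout` is hypothetical, and the busy-looping child process merely stands in for a stuck test binary. How the real tests are invoked (arguments, input files) is not shown in this thread and is deliberately left out.

```python
import subprocess
import sys


def run_with_timeout(command, seconds):
    """Return the exit code, or None if the command exceeded `seconds`."""
    try:
        # subprocess.run kills the child and waits for it when the timeout expires.
        completed = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Stand-in for a frozen test: a child process that spins forever.
spinner = [sys.executable, "-c", "while True: pass"]
result = run_with_timeout(spinner, 1.0)
print("timed out" if result is None else f"exit code {result}")
```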

Is it possible that those tests are incorrect?

Could there be an endianness issue? FreeBSD on ppc64 is big-endian (even on POWER8 and 9), while much software nowadays expects ppc64le (little-endian).
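
As a generic illustration of why byte order matters for numeric code, the Python sketch below packs a single-precision 1.0 in big-endian order and then reads the same four bytes back in both byte orders. This only shows the general effect of a byte-order mismatch; it is an assumption-free toy example and does not claim to reproduce how the POWER8 kernels actually fail on a big-endian system.

```python
import struct

value = 1.0

# IEEE 754 single-precision 1.0 stored most-significant byte first.
big_endian_bytes = struct.pack(">f", value)

# Reading the bytes in the order they were written recovers the value.
as_big = struct.unpack(">f", big_endian_bytes)[0]

# Reading the same bytes as little-endian yields a tiny denormal number.
as_little = struct.unpack("<f", big_endian_bytes)[0]

print(big_endian_bytes.hex(), as_big, as_little)
```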

martin-frbg commented 5 years ago

Ah yes, POWER8 kernels are currently ppc64le-only (as found out in #1997). Building for TARGET=POWER6 will probably work with your patch.

pkubaj commented 5 years ago

Ah yes, POWER8 kernels are currently ppc64le-only (as found out in #1997). Building for TARGET=POWER6 will probably work with your patch.

Is POWER6 necessary? I found it also compiles with POWER7.

martin-frbg commented 5 years ago

POWER7 is mapped to POWER6 internally.

pkubaj commented 5 years ago

Thanks.

Since it builds and all tests pass, I assume this patch is OK. Can you commit it straight away (together with this sed) or do you require a pull request?

martin-frbg commented 5 years ago

A PR would be easier to apply, but I can generate one from your information if it is too much hassle for you.

pkubaj commented 5 years ago

https://github.com/xianyi/OpenBLAS/pull/2169