OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License
6.32k stars 1.49k forks source link

OpenPower nonvolatile registers not saved/restored #1263

Closed Estoltz closed 6 years ago

Estoltz commented 7 years ago

Hello, We have found a case where the OpenPOWER nonvolatile registers vsr52-vsr63 are not saved and restored across calls to several assembly language functions.   According to the latest Power ABI (rev. 1.4 dated May 10, 2017) found at:   https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture   The vector registers vr20-vr31 (see Table 2.22) are nonvolatile (callee save). The vector registers vr0-vr31 are overlaid onto  Vector Scalar Registers (VSR) vsr32-vsr63 (see Fig 2.17). Thus vsr52-vsr63 are also nonvolatile, and for a routine to use any of these registers requires a save-restore sequence within the assembly code.   Within the OpenBLAS-0.2.[19,20]/kernel/power source code there are 12 files that reference these nonvolatile registers (vsr52-vsr63):   cgemm_macros_8x4_power8.S ctrmm_macros_8x4_power8.S dgemm_macros_16x4_power8.S dgemm_ncopy_macros_4_power8.S dgemm_tcopy_macros_16_power8.S dtrmm_macros_16x4_power8.S dtrsm_macros_LT_16x4_power8.S sgemm_macros_16x8_power8.S strmm_macros_16x8_power8.S zgemm_macros_8x2_power8.S zgemm_tcopy_macros_8_power8.S ztrmm_macros_8x2_power8.S   These 12 files are used in a subset of the .S files which include the assembly macros within these files. These built routines do correctly save and restore fpr14-fpr31, the 64-bit registers which overlay the left half of vsr14-vsr31 (and are specified in Table 2.20 of the ABI document as nonvolatile). However, vsr52-vsr63 are NOT saved and restored. This  omission means that if any of vr20-vr31 (or by different names vsr52-vsr63) are defined and then used after a call which contains any of the 12 assembly files listed above, corruption can, and indeed has, occurred.  We have tracked down instances where these calls have overwritten nonvolatile registers; these cases resulted in a segfault in our test program.   We believe a solution is to save and restore vsr52-vsr63 (or vr30-vr31) in the same manner, and roughly in the same locations, as is currently done for fpr14-fpr31.   Please feel free to contact us for any additional details or further explanation.

Thanks! Eric

martin-frbg commented 7 years ago

This is a known defect unfortunately, IBM's Alan Modra was kind enough to fix it in all inline assembly used in the relevant .c files recently (see #1078) but the plain assembly files are still awaiting repair. (BTW the fixed files have since led to compilation issues for users of clang (#1120) and xlc (#1248) so if you happen to know some "universally acceptable" way of putting vector registers on clobber lists the affected folks would appreciate your input)

forandom commented 7 years ago

Yeah, I also get this issue. Is someone fixing the .s files?

martin-frbg commented 7 years ago

I have started working on them now, but as I do this mainly to teach myself assembly I am progressing through fifty shades of segfaults...

quickwritereader commented 7 years ago

I checked the paper. It mentions vr20-31 only for DFP (decimal ). But we use binary fp.
Anyway it seems Wern was aware of some issues. Probably thats why he refactored the way that it is easy to change save/restore section.

(I do not know power asm. Maybe I will learn power9 in near future.)

Estoltz commented 7 years ago

The ABI has a section for Floating Point registers, then a section for DFP support, then a section for Vector Registers. This last section is not restricted to DFP support and addresses the general role of the Vector Registers within the function-calling sequence, which includes binary fp calculations.

Eric

From: Abdelrauf notifications@github.com Reply-To: xianyi/OpenBLAS reply@reply.github.com Date: Friday, September 22, 2017 at 4:59 PM To: xianyi/OpenBLAS OpenBLAS@noreply.github.com Cc: Eric Stoltz estoltz@nvidia.com, Author author@noreply.github.com Subject: Re: [xianyi/OpenBLAS] OpenPower nonvolatile registers not saved/restored (#1263)

I checked the paper. It mentions vr20-31 only for DFP (decimal ). But we use binary fp. Anyway it seems Wern was aware of some issues. Probably thats why he refactored the way that it is easy to change save/restore section.

I do not know power asm. Maybe I will learn power9 in near future.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/xianyi/OpenBLAS/issues/1263#issuecomment-331585630, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AdOMJEKTVqX_iSZVwQsyyxtaOHGlqRmYks5slEncgaJpZM4OqeoI.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

martin-frbg commented 7 years ago

I think I have the first one fixed, will need someone with real power8 hardware for testing however once I have updated the other files - I'm working off a qemu vm (which is s-l-o-o-o-w of course).

quickwritereader commented 7 years ago

@Estoltz thanks for clarifying

quickwritereader commented 7 years ago

@martin-frbg I dont know exactly but I think there is ibm power8 cloud . You could also ask Mr @edelsohn

edelsohn commented 7 years ago

There are a number of clouds with Power systems around the world.

https://developer.ibm.com/linuxonpower/cloud-resources/

The systems at Oregon State University probably are the most flexible. Although the systems at FIT Brno may be more convenient to Freiburg.

martin-frbg commented 7 years ago

Initial PR uploaded - this passes the builtin tests and xianyi's BLAS-Tester on a qemu-emulated ppc64le. Not sure I can be certain that it does not scribble in the new vector save area on the stack during the computations though...

martin-frbg commented 6 years ago

This was closed automatically by the merge. Reopening in case there is something wrong with my changes.

quickwritereader commented 6 years ago

@martin-frbg it seems correct and your addition exactly match with the example function from ABI page. Actually we could also use example restore-save functions from ABI page (the ones which is using r12)

Furthemore, I think below could be done for improvement: Saving and restoring could be re-written more conventional way. (Keeping on mind (SP=SP-stack_size and we are sotring from bottom)) I think currently general registers were saved in the opposite order which could be changed ((Higher-numbered registers are saved at higher addresses)). And also vector registers could be saved before general ones
The stack size could be kept smaller (as functions are leaf). And I also noticed that for copy functions there is opportunity to not use nonvolatile vector registers. I will re-check again. Just wanted to share to get opinions ABI page

martin-frbg commented 6 years ago

My additions were copied from the ABI example, my worry was (and is) that I may have somehow omitted to make space for these registers on the stack, or should have adjusted some of the relative addressing in the code (so that the net result would still be that the contents of these registers get overwritten). Mind you I saw no evidence of this in the behaviour of the test cases, else I would not have committed my changes. My last experience with assembly dates back to early student days however.