cosmos72 / stmx

High performance Transactional Memory for Common Lisp
http://stmx.org/
241 stars 14 forks source link

Fix compilation on SBCL 1.4.10+ #15

Closed AeroNotix closed 6 years ago

AeroNotix commented 6 years ago

Not sure exactly how to deal with depending on internal sb-* symbols, as stmx is obviously implementing some very cool functionality and implementing it as close to the hardware as possible such that it really, genuinely does benefit from using these internal interfaces.

After a quick discussion in #sbcl on freenode, it seems there isn't much consensus on how to actually deal with this. As an example, a user mentioned in #sbcl that there are many quicklisp packages that are depending on internal sb-* interfaces.

quicklisp commented 6 years ago

Does this also work on older SBCLs?

AeroNotix commented 6 years ago

@quicklisp No, the function needed in older SBCL versions is as the code is currently in the stmx repository.

Not sure how typically a single CL codebase handles multiple versions of different implementations, is there a *features* flag for SBCL versions?

quicklisp commented 6 years ago

Version checking isn't a great way to do it - feature checking and adding to *features* is better. Xof wrote a paper on the topic -- http://jcsu.jesus.cam.ac.uk/~csr21/features.pdf

AeroNotix commented 6 years ago

Well the issue is that SBCL itself would be the one to export a feature that indicates this, right? I completely agree that explicity version checking over time becomes unmanageable but aside from SBCL adding to *features* whenever an undocumented, unexported and entirely internal symbol changes I don't know what else could be done.

Bear in mind, STMX relies heavily on internal SBCL interfaces. I appreciate that this PR breaks stmx for older SBCL versions but not including this PR breaks STMX for what could be all future versions of SBCL.

quicklisp commented 6 years ago

No, it's normal for software to extend *features* to manage how things are compiled by testing for the presence or absence of things in the implementation. The paper goes into detail.

AeroNotix commented 6 years ago

The exported functionality of stmx doesn't warrant the need of adding anything to *features* imho. The functionality doesn't change at all depending on which version/symbol is used. I think what I can do is just test for the existence of either symbol and use the correct one at compile time, without adding to *features*. Users don't need to know that anything has happened because they won't already be directly be depending on the fact stmx is using one symbol or the other.

Would that work for you?

quicklisp commented 6 years ago

Sure, sounds good to me. I don't really care much. I don't use stmx. But in general, when it's not odious, I think it's better to keep both forward and backward compatibility.

AeroNotix commented 6 years ago

I understand the want to do that, I do. I'll amend the PR to achieve that.

The issue for me here is that because stmx literally only works on sbcl and it is implemented in terms of deep internal parts of sbcl that are undocumented , i.e. it is using the native code generators to output very specific opcodes to expose hardware transactional memory that I believe trying to maintain backwards compatibility is more work than it's worth.

SBCL can change these interfaces at will (as we have seen here). The SBCL developers (at least with a cursory discussion) don't seem to think it's worth wrapping these APIs up into something public would be worthwhile. Perhaps work involved would be too great, it would mean the internal native code generators need to remain stable.

Therefore, is a little backwards incompatibility going to hurt? Currently stxm is already broken in quicklisp for SBCL 1.4.10, and there is little to suggest that SBCL would either intentionally or unintentionally make changes that improve this situation.

quicklisp commented 6 years ago

It will hurt anyone who hasn't upgraded yet.

cosmos72 commented 6 years ago

Thanks a lot for finding the required change and preparing a patch!

If the only required change is to use (sb-assem::%emit-skip ...) instead of (sb-vm::emit-skip ...), compatibility with both future and past versions of sbcl can be achieved by testing for those functions at read-time with something like #+#.(cl:if (cl:find-symbol '%emit-skip 'sb-assem) '(cl:and) '(cl:or))

Using (fboundp) would be even better, but it's tricky to use correctly: (fboundp 'sb-assem::%emit-skip) gives error if the symbol does not exist

AeroNotix commented 6 years ago

@cosmos72 looking into this some more I believe stmx needs a bit of a refactor.

For example, the entirety of x86-32,64-insts.lisp can be dropped (it's in SBCL).

cosmos72 commented 6 years ago

You are right, I forgot about that change - it was discussed on SBCL mailing list some time ago, when SBCL stopped exporting some internal functions to define new instruction. See https://sourceforge.net/p/sbcl/mailman/sbcl-devel/thread/56DDF10C.50401%40gmail.com/#msg34914735

Maybe it means that, on SBCL 1.4.10+, STMX fails to detect that xbegin/xend/xtest/xabort CPU instructions are already defined by SBCL ?

cosmos72 commented 6 years ago

Confirmed, there is a function (stmx.asm:::compile-if-instruction-defined) used as follows in stmx/asm/x86-32,64-insts.lisp :

#-#.(stmx.asm::compile-if-instruction-defined 'xbegin)
(sb-vm::define-instruction xbegin (segment &optional where)
  ; ...
  (sb-vm::emit-skip segment 4 0)

So it means (stmx.asm::compile-if-instruction-defined) is no longer working on SBCL 1.4.10+, and stmx attempts (and fails) to define the CPU instructions xbegin, xend... even though recent SBCL versions already have those.

I hope to have a little time to work on it soon.

AeroNotix commented 6 years ago

No, it works on SBCL 1.4.10, I'm using it (though having issues with starting hardware transactions? My CPU has RTM support but begin-transaction always returns 0.) I just thought I'd mention that these are in SBCL in case you wanted to refactor out the duplicated code that's already in SBCL.

However, depending on if you want to support every SBCL version ever released, the compile-if-instruction-defined should and does work fine for that.

cosmos72 commented 6 years ago

From my tests, (stmx.asm::compile-if-instruction-defined 'xbegin) always returns nil on both SBCL 1.4.2 and 1.4.10

I fixed it in e54d768fd3e389041363993ffad18747f34469ed because the CPU instructions xbegin, xend... are now predefined in SBCL as you correctly pointed out, and there is no need to redefine them.

This is because SBCL maintainers stated two years ago (see https://sourceforge.net/p/sbcl/mailman/sbcl-devel/thread/56DDF10C.50401%40gmail.com/#msg34911816) that "Defining new instructions [in user code] was never really supported though."

So, while your PR fixes calls to (sb-vm::define-instruction) to work on SBCL 1.4.10 - but it's still not really supported, according to SBCL maintainers,

my change e54d768fd3e389041363993ffad18747f34469ed fixes stmx to detect that there is no need to call (sb-vm::define-instruction)

AeroNotix commented 6 years ago

:+1:

cosmos72 commented 6 years ago

About (begin-transaction) always returning 0: some instructions cannot be executed in a hardware transaction and will always cause it to abort - for example context switches and input/output.

Thus, calls to (stmx.lang:hw-transaction-begin) must always be paired to calls to (stmx.lang:hw-transaction-end). Example:

(let ((works nil))
  (when (eql (stmx.lang:hw-transaction-begin) stmx.lang:+hw-transaction-started+)
    (setf works t)
    (stmx.lang:hw-transaction-end))
  works)

So, for example you can forget debugging code executed in a hardware transaction... the breakpoint interrupt will cause the transaction to abort. Also, consing memory with current SBCL memory allocator causes them to abort.

P.S. please also check that stmx detected RTM instructions - several CPU models have them, but in some CPUs they are bugged and have been disabled by Intel firmware updates. You can check this with:

stmx.lang:+hw-transaction-supported+
AeroNotix commented 6 years ago

I have support for RTM instructions for sure. I wasn't aware you needed to pair the hw-transaction-begin with a hw-transaction-end. How do CPUs detect that the instructions aren't paired or is that something that STMX does?

Thank you for your example, now I seemingly can use the hw-transaction-begin!

Can you explain why this doesn't work for me, please: https://gist.github.com/AeroNotix/31897c8714e53777c29e9713376fda8d

cosmos72 commented 6 years ago

CPUs do not detect that xbegin and xend are not paired: it's just that if there is no xend near enough to xbegin, in practice you try to execute a runaway transaction - sooner or later it will abort for some reason: context switch, input/output, interrupts (including timer interrupts), overflow L1 cache with speculative memory reads and writes, etc.

Also, consider that stmx detection of RTM instructions may be bugged (unlikely but possible) as I could only ever test it on my Core i7 4770

AeroNotix commented 6 years ago

Right, I think I understand. Can the CPU detect what instructions are within the transaction and abort prior to starting the transaction? I notice with printf (I know you're not supposed to execute I/O within transactions, but I wasn't aware transactions will abort if they contain them!) it will abort always, however, this seems to work:

#include <immintrin.h>
#include "stdio.h"

int main(void) {
    int i = 0;
    unsigned transaction_started = _xbegin();
    i++;
    _xend();
    printf("%d\n", transaction_started == _XBEGIN_STARTED);
}
cosmos72 commented 6 years ago

That's (almost) the correct way to use hw transactions - you should call _xend() only if _xbegin() succeeded. They are elusive creatures... some operations will always abort them, others may abort them (and in practice they always seem to). The full list is very long and contained in Intel Reference Manual.

And no, CPUs cannot detect the instructions in a transaction before executing it - again, after xbegin it's just all speculative execution until an xend is found. If an abort occurs for any reason before reaching xend, all the speculative execution is rolled back and the transaction simply appears to have failed, without side effects.

AeroNotix commented 6 years ago

What I still don't understand is, how on earth can xbegin return 0 (transaction not started) when I have printf statements within the block?

Is the block speculatively executed and if any non-whitelisted instructions are found it rolls back to the initial xbegin instruction and returns from that point?

cosmos72 commented 6 years ago

exactly. Actually, _xbegin() supposed to return some flags that explain the abort reason, but 0 is one of them.

AeroNotix commented 6 years ago

I am extremely amazed. Modern CPUs are magical!

Thank you so much for the guidance and information, you have put an end to a two day search for information. The intel documentation provides one tenth of the information you have shared here. Thank you again!

cosmos72 commented 6 years ago

I was quite surprised too that xbegin basically returns twice: once speculatively at transaction start, and again (with a different return value!) if the transaction aborts and execution is resumed non-speculatively at xbegin (the xbegin CPU instruction allows these two returns to be at different code addresses, like a jump, but in practice higher-level bindings always use the same address for both returns)

One last consideration: since transactions are speculative execution, it becomes clear that it's not possible to observe them while they are in progress - from outside, you can see their effects only after they finished (or aborted).

In this, they are analogous to transactions on databases

AeroNotix commented 6 years ago

I am going to have so much fun with this!

cosmos72 commented 6 years ago

Update: on SBCL 1.4.10, allocating memory inside a hw transaction sometimes works:

cl> (in-package :stmx.asm)
asm> (defun hello-world ()
       (let ((ret (transaction-begin)))
         (when (= ret +transaction-started+)
           (setf ret (make-hash-table))
           (transaction-end))
         ret))
HELLO-WORLD
asm> (hello-world)
6
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C376C33}>
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C377B03}>
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C3789D3}>
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C3798A3}>
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C37A773}>
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C37BFB3}>
asm> (hello-world)
6
asm> (hello-world)
#<HASH-TABLE :TEST EQL :COUNT 0 {100C37D713}>
AeroNotix commented 6 years ago

I notice these single threaded transaction failures as well. Any idea what causes it? I see it not only in SBCL but also in C programs.

cosmos72 commented 6 years ago

Welcome to the dark side of RTM hardware transactions: they are never guaranteed to succeed - the best you can get is something like 99.9999% of successes, and the percentage decreases the longer and more complex they are.

As stated in Intel Restricted Transactional Memory Overview:

"A processor may abort RTM transactional execution for many reasons"

"Programmers must always provide an alternative code sequence in the fallback path to guarantee the code completes execution. This may be as simple as acquiring a lock and executing the specified code region non-transactionally. Further, a transaction that always aborts on a given implementation may complete transactionally on a future implementation. Therefore, programmers must ensure the code paths for the transactional region and the alternative code sequence are functionally tested."