cosmos72 / stmx

High performance Transactional Memory for Common Lisp
http://stmx.org/
241 stars 14 forks source link

Interoperability with lparallel? [was: SBCL: stmx.lang:*current-thread* seems to never be rebound] #26

Open leetwinski opened 3 years ago

leetwinski commented 3 years ago

It seems that stmx.lang:*current-thread* special variable is not rebound, breaking interop with lparallel.

Example:

(ql:quickload :stmx)
(ql:quickload :lparallel)

(setf lparallel:*kernel* (lparallel:make-kernel 4))

(lparallel:future
  (stmx:atomic 1))

raises condition:

* Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "lparallel" RUNNING
                                    {100434F813}>:
  STMX internal error!
  stmx:*current-thread* contains a stale value:
   found #<SB-THREAD:THREAD "main thread" RUNNING {1004AB01F3}>
   expecting #<SB-THREAD:THREAD "lparallel" RUNNING {100434F813}>
Typical cause is:
   new threads were created with implementation-specific functions,
   as for example (sb-thread:make-thread) or (mp:process-run-function),
   that do not apply thread-local bindings stored in
   bordeaux-threads:*default-special-bindings*
Solution:
   use (bordeaux-threads:make-thread) instead.

the following fixes it:

(defmacro lpar-atomic (&body body)
  `(let ((stmx.lang:*current-thread* (lparallel.thread-util:current-thread)))
     (stmx:atomic ,@body)))

(lparallel:future
  (lpar-atomic 1))

though i'm not sure it is a correct approach. Accorging to sources, i seems that *current-thread* gets bound just once to (bt:current-thread)

SBCL version: 2.1.9
lparallel and stmx are from quicklisp 2021-08-07
leetwinski commented 3 years ago

Is it even a viable idea to use stmx with fixed-size thread pool, as those of lparallel?

cosmos72 commented 2 years ago

Hi @leetwinski, STMX does not create threads, and is designed to work with threads created by the programmer or by some library. So yes, it can work with fixed-size thread pool, as those of lparallel.

There is a requirement, though: each thread that will access stmx variables, functions or macros must first set some thread-local bindings (in Lisp parlance, locally bind some special variables).

STMX stores the list of such bindings in the variable bordeaux-threads:*default-special-bindings*, which ensures that threads created with (bordeaux-threads:make-thread) will automatically have such local bindings.

If you create threads with some other mechanism, you need to set such local bindings yourself, for example by defining the following macro:

(defmacro with-initial-bindings (&body body)
    `(let ,(loop for pair in bordeaux-threads:*default-special-bindings*
               for name = (first pair)
               for bind = (rest pair)
               collect `(,name ,bind))
        ,@body))

and wrapping all code executed by a new thread inside it, as for example:

(with-initial-bindings
   (call-lparallel-thread-main-loop)) ; hypothetical function

The list of initial bindings depends on STMX version, and it currently includes local bindings for at least the following special variables:

stmx::*tlog-pool*
stmx::*hw-tlog-write-version*
stmx::*hide-tvars*
stmx::*record-to-tlogs*
stmx::*tlog*
stmx::*tvar-id*
stmx::*lv*
stmx.lang::*cons-pool*
stmx.lang:*current-thread*

as you may check by examining the value of bordeaux-threads:*default-special-bindings*.

The macro (with-initial-bindings ...) above is guaranteed to work for any STMX version, and it will break only if bordeaux-threads undergoes incompatibile API changes.

P.S. I recommend calling (with-initial-bindings ...) from outside any loop - even outside any implicit/hidden loop performed by lparallel threads that repeatedly call user code - because establishing these local bindings is relatively slow: it involves creating several non-trivial objects.

cosmos72 commented 2 years ago

[UPDATE] after a quick glance, it seems lparallel does not tell whether it is based upon bordeaux-threads or not.

If you are lucky, all you need is to replace any call to (lparallel:make-kernel worker-count) with

(lparallel:make-kernel worker-count :bindings bordeaux-threads:default-special-bindings)

[FURTHER UPDATE] bordeaux-threads:*default-special-bindings* has a different content from what (lparallel:make-kernel N :bindings ...) expects, you need to convert between the two:

A conversion function is not difficult, please reply here if you need help to implement it.

leetwinski commented 2 years ago

Hi @cosmos72 Thank you for such a detailed response! I will try to play with the approaches you've proposed, and see what i can get.

Anyways this one is very helpful!

And thanks for stmx by the way! :100:

cosmos72 commented 2 years ago

Good :) Thinking further about it, the :bindings optional argument to (lparallel:make-kernel) has quite different semantics from what STMX expects, and cannot be coerced to do what we need.

In detail:

The bindings argument passed to (make-kernel :bindings ...) is an alist (symbol . value) that will be used to create identical local bindings in all threads internally created by (make-kernel).

This is not what STMX expects: the alist (symbol . form) contained in bordeaux-threads:*default-special-bindings* must be separately evaluated in each thread created, because those forms instantiate different per-thread objects (which are then locally bound to special variables).

So the only immediate solution is to use the macro (with-initial-bindings) I described above.

In the long term, it may be better to ask lparallel authors to improve compatibility with bordeaux-threads,

leetwinski commented 2 years ago

@cosmos72 thanks again!

I decided to make this little patch in lparallel repo. It seems to fix my problem.

So i would probably use the fork, untill (unless) they merge it.

Once again: thank you for your guidance!

cosmos72 commented 2 years ago

So you found that lparallel creates threads with (bordeaux-threads:make-threads) after all, but only after rebinding *bordeaux-threads:*default-special-bindings* to a new value that does not contain the old value.

Yes, that definitely was the problem - and your fix looks correct to me :+1: