carp-lang / Carp

A statically typed lisp, without a GC, for real-time applications.
Apache License 2.0

What are the plans for threading / concurrency? #1149

Open aliraza-noon opened 3 years ago

aliraza-noon commented 3 years ago

Hello there, really interesting project! I stumbled on it through a Reddit post and am wondering what you have in mind for concurrency and parallelism.

aliraza-noon commented 3 years ago

@eriksvedang that is really rude, I guess you could have bothered to answer this with a few words

eriksvedang commented 3 years ago

I'm very sorry, your account was almost empty and your comment quite generic, so I just assumed you were a bot.

eriksvedang commented 3 years ago

@aliraza-noon There is not a good and well thought-out plan for threading and concurrency, I'm afraid.

aliraza-noon commented 3 years ago

Ah OK, I see. I use this account for work.

In my previous life I was somewhat involved in the Clojure community. My 2 cents: maybe you can look at core.async, which originated from Go.

eriksvedang commented 3 years ago

@aliraza-noon I see, sorry again. Yeah, I'd be very happy with something like core.async / Go! There have been experiments with co-routines in Carp (though I can't find them now). Perhaps we can build something on top of that. Most likely anything like this will start as a library and we can add it to the stdlib later.

hellerve commented 3 years ago

Might the experiment have been gt.carp? It implements a system for cooperative coroutines connected with channels. Sadly the system isn’t quite sound yet, because the borrow checker cannot figure out when the function ends (since the system uses assembly to context switch, even in the middle of a function, sometimes forever).

hellerve commented 3 years ago

A little point for the future: I’d suggest asking questions in the chat rather than opening an issue, just because questions generate a bit of noise for the maintainers. This is not a big issue, but we use the issue tracker to track problems and feature requests for the language. I suppose that this issue could be read as a feature request, but we try to keep those actionable by writing more comprehensive RFCs (see these as examples: #1115, #1095, #884).

TimDeve commented 3 years ago

I have written a small demo of using pthreads; it's not incredibly safe: https://github.com/TimDeve/pthread

mw66 commented 3 years ago

Hope threading and concurrency will be added soon to the language, or stdlib first.

Nowadays, any serious app needs processing power on multiple cores.

TimDeve commented 3 years ago

@eriksvedang Correct me if I'm wrong but the borrow checker can't really handle threads safely at the moment so there would need to be some language change before something like that can be added to stdlib.

hellerve commented 3 years ago

The thing I proposed informally before is allowing for a special call that cleans up all references in a function before that function ends (basically the after-function cleanup, but as an explicit call). This would allow at least gt.carp to work, because the scope of a thread is always a Carp function.

eriksvedang commented 3 years ago

> @eriksvedang Correct me if I'm wrong but the borrow checker can't really handle threads safely at the moment so there would need to be some language change before something like that can be added to stdlib.

Yeah, I think that is correct. Now, it might not need that many changes to work. Definitely worth discussing more!

TimDeve commented 3 years ago

@hellerve I don't think it helps with the issue I have in https://github.com/TimDeve/pthread. If you look at the example, you have to be careful that the lifetime of your lambdas is longer than it takes for all your threads to finish; if the example weren't joining before the lambdas are cleaned up by the borrow checker, you would get a use-after-free. We would need lambda consumers to be aware of the lifetime of whatever the lambda captured.

Unsafe example (seg-faults):

(load "git@github.com:TimDeve/pthread@v0.1.1")

(use-all Result Array)

(defn make-threads []
  (let-do [a-str @"A string"
           callback &(fn [] (do (System.sleep-seconds 1) (println* &a-str)))]
    ;; Each thread holds a reference to `callback`, but `a-str` and the
    ;; lambda are freed when `make-threads` returns, so the threads end up
    ;; reading freed memory: a use-after-free.
    (repeat 1000 &(fn [] (unsafe-from-success (Thread.create callback))))))

(defn main []
  (let-do [threads (make-threads)]
    (ignore
      (Thread.join-all threads))))

hellerve commented 3 years ago

This example feels a little wrong to me generally: I feel like the thread shouldn’t have a reference to the lambda here but own it (that would mean copying the fn, but that makes sense to me, since otherwise the threads would share that memory). This way you ensure that the function only gets deleted when the thread gets deleted.
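For comparison, this ownership-based fix is exactly what Rust's standard threading API enforces: a `move` closure transfers the captured data into the thread, so the spawning scope can no longer free it out from under the thread. A minimal sketch, in Rust rather than Carp, purely to illustrate the ownership idea:

```rust
use std::thread;

fn main() {
    let s = String::from("A string");
    // `move` transfers ownership of `s` into the thread's closure, so the
    // thread, not the spawning scope, decides when `s` is freed.
    // No use-after-free is possible: any later use of `s` here is a
    // compile error.
    let handle = thread::spawn(move || {
        println!("{}", s); // prints "A string"
    });
    handle.join().unwrap();
}
```

The same shape could plausibly work for `Thread.create`: take the lambda by value (copying it per thread), so each thread owns its callback for its whole lifetime.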

TimDeve commented 3 years ago

Even if you make it take ownership (maybe putting your callback in a struct that's returned with your thread handle), you'll still have problems with any refs it captures, like this:

(deftype Holder [f (Fn [] ())])

(defn make-thread []
  ;; `r` borrows a temporary String that is freed when `make-thread`
  ;; returns, so the lambda stored in the Holder captures a dangling ref.
  (let [r &(String.concat &[@"bro" @"ken"])
        ff (fn [] (println* r))]
    (Holder ff)))

(defn main []
  (let [t (make-thread)]
    (~(Holder.f &t))))

Output on my machine:

H]D

There needs to be a lifetime attached to lambdas that capture refs.

eriksvedang commented 3 years ago

> There needs to be a lifetime attached to lambdas that capture refs.

I think they have a lifetime, right? It's just not checked properly. I think we have an issue for this... we just never got around to fixing it.

TimDeve commented 3 years ago

Is the lifetime linked to the references it captures?

eriksvedang commented 3 years ago

> Is the lifetime linked to the references it captures?

That's the idea, though there is something missing for a certain situation iirc.

redbar0n commented 2 years ago

Golang's way of dealing with concurrency seems to be the gold standard nowadays. Interestingly, Go doesn't presume the programmer will deal with threads directly (which seems a wise choice, imho):

> Go blocks run your processes on a thread pool that contains a number of threads equal to two plus the number of cores on your machine, which means your program doesn’t have to create a new thread for each process. This often results in better performance because you avoid the overhead associated with creating threads.

https://www.braveclojure.com/core-async/

The downside is that it puts some extra weight on the shoulders of the language runtime. Namely a scheduler.

> Go’s concurrency is also famous because of the clever scheduler it uses. Go programs are started with a thread pool that has a fixed number of threads as well as an asynchronous event loop. When a goroutine hits IO, it will sleep in the event loop, and when it’s ready, it will be scheduled to run in one of the threads. This means computationally intensive tasks will run in parallel on the CPU, while IO-bound tasks won’t take up a thread.

https://qr.ae/pGBJkT

The issue with the go keyword + channels in Go is that they don't afford simple local reasoning, black box abstraction, automatic error propagation, or reliable resource cleanup (according to this article). In short: they are a bit too fine-grained and not contained, and need to be manually maintained. They might work well in imperative C-inspired languages such as Golang, but maybe not so much for languages aspiring to be more declarative and functional in style, such as Carp. Even though Clojure has its own way of dealing with them in core.async, it relies on an (imho inelegant) imperative close! function, which doesn't solve the problem of manual management and reliable resource cleanup.
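One alternative to an explicit `close!` is to tie channel closure to ownership, as Rust's standard `mpsc` channels do: when the last sender is dropped, the channel closes and the receive loop ends on its own. A small Rust sketch of that drop-based cleanup (illustrative only, not a Carp design):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for i in 0..3 {
            tx.send(i).unwrap();
        }
        // `tx` is dropped when this thread ends; dropping the last sender
        // closes the channel, so the receive loop below terminates by
        // itself. No explicit `close!` call is needed anywhere.
    });
    for v in rx {
        println!("got {}", v);
    }
}
```

Since Carp's borrow checker already tracks when values are deleted, a channel library could in principle hook "last sender deleted" into the same mechanism.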

So, another idea could be to build in and enforce something like a fork/join construct, similar to Golang's WaitGroup, but more like nurseries in Python's Trio library for concurrency, since nurseries allegedly alleviate the aforementioned problems. Basically, nurseries are a contained structure that blocks the calling process until all the expressions within (which all execute concurrently) return. They afford the programmer simple local reasoning and sequential reading of the program (as opposed to callbacks, which fragment the logic and control flow out into various handlers). Plus, it shouldn't matter that the original process is blocked (waiting for their results), since the runtime will utilize any available cores for the forked processes, not letting resources go to waste.

(Update: For web servers, the forked processes may be remote async requests, thus not consuming resources on the server cores. Isn't it then a problem that the top-level process is blocked while waiting for the results of those requests? It shouldn't be, if the top-level process is simply a goroutine, aka a deep coroutine, that the scheduler can swap out for one or more top-level processes that take care of new requests in the meanwhile.)

For reference, here is the hacker news criticism of the aforementioned article on nurseries in Trio.

Without having thought thoroughly about it and explored the ramifications, I imagine such a nursery, aka fork/join construct, could work well with borrow checking and enable lifetime management, which could solve some problems mentioned previously in this discussion:

> cooperative coroutines connected with channels ... Sadly the system isn’t quite sound yet, because the borrow checker cannot figure out when the function ends

> the borrow checker can't really handle threads safely at the moment

> you have to be careful that the lifetime of your lambdas is longer than it takes for all your threads to finish

Since the construct will have a demarcated end point where all the forked processes will be done (since the main process is waiting for that).

But nurseries, as far as I know, work by mutating external variables (the Trio docs corroborate that impression). But in a functional style we'd like the forked processes to work as expressions, returning a value.

One issue with fork/join constructs is: how do you order the results of the forked processes at the moment they have completed and are supposed to be joined? An idea that might work here is to automatically order the results according to their appearance in the calling code (the sequence in which they were forked).
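Rust's `std::thread::scope` (stable since 1.63) happens to be a ready-made instance of this nursery shape, and fork-order results fall out naturally: collect the join handles in fork order and join them in that same order. A sketch, in Rust only because it already has a scoped construct to demonstrate with:

```rust
use std::thread;

fn main() {
    let inputs = vec![3, 1, 2];
    let results: Vec<i32> = thread::scope(|s| {
        // Fork one child per input; handles are collected in fork order.
        let handles: Vec<_> = inputs
            .iter()
            .map(|&n| s.spawn(move || n * 10))
            .collect();
        // Join in the same order, so results line up with the forks
        // regardless of which child actually finished first.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The scope itself is the demarcated join point: it cannot be exited
    // until every child has finished, and it returns a value, so the
    // whole construct works as an expression.
    println!("{:?}", results); // [30, 10, 20]
}
```

Note that the scope returning a value addresses the "forked processes as expressions" wish above, without mutating external variables.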

Update 2023-04-05, to add this Guy Steele quote from a relevant presentation I came across:

"So, to conclude, I believe that a program organized according to linear problem decomposition principles of which the accumulator design pattern is a significant hallmark, those sorts of programs can be really hard to parallelize. A program organized according to independence, having a programming language that makes it easy to say which computations are independent, and being able to talk about combining operations in which grouping doesn't matter, these divide and conquer principles make for code that is easily run, either in parallel or sequentially, according to what resources are available at runtime." -- Guy Steele in How to Think about Parallel Programming: Not!

Interested to hear what you think about this overall approach.

dumblob commented 2 years ago

@redbar0n the link "fragments the logic" points to a video which is unavailable:

> Video unavailable. This video is no longer available due to a copyright claim by Alexander Miller

redbar0n commented 2 years ago

> @redbar0n the link "fragments the logic" points to a video which is unavailable

Thanks. Fixed the link now.

eriksvedang commented 2 years ago

@redbar0n Thanks a lot for sharing your thoughts and research, that is super helpful!

redbar0n commented 1 year ago

This might be relevant reading: The Scoped Task trilemma (Apr 8, 2023).

Teaser excerpt:

> You want to use the Rust type system to implement an API for concurrent programming. Conceptually, in this API you have a “parent task” and one or more “child tasks;” if there is more than one child task, they can run concurrently with one another. Any sound API can only provide at most two of the following three desirable properties:
>
> • Concurrency: Child tasks proceed concurrently with the parent.
> • Parallelizability: Child tasks can be made to proceed in parallel with the parent.
> • Borrowing: Child tasks can borrow data from the parent without synchronization.
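Rust's `std::thread::scope` is a concrete example of picking parallelizability and borrowing while giving up concurrency with the parent: the parent is blocked inside the scope until every child finishes, and that blocking is precisely what makes it sound for children to borrow the parent's locals without synchronization. A minimal sketch:

```rust
use std::thread;

fn main() {
    let data = vec![1i64, 2, 3, 4];
    let mut sums = [0i64; 2];
    // Disjoint mutable halves, so the two children never alias.
    let (left, right) = sums.split_at_mut(1);
    // Borrowing: children borrow `data` (and halves of `sums`) from the
    // parent with no Arc, no Mutex, no copying.
    // Parallelizability: the two children may run on different cores.
    // Concurrency with the parent is what is given up: the parent is
    // blocked here until both children return.
    thread::scope(|s| {
        s.spawn(|| left[0] = data[..2].iter().sum());
        s.spawn(|| right[0] = data[2..].iter().sum());
    });
    println!("{:?}", sums); // [3, 7]
}
```

A Carp construct with the same "children cannot outlive the scope" guarantee would likely face the same trilemma, which is worth keeping in mind when choosing between a nursery-style API and free-running threads.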