HuoLanguage / huo

interpreted language written in C
MIT License
212 stars 21 forks source link

Parallel execution not threadsafe #45

Open TheLoneWolfling opened 8 years ago

TheLoneWolfling commented 8 years ago

We need to do locking or something similar. This could be slow, though.

As-is, things could get rather messed up if two parallel threads, say, both tried to call a function at the same time. Or modified the same variable at the same time.

hash_table is not threadsafe.

incrediblesound commented 8 years ago

OK, this is a tricky one, I've been thinking about it a bit. I don't like the idea of slowing down the parallel function with locking, but at the same time trying to detect an error such as "you are passing the same variable into two functions inside parallel" would also be clunky.

Then I thought of the idea of having two parallel functions, one with locking and one without. I'm still not sure about this, it might be overkill to have a function called safe_parallel and another called unsafe_parallel, but I'll keep thinking about it.

TheLoneWolfling commented 8 years ago

That wouldn't work either, as it's implemented currently.

We're not copying the call stack per-thread, either.

incrediblesound commented 8 years ago

Can you explain more exactly what wouldn't work? I've tested the parallel function on a number of cases and it seems to work fine unless you make multiple threads access the same variable.

incrediblesound commented 8 years ago

So after doing some research it seems to be the case that each thread is its own "context of execution" and has its own stack. That being the case it shouldn't matter if we are executing the same function in two or more stacks unless they access the same heap memory. That means we need to make sure that our malloc is threadsafe and we also either throw an error, add locking, or add a memory-safe alternative to the parallel function to handle the case of a single variable in multiple threads. Does that sound right @jeremycarroll ?

jeremycarroll commented 8 years ago

I would suggest running some parallel tests to explore this issue: (let x 0) (let y []) (parallel (for 0 100 (let x (+ x 1))) (for 0 100 (let x (+ x -1))) (for 0 100 (let x (+ x 2))) (for 0 100 (let x (+ x -2))) ) Then x should be 0, but won't be if you have a thread-safety issue. Similarly have a test with parallel calls to memory allocation to test the safety of the malloc code.

incrediblesound commented 8 years ago

OK So after chatting with Jeremy we determined that the only issues for thread safety are accessing shared memory (as in the test he wrote above) and the thread-safety of malloc.

As for the former issue, I still like the idea of a fast unsafe parallel, but Jeremy suggested that a sophisticated approach would be to make parallel unsafe and then provide a locking mechanism to the user so that they can control the granularity of the mutex. I'm not sure how easy this would be to implement but I'm imagining something like:

(parallel
  (lock (let x (+ x 1)))
  (lock (let x (- x 1)))
)
TheLoneWolfling commented 8 years ago

You're confusing the C stack and the Execution_bundle huo scope stack.

The C stack is, of course, per pthread. But the stack as implemented in Execution_bundle is not.

As per " the only issues for thread safety are accessing shared memory": right, and currently the scope is shared memory.

Something as simple as this:

(parallel
    (let y 1)
    (let x 2)
)

could easily crash the interpreter, currently. Or end up with y undefined, or x undefined, or x defined as 1, or y defined as 2, or...

It's only happenstance and race conditions that mean that it doesn't right now.

(Look at what happens. Both threads try to call execute->apply_execution_function->store_let_binding->store_let_value->hash_table_put possibly-simultaneously...)

And it's not just "let" that's the problem, either. It's everything that touches the shared scope in a non-readonly manner.

incrediblesound commented 8 years ago

Ahhhh OK I see what you're saying. Yes, OK that is true. So the issue is that the resources used for execution (the scopes and the defined functions) are going to be accessed by both threads, and it's not quite as simple as just saying "don't pass in the same variable to two functions in parallel" because any accessing of those resources is dangerous.

So it seems to me that there are two options:

  1. copy the resources and then merge them after parallel is finished
  2. use locking on the resources

Option one just sounds nasty. I am going to do more research on using mutexes because I find this problem interesting. It might be worth it to try both approaches on branches and profile their performance to see how they compare, unless you think option one is too heinous to even bother.

TheLoneWolfling commented 8 years ago

A third option:

Modify the scopes struct to have a pointer to a parent scopes. When accessing through the parent scopes, use locking, otherwise just go for it. Then create a per-thread scopes with parent pointer to the main scopes.

You'll need a read/write lock. (In other words, either any number of threads can be reading, or one thread can be writing. )

Personally? Go with the simple option for now, and open a bug examine parallel performance to take a look at later.

By the way: it might be better to replace the current parallel with a parallel_map. You can return a value through the second argument of thread_join. That way you (probably) won't need to lock in the common case.

TheLoneWolfling commented 8 years ago

A fourth option: enforce that all access to parent scopes is readonly.