esl / ice

Thread W is not deterministic #21

Open et4te opened 10 years ago

et4te commented 10 years ago

At the moment the W parameter to the evaluator is a Pid. Processes are spawned sequentially, their Pids are sorted, and those Pids are then used to evaluate sub-expressions.

This is wrong because at time 1 you may spawn N threads with Pids <0.10.0>, <0.13.0>, ... while at time 2 you can have N threads whose Pids are lower than these, e.g. <0.3.0>, ...

This is because when an Erlang process terminates, its Pid is reclaimed and may be re-used by the Erlang runtime.

What we need to do is replace this with a deterministic solution. E.g: in tthread, instead of using W as the criterion for sorting processes, we could instead tag W with the position in the evaluation tree at which it was spawned.

E.g: {[0], W0}, {[0,0], W1}, {[0,1], W2} ...

This was the original implementation but I changed it at some point, prematurely optimising. ;) I didn't like the fact that the tags can grow really big depending on the size and depth of the thread tree. This is however the only way I have found to do it where the result ends up: a) being deterministic and b) not depending on an external counter.

If you can think of a clever way of doing this without having to store arbitrarily large lists, while still being able to tell, given two threads, whether one is a prefix of the other, let me know.
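
To make this concrete, here is a minimal Erlang sketch of the idea, not the actual tthread code; the module name wtag_sketch and the eval_stub/1 placeholder are made up for illustration:

    %% Hypothetical sketch: tag each spawned worker with its position in the
    %% evaluation tree instead of relying on its Pid. The K-th child of the
    %% thread tagged W gets the tag W ++ [K], so ordering and prefix checks
    %% no longer depend on how the runtime allocates Pids.
    -module(wtag_sketch).
    -export([spawn_children/2]).

    spawn_children(W, SubExprs) ->
        Parent = self(),
        [begin
             ChildW = W ++ [K],
             {ChildW, spawn(fun () -> Parent ! {ChildW, eval_stub(E)} end)}
         end
         || {K, E} <- lists:zip(lists:seq(0, length(SubExprs) - 1), SubExprs)].

    %% Placeholder for the real evaluation of a sub-expression.
    eval_stub(E) -> E.

Sorting tags built this way lexicographically gives the same order on every run, independently of which Pids the runtime happens to hand out.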

fenollp commented 10 years ago

Well. You've got to store a tree. Store a counter. Update them in the same clause.

et4te commented 10 years ago

Storing a counter doesn't work because you've got to be able to tell whether a thread is a prefix of another. Being a prefix does not mean "is this number less than"; it means "is this thread less than, and part of the same branch in the thread dependency graph".
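
For list-shaped tags in root-first order, that check is exactly lists:prefix/2 from the standard library; the tag values below are made up for illustration:

    1> lists:prefix([0,1], [0,1,3]).   % [0,1] is an ancestor of [0,1,3]
    true
    2> lists:prefix([0,2], [0,1,3]).   % different branch, so not a prefix
    false
    3> lists:prefix([0,1,3], [0,1]).   % a descendant is never a prefix of its ancestor
    false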

fenollp commented 10 years ago

What about a list of integers? First in the list come the thread's predecessors, with the thread's own number at the end. That would need a special tree-lookup function…

lucafavatella commented 10 years ago

Ed wrote:

What we need to do is replace this with a deterministic solution. E.g: in tthread, instead of using W as the criterion for sorting processes, we could instead tag W with the position in the evaluation tree at which it was spawned.

Please note that the position in the evaluation tree is an arbitrarily large list as well.

lucafavatella commented 10 years ago

All the below is IMHO of course :)

In short - the best solution is defining the thread id as a list of integers of arbitrary size. This solution can be optimized in order to mitigate its main shortcoming (high memory consumption).

Introduction

Supporting the evaluation of expressions of arbitrary size, with full concurrency among subexpressions, does not allow defining thread ids with a predefined, limited size. In particular:

a. Limiting the size of the thread id, while at the same time not limiting the size of expressions that can be evaluated, would limit the number of subexpressions that can be computed concurrently. In order to still be able to compute expressions of arbitrary size, generating thread ids would need to be strictly coupled among subexpressions, effectively introducing serialization;

b. Limiting the size of the thread id, while at the same time not explicitly limiting the number of subexpressions that can be computed concurrently, would limit the size of expressions that can be evaluated. This would be equivalent to defining the thread id as a list of integers of "arbitrary" size, where the "predefined" size of the thread id is implicitly forced at runtime by the available amount of memory.

Solutions

The options I see are the following:

  1. Store an arbitrarily large list as thread id (naive solution);
    • This would lead to an implementation that is correct and applicable to expressions of arbitrary dimensions - as long as there is memory available :)
  2. Limit the size of the thread id introducing serialization;
    • Refer to (a) above;
  3. Limit the size of the thread id without introducing serialization;
    • Refer to (b) above;
    • This is equivalent to option (1), but with the size of the thread id predefined rather than forced by a memory-exhaustion error/crash.

I prefer option 1.

Optimizations/mitigations for naive solution

The naive solution (thread id as a list of integers of arbitrary length) can be optimized.

Optimization 1

Memory consumption in a naive implementation of the naive solution would grow exponentially with the number of threads. It can grow linearly instead by using immutable data structures shared among threads. The optimization consists in defining each thread id w in terms of the parent thread's id w0, storing in w only a reference to w0 and not a full copy of it. The assumptions for this solution are the following:

Optimization 2

The optimization consists in using the minimum amount of memory for storing the identifier of the subexpression. E.g. the thread spawned for subexpression 2 should only use two additional bits (2 in base 10 = 10 in base 2), not a full byte (e.g. a char in C is one byte, which is a waste of memory for storing the number 2).
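
A small Erlang sketch of this kind of packing, assuming a fixed width of 2 bits per tree position (which caps each node at 4 subexpressions); the module name is made up:

    %% Hypothetical sketch: pack a root-first tag into a bitstring at 2 bits
    %% per position instead of one machine word per list element.
    %% pack([0,2,1]) =:= <<0:2, 2:2, 1:2>>, a 6-bit bitstring.
    -module(wtag_pack).
    -export([pack/1, unpack/1]).

    pack(Tag) ->
        << <<K:2>> || K <- Tag >>.

    unpack(Bits) ->
        [K || <<K:2>> <= Bits].

The trade-off is that the width per position has to be fixed (or itself encoded somewhere), and prefix checks then work on bit offsets rather than list cells.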

Notes

Note 1: Similar discussion/optimization could be done for the position in the evaluation tree kept for generating unique hidden dimensions (pull request to come).

et4te commented 10 years ago

Whew, that was a lot to read. :) What I had in mind was indeed using the list-based (naive) solution to start with, basically reverting to what was a better prior implementation. The memory wouldn't grow exponentially based on N threads, but would grow according to the thread tree, which could be exponential. ;) Then again, in such a case memory will grow exponentially anyway, but here we'd be adding a list as well, which is rather memory intensive.

I definitely think we need the naive implementation first.

For your optimisation number 1, the problem is that if we do this type of optimisation, then when we later need to figure out whether a thread W is a prefix of another thread, we need to traverse the entire tree for both Ws being compared.
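
For what it's worth, a made-up sketch of what that could look like in Erlang: the child id reuses the parent's id by reference (Erlang lists share their tails, so each thread only adds one cons cell), the tags therefore read leaf-first, and the prefix test has to walk the parent chain, which is the traversal cost I mean:

    %% Hypothetical sketch of optimisation 1: each thread id adds one cons
    %% cell on top of the parent's id, which is shared rather than copied.
    -module(wtag_share).
    -export([child/2, is_ancestor/2]).

    %% Id of the K-th subexpression spawned under ParentW; ParentW is shared.
    child(ParentW, K) -> [K | ParentW].

    %% AncestorW is a prefix (in the root-first sense) of W iff it shows up
    %% while walking W's parent chain, so the check is linear in the depth.
    is_ancestor(W, W) -> true;
    is_ancestor(_AncestorW, []) -> false;
    is_ancestor(AncestorW, [_ | ParentW]) -> is_ancestor(AncestorW, ParentW).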