Propagating context information to child-tasks and remote calls?

oschulz commented 4 years ago

@tkf and me have been discussing ways to propagate information about available workers (or resources in general) in distributed hierarchical computations:

https://discourse.julialang.org/t/propagation-of-available-assigned-worker-ids-in-hierarchical-computations

Adding resource arguments to every function call would be impractical, and using Cassette & friends to add add context by rewriting the whole computation would be very heavy-handed, since it might tough large code stacks (and may also be not be a complete solution when remote calls are involved?).

Could we add something like a context field to Task, in addition to the storage field - with the difference that context is automatically propagated to spawned tasks and remote calls? Adding the possibility to propagate a context through a computation in this fashion could also be useful for other use cases too, I think.

vtjnash commented 4 years ago

C.f. #34543

tkf commented 4 years ago

Thanks for opening the issue.

I'm actually against copying task_local_storage API. I think something similar to contextvars in Python (see PEP 567 for discussion) would be better:

const var1 = ContextVar(:var1, 42)  # with default
@contextvar var1 = 42 # possible sugar

const var2 = ContextVar{Lockable{Vector{Int}}}(:var2)  # without default, with type
@contextvar var2::Lockable{Vector{Int}} # possible sugar
# Lockable from https://github.com/JuliaLang/julia/pull/34400

const var3 = ContextVar(:var3)  # equivalent to ContextVar{Any}(:var3)
@contextvar var3 # possible sugar

function f()
    @sync @async @show var1[] # var1[] => 42
    var1[] = 0
    @sync @async @show var1[] # var1[] => 0

    setting(var1 => 1) do
        @show var1[] # var1[] => 1
    end
    @show var1[] # var1[] => 0
end

In contrast to task_local_storage(:var):

var[] can be inferred
var is forced to be namespaced (i.e., it has to exist in some module name space)
var can be backed up by an efficient concrete key type (e.g., UUID)
var allows small-size optimization when the value type can be inlined into the context storage

We may want to use something like HAMT (as in PEP 567) for the context storage.

A possible/reference implementation of ContextVar may be something like

struct ContextVar{T}
    name::Symbol  # only for human readability
    key::UUID     # (for example)
    default::T    # or `Union{Some{T},Nothing}`?
    has_default::Bool
end

getindex(var::ContextVar{T}) where {T} =
    if var.has_default
        get(task_local_storage(:CONTEXT), var.key, var.default)
    else
        task_local_storage(:CONTEXT)[var.key]
    end :: T

tkf commented 4 years ago

Here is a proof-of-concept and its usage: https://github.com/tkf/ContextVariables.jl/blob/ebc2b550e2177c4ebc059b6974f1ea36309d7f4f/test/runtests.jl#L14-L29

oschulz commented 4 years ago

Oh, nice! Will this work across remote-call boundaries?

Maybe use with_context instead of with_variables? Variable is such a generic term. :-)

tkf commented 4 years ago

Yeah, with_context sounds like a much better name. :)

Will this work across remote-call boundaries?

I think it'd work with local context variables (i.e., captured in closures) but not with global const context variables. But I think it's possible in principle. We need to tweak Distributed (or you need to propagate it manually) though.

Notes: Global const doesn't work because I'm using uuid4 (which is based on the global RNG) to generate the key. I can use uuid5 to make the key deterministic (i.e., hashing the package's UUID, module namespaces, and the variable name). We need a macro to make this easier.

c42f commented 4 years ago

Yes we need something better here, thanks again for opening this issue! I feel an alternative name for this issue could be "taking dynamic scope seriously" ;-) Though perhaps "context variable" is a more straightforward name for this.

If we somehow had efficient codegen for context variables, there's a lot of interesting language facilities which can be improved based on this. For example

It could replace the special cases for the logger state in the existing logging system
Error handling with an "algebraic effects" or "condition system" - see some discussion at #7026
Deterministic task local random number generators https://github.com/JuliaLang/julia/pull/34852 (see also https://discourse.julialang.org/t/parallel-mersenne-twister/27567). Interesting and very difficult wrinkle: serving the RNG use case requires running code at task creation time, not just efficient access.

At https://github.com/JuliaLang/julia/issues/35690#issuecomment-623904330 I wondered about one possible way to specialize code based on context variables. As noted in the OP, specializing a whole call stack based on the presence/type of a context variable is just really heavy handed. But on the other hand, I feel like specializing a leaf function (or a few innermost inlined frames) on a context var could be very powerful, and possibly of acceptable cost.

Getting this idea to be sound and practical within inference seems quite tricky. I guess it would require a context calling convention to allow inference to reason about the innermost frames in a systematic way. If it actually panned out, I'm imagining the compiler could hoist context variable access out of innermost loops and across inlined function calls.

Kotlin seems to have some interesting APIs around context and coroutines (and they have taken structured concurrency seriously!) so we might learn something from examining that closer https://kotlinlang.org/docs/reference/coroutines/coroutine-context-and-dispatchers.html

oschulz commented 4 years ago

Getting this idea to be sound and practical within inference seems quite tricky.

I guess it would depend on how deep we want to take it. Context attached to tasks could be done with just a few small, lightweight changes. I think that would cover many use cases already.

Context at a deeper level, propagating through all function calls will come with a higher cost, I guess - but there are quite a few uses cases that would need that, of course, though I think less concerned with scheduling and resource allocation questions.

tkf commented 4 years ago

@c42f What do you think about the design of ContextVar{T} I posted above? I think it solves the problem of inferrability. There is still a dynamic code inside getindex(::ContextVar{T}) but LICM'ing it (or maybe rather get(::ContextVar{T}) :: Union{Some{T}, Nothing} which won't throw) sounds much easier with the current infrastructure than implementing the new effect specialization facility. I think ContextVar{T} is a neat design that completely eliminates the type-instability on the user side of the code.

Also, I don't think it's possible to eliminate some dynamic typing. Since we need to allow arbitrary value types, and we can't specialize for the whole set of the context variables currently set (i.e., equivalent to using NamedTuple as the context), the valtype would be Any. So I think type-stabilizing at use-site is a good strategy.

(I wonder if it can be generalized to the strategy for implementing dynamically scoped effect handler. That is to say, type information is statically "hoisted out" to the global/lexical scope and then there is a tiny bits of run-time information invoking those pre-defined entrypoints.) (Edit: actually, I guess this happens automatically anyway in effect handlers like chain-of-custody error handling.)

I also thought about supporting the deterministic parallel RNG. It probably is possible via implementing some kind of hook system via AbstractContextVar. But I fear there will be too much dynamic dispatches on the task creation. Though it looks like Kotlin's context system allows user-defined restoreThreadContext/updateThreadContext? I wonder how they make it efficient. Is it doable because Kotlin is statically typed or something?

c42f commented 4 years ago

What do you think about the design of ContextVar{T} I posted above? I

Well, I like it! It's simple and can be implemented right away with only some small changes to the runtime. My comments above about effect systems, etc, are all pretty much pie-in-the-sky speculation at this stage, though it would be nice to have a rough feeling for where things might go.

So anyway, if we provided ContextVar, that at least gives us strongly typed task local storage which would be reasonably efficient, convenient and solve some problems with context propagation which are currently unsolvable for users. It also doesn't seem too far-fetched to teach the Julia optimizer about the scoping semantics so that some form of LICM will work in the medium term. Specialization based on type of context vars could eventually be introduced as an optimization (if it's even feasible).

oschulz commented 4 years ago

I also thought about supporting the deterministic parallel RNG

That's actually something I already have in BAT.jl

https://github.com/bat/BAT.jl/blob/master/src/rngs/rng_init.jl

This provides reproducible random numbers for hierarchical parallel applications, via hierarchical partitioning of counter-based RNGs .

I'll make that more widely available via ParallelProcessingTools.jl. It's just in BAT.jl currently because I needed it fast and didn't have time to do a nice API at the time. I'll get on it.

oschulz commented 4 years ago

Context variables would be super-useful to propagate those RNG partitions!

JeffBezanson commented 4 years ago

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

oschulz commented 4 years ago

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

Well, I guess in many use cases, the task would get it's RNG once, and then pass it on to the functions it calls as an explicit parameter. And then pass partitions of that RNG to sub-tasks, when they spawn. I really wasn't trying to go for a full resource-injection solution with this issue, just something on the task-level for low/medium-frequency access.

The RNG-story is a bit different from the worker-availability story, though. Which RNG is used for which part of the computation shouldn't depend on number or workers/threads/tasks, for reproducibility, so that often may need to be handled explicitly.

tkf commented 4 years ago

My comments above about effect systems, etc, are all pretty much pie-in-the-sky speculation at this stage, though it would be nice to have a rough feeling for where things might go.

@c42f Yeah, I do like pie-in-the-sky speculations and am very interested in effect systems! Also, I think context variable API helps implementing POC effect handlers, even though it won't be as efficient as we'd want. Some effect handlers don't require too much optimizations when the overhead of the handler itself is large (e.g., a process+thread pool abstracting Distributed and Threads). I think it'd be nice to have a building block for playing with effect handler interfaces and assessing programming experience with it, before start thinking about optimizing the hell out of it (though it's also important to think about compiler-friendly interface at the same time).

tkf commented 4 years ago

This provides reproducible random numbers for hierarchical parallel applications, via hierarchical partitioning of counter-based RNGs .

@oschulz It's awesome that you already have counter-based RNGs integrated into a parallel program! But don't you need to do something at @spawn? I guess your other comment confirms that?:

And then pass partitions of that RNG to sub-tasks, when they spawn.

If so, don't you need to create custom @spawn-like API anyway? I'm trying to understand how context variables can be useful in implementing deterministic parallel RNGs in "user land." I thought it'd be rather useless.

c42f commented 4 years ago

I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG.

Yeah. To make this fast enough I guess we need to be able to hoist the load of the ContextVar for the global RNG out of any inner loop, and also have a fixed concrete type for that RNG. (The fixed type is somewhat limiting, but no worse than what we have now.) I feel like it should be feasible to model dynamic scope in the optimizer if all setting and getting of ContextVars goes through a well defined interface? That could include eliding the ContextVar load for inlined frames where the innermost ContextVar store is visible. And even eliding the store if the frame is a leaf?

oschulz commented 4 years ago

@oschulz It's awesome that you already have counter-based RNGs integrated into a parallel program! But don't you need to do something at @spawn? I guess your other comment confirms that?:

Not for the RNG, no, because all functions I use take an RNG as an explicit argument. Since this is pretty much a standard in most of the Julia ecosystem, it's easy to do that consistently, also with the third-party packages I use. In the beginning, I also had explicit function arguments for resources like threads in BAT.jl, but that was unwieldy and I got rid it when partr came around.

Like I wrote, RNG distribution/partitioning can't always be automatic if the computation should be reproducible independing of parallel execution strategy. But it would still be great to have the option of propagating the RNG via context - user's don't always like to have to pass the RNG explicitly - but at points in between, algorithms that distribute computation will need to do some explicit RNG handling/partitioning.

tkf commented 4 years ago

But it would still be great to have the option of propagating the RNG via context

Hmm.... I'm not sure if I understand. Let's consider this snippet:

@contextvar state = 0

function demo()
    @sync begin
        @async println("at task1: ", state[])
        @async println("at task2: ", state[])
        state[] += 1
        println("at task0: ", state[])
    end
end

Then calling demo() will print

at task0: 1
at task1: 0
at task2: 0

in some order. If state were the state of RNG, task1 and task2 will have the same stream. That's not what you want, right?

task1 and task2 can call your functions that use RNG correctly. But, unless every use of @async/@spawn is aware of the RNG state that is in the task context, you can't be sure that you have "independent" streams.

tkf commented 4 years ago

I think there are two design questions:

(1) Should Distributed handle it automatically? That is to say, should context variable values be restricted to serializable values? It may be handy to put non-serializable states like files and locks in it so I don't think we should add this restriction. But this would make it impossible to implement automatic handling by Distributed. (Note that users still can implement context propagation by wrapping Distributed API and manually propagating known-to-be-safe context variables.)

A semi-automatic way may be to add an overloadable function API Distributed.is_serializable_context_value(x) :: Bool or something. This way, Distributed can copy the context for objects that are marked serializable-safe explicitly.

(2) Should it be possible to list context variables currently set? With the current design (only storing a mapping uuid => value in Task), it's not possible to get this. We can store a mapping uuid => (var, value) in Task or something equivalent to get the list but I'm not sure about the performance impact of this idea.

My preference is to add the context variable API without these features and see if we can get away without implementing them.

tkf commented 4 years ago

OK so here is a full set of API I'd propose https://tkf.github.io/ContextVariables.jl/dev/. It includes a quick tutorial.

oschulz commented 4 years ago

If state were the state of RNG, task1 and task2 will have the same stream. That's not what you want, right?

Oh, yes - we will of course, in general, need variants of @spawn, @remotecall, @async and so on that allow for specifying a partial new context (which should be merged with the current context before sending it on).

With my partitioned RNGs, the situations is a bit different, though. An RNGPartition can be passed on as it is, though, because the receiving end will instantiate the actual RNG (e.g. via AbstractRNG(rngpart::RNGPartition{R}, i::Integer)) it should use based on which part of the calculation it will handle next, not based on it's position in the worker/task hierarchy.

However, parts of the calculation that are not aware of RNGs (e.g. some 3rd party package), should pass on the RNG or RNG partition unchanged automatically, if possible.

Should Distributed handle it automatically?

I think yes.

The idea would be that a hierarchical calculation will, at certain points, need to control how resources (workers, RNG, etc.) are to be partitioned

c42f commented 4 years ago

OK so here is a full set of API I'd propose https://tkf.github.io/ContextVariables.jl/dev/. It includes a quick tutorial.

I haven't had time to read all the implementation yet, but the API looks nice. I particularly like the section on data races. I think you've nailed the correct semantics there: the runtime must ensure that the ContextVar get/set is data race free between Tasks, but the use of any context var values will need to be made threadsafe by the user.

For implementation of storage we could add an extra context field on Task, which, similar to logstate, is copied at task creation but is otherwise free to use in this work. Then store something like ImmutableDict in that context field. (Ideally logstate would eventually be removed and replaced with a ContextVar.)

Note that users still can implement context propagation by wrapping Distributed API and manually propagating known-to-be-safe context variables

This seems like it can be worked around by the application author but it's going to be ugly; in principle they can make a big list of ContextVars they care about. But it seems like libraries making use of ContextVar internally won't be able to compose with libraries using Distributed?

It's a tricky problem because in general remote calls can't know which of the context vars will actually be used. So should it be an error to have a contextvar installed which can't be serialized, but which won't be used on the remote side? I would think not. Perhaps it could be made to work by allowing the remote call and sending all context vars, but with the value of any non-serializable context var poisoned so that any use of it on the remote side will result in an exception.

Should it be possible to list context variables currently set

I agree we might be get away without this. The proposed API seems to be one where module authors are likely to create a ContextVar for their own internal purposes within a module. So if nobody else can list those it seems fine.

On the other hand, it would be really useful to be able to list the attached ContextVars during debugging. But for that a separate global lookup table based on uuid might suffice anyway?

tkf commented 4 years ago

The idea would be that a hierarchical calculation will, at certain points, need to control how resources (workers, RNG, etc.) are to be partitioned

@oschulz Yeah, I agree it'd be nice to be automatic for these cases. But, as I mentioned, it has some undesirable consequences. For example, some objects like files and locks can't cross process boundaries. It'd be problematic if a user accidentally put huge arrays in the context variable.

(Another way may be to use default_worker_pool to handle context variables that describe computation resources. Discussed below.)

For implementation of storage we could add an extra context field on Task, which, similar to logstate, is copied at task creation but is otherwise free to use in this work.

@c42f Yeah, I actually have already implemented it like this :) https://github.com/tkf/julia/commits/ctxvars

It's a tricky problem because in general remote calls can't know which of the context vars will actually be used.

One solution may be to make Distributed.default_worker_pool backed up by a context-local variable. That is to say, the process pool becomes a context-dependent thing (but still local to a process). This way, it's (probably) possible to implement context propagation across process boundaries by implementing a custom worker pool on user space. Then such custom worker pool can decide what context variables to propagate. I'm not sure if the worker and worker pool interface is designed to allow this, but, if we are to implement context propagation across processes, it sounds like a nice hook point to make it work. (I guess it's kind of like how LoggingExtras.jl builds composable stack on the user land.)

On the other hand, it would be really useful to be able to list the attached ContextVars during debugging. But for that a separate global lookup table based on uuid might suffice anyway?

Yeah, having an optional lookup table for debugging sounds like a good idea.

tkf commented 4 years ago

I opened PR #35833 to add this API so that it'd be easier to comment on implementation/design specific to the API I'm proposing.

oschulz commented 4 years ago

OK so here is a full set of API I'd propose

I'm not so sure that we should directly inject variables into the scope of the child-tasks. There may be name clashes and it's a potential security issue too - mainly I'm worried about name conflicts though. I think it would be better if context variables were retrieved with an explicit mechanism.

oschulz commented 4 years ago

Yeah, I agree it'd be nice to be automatic for these cases. But, as I mentioned, it has some undesirable consequences.

Hm, I would hope those problems can be overcome somehow - if only by the user being reasonable.

However, for the original scope of this issue - propagation of available workers - we'd most certainly want fully automatic propagation, so the the information is not lost when using packages that don't use the context-API to spawn local/remote tasks.

tkf commented 4 years ago

Thanks for having a look into the API docs.

There may be name clashes

Perhaps the documentation should be fixed to emphasize this but there will be no name clashes, by design. (Unless you can invoke a collision of UUIDs.)

However, for the original scope of this issue - propagation of available workers

The set of available workers is a very dynamic information and I don't think propagating it via "static" mechanism like context variable is a good idea. It would mean that a function doesn't get any update after it is called via @spawn or remotecall. I think a better approach is to create a worker pool abstraction [*1] and then propagate the currently active worker pool using the context variable.

This is why @c42f and I are discussing the effect system here. Task scheduler is a special case of the effect handler (https://github.com/JuliaLang/julia/issues/33248#issuecomment-572877801) and we need dynamic scoping to implement this (or rather a small subset of effect handler that is still enough for task scheduler). Context variable just provides dynamic scoping and some effect handlers can be implemented on top of it.

[*1] I guess it would need to implement something like the work-stealing approach on top of Threads and Distributed.

oschulz commented 4 years ago

There's actually precedent for task inheriting information from each other: Tasks do inherit logstate from their parent. I think it would make a lot of sense if tasks also had resources (should probably be a dict) that they inherit, and that the user can add content too.

oschulz commented 4 years ago

See https://github.com/JuliaLang/julia/blob/9a712ae0cedc7b5a8684ddf50988a0f23acecd9f/src/task.c#L579

tkf commented 4 years ago

Yes, that's how it's implemented in #35833.

oschulz commented 4 years ago

Oh, sure - I was mostly lending moral support - should have posted it on #35833.

tkf commented 4 years ago

Ah, sorry, I misread your comment. Yes, it'd be nice to expand what is done for loggers to something more general.

(Actually, your comment makes me realize that it would have been much easier to write the proof-of-concept using a custom logger than using generated function...)

c42f commented 4 years ago

Yeah, I'm responsible for adding that logstate thing :-) It's currently an odd special case, but really shouldn't be.

oschulz commented 4 years ago

Yes, logging is still a bit inconsistent, in some ways ... :-) But I very much like that by-task logstate, and the fact that it's inherited. Do you think you could propagate that to remote calls as well?

Maybe this whole thing could be generalized, actually - in a way, a logstate is a kind of resource that is propagated, right?

oschulz commented 4 years ago

What's important is that there must be a way to override the automatic inheritance during spawns and remote calls, so that a task can choose to propagate different values for some resources in certain cases - a subset of workers that the child task is allowed to use, less detailed logging at lower levels of the computation, etc. I think this kind of general approach could be very powerful, if available consistently.

c42f commented 4 years ago

there must be a way to override the automatic inheritance during spawns and remote calls, so that a task can choose to propagate different values for some resources in certain cases

I don't think this is directly related to @spawn, but an orthogonal feature which is part of the dynamically scoped context system (with_context in the PR). You may be using a third party algorithm which happens to use @spawn as an implementation detail, but that third party package can't know about the larger application's logging requirements.

oschulz commented 4 years ago

But wouldn't it be important that for code that is not aware of this, context is still propagated automatically, both locally and remotely?

tkf commented 4 years ago

context is still propagated automatically

That's how it works in #35833, at least locally. It may be possible to get semi-automated context propagation to the remote workers but we need to get multi-threading part right to get there. I've already designed the internal so that it is Distributed-friendly, even for multi-node scenario. So, it is already possible to manually bridge contexts across remote calls with automated propagation across threads.

However, as I've explained in https://github.com/JuliaLang/julia/issues/35757#issuecomment-626967900, "a set of available workers" is a very dynamic concept (each @spawn/remotecall changes the set). Also, the context does not propagate backwards from the child tasks to the parent task (this is a common behavior AFAICT for this kind of system). See the documentation and especially the Concurrent access section. So, it cannot be used as a communication mechanism for dynamic global resource. I think you need to handle this with a very different mechanism. Maybe a global state with a lock or some kind of coordinator task listening to channels. (I'm just mentioning this as, IIUC, that's your primary goal.)

tkf commented 4 years ago

it cannot be used as a communication mechanism for dynamic global resource

I said with_context is like dynamic let in the documentation. But I just noticed that that's actually not true because of this.

oschulz commented 4 years ago

However, as I've explained in #35757 (comment), "a set of available workers" is a very dynamic concept (each @spawn/remotecall changes the set)

In some situations for sure, where you have some central scheduling. But in other situations, one may want to split computations on a longer term to run complete algorithms (that do their own distribution too) on subsets of the available worker, with each of them handled by a "master" instance or so.

oschulz commented 4 years ago

Basically, it should be possible to partition the worker, with partitions that live longer than a single call. Each partition may cache a specific subset of input data, for example - still, partitions may not be fixed for the entire Julia session.

tkf commented 4 years ago

Ah, so you want to propagate worker pool not the set of idle workers. That's possible with #35833 but, unless schedule (hence @spawn) is aware of such worker pool, it's incomplete.

I think it's possible to allow dynamically scoped scheduler by something like

abstract type AbstractScheduler end
struct DefaultScheduler <: AbstractScheduler end

@contextvar SCHEDULER::AbstractScheduler = DefaultScheduler()

schedule(task) = schedule(SCHEDULER[], task)

schedule(::DefaultScheduler, task) = ...  # current implementation

so that you can do with_context(Base.SCHEDULER => CustomScheduler()) do ... end to enable your scheduler. You can even do remotecall in schedule(::CustomScheduler, task) if local pool is used up. That's the best point to propagate local context to remote.

I'm not sure customizable scheduler can be in Base since it adds one more dynamic dispatch per @spawn. But #35833 lets you build this as an external package.

oschulz commented 4 years ago

Oh, yes - sorry, I should have been more clear: My intent isn't distributed scheduling here (though we'll certainly want that too, of course), but distributed allocation/partitioning of resources like workers, data chunk (references), etc.

c42f commented 4 years ago

I had a nice chat to @JeffBezanson about this (summary in https://github.com/JuliaLang/julia/pull/35833#issuecomment-655872620). I think the most important part we discussed was efficiency. Whatever API we expose here, it should permit really efficient code generation for dynamically scoped variables in the future.

The following are my thoughts on how this could work:

The key to good code generation is to make sure the optimizer can reason locally about the value of dynamically scoped variables within a function and its inlined children. I think this implies that setting of context vars should be strictly nested with the call stack. (Roughly speaking, that with_context from #35833 is the only way to to set these variables.)

With strict nesting, any child function cannot affect the value of a context var in the parent. This gives the compiler a license to do some useful optimizations:

It can hoist the lookup machinery of a given context var to the outer scope of any function (or from an inlined function to the parent). This allows the use of context vars inside loops and inline-worthy functions without continually paying the lookup cost.
Context vars can be propagated without any lookup if the hoisting hits the construct which sets that variable (the construct being with_context in #35833).
In principle, the above two optimizations can supply precise type information at inlined use sites, leading to many further optimizations. If we want to attempt an inferrable effects system on top of context variables I feel like this is a required optimization. (Caveat: it's not clear to me how to model this optimization in the existing optimizer in a way which lets other passes see the type information. Context vars could be modeled as implicit function arguments, but I don't know whether this can be kept under control in terms of compile time in type inference, and the interaction with inline-worthy computation worries me a bit. Having said that, I really don't know the optimizer code at all, so a lot of this is mere speculation.)

It seems to me that little of the above can be done in the presence of non-strict nesting (except, perhaps for functions which call no other functions after children are inlined).

tkf commented 4 years ago

Thanks for the summary (here and in #35833)!

My immediate question is that "is it also reasonable to allow slow but easy-to-write code?" Why not let the performance of the function gradually degrade when it uses non-compiler friendly constructs? For example, I think building a concurrent-safe environment variable interface (where people can just do ENV[key] = value) would require something like contextvar[] = value interface. You probably don't care about the performance when spawning a process. So, using convenient syntax might be more valuable than optimizing the function. If you care about the performance, re-structuring code to use with_context wouldn't be so bad (it's kind of like using @inbounds). But, if you are writing (say) an HTTP service, I think it'd likely be too cumbersome as an interface.

It seems to me that little of the above can be done in the presence of non-strict nesting (except, perhaps for leaf functions after inlining).

By non-strict nesting you mean the "in-place" interface like contextvar[] = value (set!)? If everything in with_context is inlined, I suppose you can eliminate the loads and stores (maybe that's what you mean by "except, ..."')?

Also, I wonder if it's possible to accumulate the "patch" of the changes to the context variables and apply it when exiting the function (including exception)? Since the values of the context variables are not shared across tasks, I guess this is a valid optimization (in the sense that the difference is not observable)?

c42f commented 4 years ago

My immediate question is that "is it also reasonable to allow slow but easy-to-write code?" Why not let the performance of the function gradually degrade when it uses non-compiler friendly constructs?

The difficulty is that the compiler must be able to reason about code which is carefully written to be fast, and allowing slow code elsewhere might ruin some global properties which this reasoning might rely on. Consider:

function foo(x)
    with_context(a=>x) do
        bar()
        a[]  # What is the value of `a` here? Can the compiler prove it's equal to x?
    end
end

If bar() (or any function it calls) is allowed to set! a in a way that's observable in foo(), this makes optimization much more difficult. It's perhaps possible in the case that bar is fully inlined, but this would not be true for many common use cases. Logging is a particular example — by design that calls into a potentially deep tree of non-inlinable functions.

c42f commented 4 years ago

By non-strict nesting you mean the "in-place" interface like contextvar[] = value (set!)? If everything in with_context is inlined, I suppose you can eliminate the loads and stores (maybe that's what you mean by "except, ..."')?

Right, yes exactly this is what I meant. Essentially the nesting is about keeping side effects under control so that the compiler can apply local reasoning based just on the code within the function. Of course if everything is inlined the compiler can see all the code, but I think there's many practical cases where this won't be true.

BTW it's not just loads and stores, but any internal machinery required to look up the context var storage, eg hashing and/or traversal in the Dict/HAMT/whatever storage backend. (Unfortunately, stores can't be eliminated unless everything is inlined or the compiler can otherwise prove that the children don't access a given context var.)

tkf commented 4 years ago

If bar() (or any function it calls) is allowed to set! a in a way that's observable in foo(), this makes optimization much more difficult.

Oooh, I see. Thanks for the explanation!

If we want to attempt an inferrable effects system on top of context variables I feel like this is a required optimization.

BTW, can we implement a full effect handler on top of context variables? I wonder if fancier things like choose effect is possible? It'd imagine what we can do with the context variables would be limited to some kind of function dispatch system. Instead, if implementing effect system is a reasonable goal, maybe implementing the effect system first and then implement the context variable as the "state effect" handler? I have no idea if that'd be more compiler friendly, though.

(Incidentally, there is a discussion in Zulip on the performance and implementation strategies of effect handler. It's about using delimited continuation as the building block of effect handler. https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/Performance.20and.20implementation.20strategies.20of.20algebraic.20effec)

oschulz commented 4 years ago

@c42f how would this play with propagating context to tasks and remote calls? My original motivation was for propagating information about available resources (workers, etc.) in a hierarchical computation. If context is propagated via stack (that should indeed be very fast), would we extract context from the stack and make it part of a closure or something like that?

c42f commented 4 years ago

BTW, can we implement a full effect handler on top of context variables?

Maybe not? I've had an uneasy feeling that there's some fundamental limitations of doing it on top of context vars. But there's a gaping hole in my understanding of the CPS transform and why people use it for this stuff, so I'm not sure what the limitations are :laughing: I need to read more here (thanks for the links!) but I'd also give up some generality for efficiency...

how would this play with propagating context to tasks and remote calls

My thought is that it conceptually "works the same everywhere". For efficiently spawning local tasks it should be effective to use a heap allocated persistent data structure (@tkf has mentioned the HAMT and I think he had a persistent variant in mind?). Then I think you can just copy a tiny fixed size handle to the current version of that data structure during task spawning. I'm not quite sure this all makes sense but it's how I'd want it to work :-)

For remote tasks we'd need to copy the context vars and serialize them, sending them to the remote side as part of invoking the remote function. With strict nesting we might get some useful optimizations here in limiting the amount of state we serialize and send?

JuliaLang / julia

Propagating context information to child-tasks and remote calls? #35757