Open design idea: implicit scoped function parameters

chandlerc commented 2 years ago

Disclaimer

This issue is part of a series that are just recording language design ideas that have come up for Carbon. It isn't necessarily a good idea, or one that Carbon should definitely adopt. However, it is an interesting area that has come up several times and seems to at least be promising. But it still might not work out!!

Before picking up a language design idea like this and fully developing it, we encourage you to find some folks who are very active in Carbon's language design (as well as potentially one of the leads) and discuss the area with them to get a feel for what would make sense, challenges they anticipate, etc.

Implicit scoped function parameters

There are a number of interesting cases where functions would benefit from accepting parameters that do not need to be written directly in the call's argument list and can instead be provided implicitly by some surrounding context. The canonical example of this for me is the Context Object pattern, or, as the ACCU describes it, the Encapsulate Context pattern. These can be found in many places; one that I have personally worked with extensively is LLVM's `LLVMContext`.

I would like to suggest some scope-based implicit parameter system to make this pattern significantly more ergonomic.

As very provisional syntax, you could imagine this working similarly to the implicit object parameter of methods:

fn MyFunction[context: MyContext*](x: i32, y: i32) {
  var f: auto = context->GetFlag(x);
  // ...
}

If this uses some scope-based system, these would be trivially propagated:

fn MyOtherFunction[context: MyContext*](...) {
  // ...

  // No need to pass `context`, it's already in scope.
  MyFunction(1, 2);

  // ...
}
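For comparison, here is a rough C++ sketch of the explicit Context Object pattern that this idea would make more ergonomic. C++ has no implicit scoped parameters, so the context has to be an ordinary parameter that every intermediate function names and forwards. `MyContext`, `GetFlag`, `SetFlag`, and the arithmetic in `MyFunction` are hypothetical placeholders.

```cpp
#include <cassert>
#include <map>

// Hypothetical context object bundling state shared by many functions,
// in the spirit of the Context Object pattern (e.g. LLVMContext).
class MyContext {
 public:
  void SetFlag(int key, bool value) { flags_[key] = value; }
  bool GetFlag(int key) const {
    auto it = flags_.find(key);
    return it != flags_.end() && it->second;
  }

 private:
  std::map<int, bool> flags_;
};

// Today the context must be an ordinary, explicit parameter...
int MyFunction(const MyContext& context, int x, int y) {
  bool f = context.GetFlag(x);
  return f ? x + y : x - y;
}

// ...and every caller has to thread it through by hand, even when it
// only needs the context in order to pass it along.
int MyOtherFunction(const MyContext& context) {
  return MyFunction(context, 1, 2);
}
```

The proposal would remove the `context` argument from the call in `MyOtherFunction` while keeping it visible in both signatures.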

Beyond context objects

If we have a system like this, it could be used for many things beyond just context objects. Below are just ideas that I suspect should be explored to make sure any changes or nuances to the design are reflected. They aren't necessarily deeply thought out or refined (yet).

Dependency injection

When allowing injection of dependencies, implicit scoped parameters may provide useful ergonomic affordances. This is especially true when they are combined with default arguments.
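As a rough C++ analogy (C++ has no implicit scoped parameters), a dependency can be an ordinary parameter whose default argument supplies the production instance, so most callers never mention it while tests inject a fake. `Clock`, `SystemClock`, `DefaultClock`, `FakeClock`, and `IsExpired` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical dependency interface.
struct Clock {
  virtual ~Clock() = default;
  virtual long long NowSeconds() const = 0;
};

// Production implementation backed by the system clock.
struct SystemClock : Clock {
  long long NowSeconds() const override {
    using namespace std::chrono;
    return duration_cast<seconds>(system_clock::now().time_since_epoch())
        .count();
  }
};

// Shared production instance used as the default.
SystemClock& DefaultClock() {
  static SystemClock instance;
  return instance;
}

// The dependency is a real parameter, but the default argument means
// ordinary callers can omit it.
bool IsExpired(long long deadline, const Clock& clock = DefaultClock()) {
  return clock.NowSeconds() > deadline;
}

// A fake that tests can inject explicitly.
struct FakeClock : Clock {
  long long now = 0;
  long long NowSeconds() const override { return now; }
};
```

With implicit scoped parameters, a test could instead put a `FakeClock` in scope once rather than passing it at every call.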

Evolution of global variables

Carbon doesn't yet have a full design for global variables, and they are likely to be reasonably controversial. We will face a tough choice here, as they add significant complexity to the language, especially when factoring in dynamic initialization. However, C++ has them, and codebases are expected to use them heavily.

Whatever design Carbon has around global variables, one thing that would be useful is the ability to evolve away from them where desirable. Scoped parameters may provide a powerful tool here, because an imported global variable is in scope for all calls in a file. So code that currently uses a global variable could potentially follow this evolution path:

1. Switch to an implicit scoped parameter with a default argument of the global.
2. Update callers to import the global into their scopes.
3. Remove the default argument and the direct use of the global.

This can be iterated as desired. It has the useful property of minimizing the scope of any atomic change required and allowing incremental evolution of the code.

Using implicit scoped parameters for this also dovetails with dependency injection above when desired for globals.
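The first step of this path can be approximated in C++ with an explicit parameter defaulted to the global. This is only an analogy: C++ has no way for a caller to "import the global into its scope", so the sketch stands in for step 2 by having migrated callers pass the value explicitly. `Registry`, `g_registry`, `Register`, and both callers are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry type and the legacy global instance of it.
struct Registry {
  std::vector<std::string> names;
};
Registry g_registry;

// Step 1: the function takes the registry as a parameter, defaulted to
// the global, so existing call sites keep compiling unchanged.
void Register(const std::string& name, Registry& registry = g_registry) {
  registry.names.push_back(name);
}

// Not-yet-migrated caller: still relies on the default (the global).
void LegacyCaller() { Register("legacy"); }

// Step 2 (approximated): a migrated caller supplies the registry itself.
// Under the proposal it would come from the caller's scope instead.
void MigratedCaller(Registry& registry) { Register("migrated", registry); }

// Step 3 would delete `= g_registry` once no caller depends on it.
```

Each caller can migrate independently, which mirrors the incremental property described above.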

Making resource access explicit

Another interesting use case is making resource access explicit and part of function signatures in an ergonomic fashion. As an example, it would be nice for functions to be able to declare that they don't heap allocate. One way of doing this might be to model the heap as a global variable. Code could then avoid importing that global, and instead accept a heap implicit parameter only in functions that actually need to allocate. This would make it clear that the other functions don't heap allocate. We could go further and allow functions to opt in to having no globals available, and even surface this as part of their signature, so that callers can rely on the implicit scoped parameters to precisely model resource access or other properties.
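A loose C++17 analogy for modeling the heap as an explicit resource uses `std::pmr::memory_resource`: a function that allocates takes the resource as a parameter, and callers choose what backs it (here a stack buffer via `std::pmr::monotonic_buffer_resource`). Unlike the idea above, nothing in C++ stops a function without that parameter from calling `new`, so this is a convention rather than a checked property. `Squares` and `SumOf` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// A function that needs to allocate says so by taking a memory resource.
std::pmr::vector<int> Squares(int n, std::pmr::memory_resource& heap) {
  std::pmr::vector<int> out(&heap);  // all growth goes through `heap`
  for (int i = 0; i < n; ++i) out.push_back(i * i);
  return out;
}

// A function without such a parameter signals, by convention only
// (C++ cannot enforce it), that it does not allocate.
int SumOf(const std::pmr::vector<int>& values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}
```

With implicit scoped parameters, `heap` could be supplied from the caller's scope while still appearing in `Squares`'s signature.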

Open questions

Maybe it's obvious, but this is far from an exhaustive list. =] But I'm trying to capture some likely open questions that will need to be addressed here:

- How are in-scope arguments identified?
  - By type?
    - Most natural way to fit it into the "signature" model of the function's API.
    - What about when two different types are needed? Maybe adapters are powerful enough.
    - Somewhat surprising that for these types (and no other types) it is possible to find the innermost declared value of that type without looking at its name. But maybe this is OK?
    - How do we handle the case where we want to pass these by pointer? Maybe that's just OK and part of the type?
  - By name?
    - Most natural way to associate a "scope" with the argument in some ways.
    - Interacts (very) poorly with disallowed shadowing, so much so that we might have to change the shadowing rules.
    - Makes parameter names a very significant part of the signature, which is surprising and not true generally.
  - By something else?
    - Avoids the problems with both types and names.
    - Requires inventing a third way to build the association.
- Should all types be available to work this way?
  - If not, how do we identify which types?
  - If so, does this create too many ways for a function to silently start depending on its caller's scope?
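To make identification by type concrete, here is a small C++17 sketch in which values are found purely by their type, much like `std::get<T>` on a `std::tuple`. It also shows one possible reading of the adapter question: two values that would otherwise share a representation (`std::string`) are wrapped in distinct types so a type-keyed lookup can tell them apart. `InputPath`, `OutputPath`, `Scope`, and `Describe` are hypothetical, and `Scope` merely stands in for "what is in scope".

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical distinct "adapter" types wrapping the same underlying
// representation, so a lookup keyed on type can tell them apart.
struct InputPath { std::string value; };
struct OutputPath { std::string value; };

// Stand-in for "what is in scope": values are found by type alone.
template <typename... Ts>
struct Scope {
  std::tuple<Ts...> values;

  // Ill-formed (a compile error) if T is absent or appears more than once.
  template <typename T>
  const T& Find() const { return std::get<T>(values); }
};

// Finds both values without ever naming them.
template <typename S>
std::string Describe(const S& scope) {
  return scope.template Find<InputPath>().value + " -> " +
         scope.template Find<OutputPath>().value;
}
```

Had both values been plain `std::string`, `Find<std::string>()` would fail to compile, which is the ambiguity the adapter types avoid.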

h3har commented 2 years ago

Perhaps this idea of implicit context parameters could be useful for functions accepting source location information? After all, the current source location is a form of context. Another option is doing that via default parameters. C++20 does this like:

void log(const std::string_view message, 
         const std::source_location location = std::source_location::current());

where std::source_location::current() has a magical compiler implementation. I'd expect some sort of "implicitly accepts source location information" feature is desirable for Carbon too.

And would these context parameters be "implicit only" though with no way to pass them explicitly? The default parameter approach for std::source_location has the benefit that you can pass it explicitly if you need to (although I can't think of a scenario where you'd want to in this specific example).

chandlerc commented 2 years ago

> Perhaps this idea of implicit context parameters could be useful for functions accepting source location information? After all, the current source location is a form of context. Another option is doing that via default parameters. C++20 does this like:
> void log(const std::string_view message,
>          const std::source_location location = std::source_location::current());
> where std::source_location::current() has a magical compiler implementation. I'd expect some sort of "implicitly accepts source location information" feature is desirable for Carbon too.

Definitely desirable, but I would personally expect it to be orthogonal.

> And would these context parameters be "implicit only" though with no way to pass them explicitly? The default parameter approach for std::source_location has the benefit that you can pass it explicitly if you need to (although I can't think of a scenario where you'd want to in this specific example).

This design idea is specifically for parameters that are only scope-based and never provided with the call syntax. I would hope that default arguments to fill them in when absent would be orthogonal.

FWIW, explicitly passed source locations can be useful in generated code and a few other places.

josh11b commented 2 years ago

+1

I personally think this feature would be very useful for things like logging and memory allocation. A generic implicit scoped function parameter or an implicit scoped function parameter with a generic type would potentially be a way to add this customization at compile-time without runtime overhead.
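In today's C++, the closest analogue of an implicit scoped parameter with a generic type is an explicit template parameter: the logger's type is fixed at compile time, so there is no virtual dispatch, and a do-nothing logger's empty body can be inlined. A sketch, with `NullLogger`, `RecordingLogger`, and `Process` as hypothetical names; under the idea discussed here, `log` would be implicit rather than passed by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A logger that does nothing; its empty body can be inlined, and an
// optimizer may also drop the unused argument (not guaranteed).
struct NullLogger {
  void Log(const std::string&) {}
};

// A logger that records messages, e.g. for tests.
struct RecordingLogger {
  std::vector<std::string> messages;
  void Log(const std::string& m) { messages.push_back(m); }
};

// The logger's type is a template parameter, so the choice is resolved
// at compile time with no virtual dispatch.
template <typename Logger>
int Process(Logger& log, int x) {
  log.Log("processing " + std::to_string(x));
  return x * 2;
}
```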

JamesJCode commented 2 years ago

Worth noting that Kotlin has a similar concept, which when added with other language features makes for some very expressive coding opportunities: https://kotlinlang.org/docs/lambdas.html#function-literals-with-receiver

It also implements a default binding to this receiving/context object for otherwise-unqualified method calls (i.e. the context object becomes an implicit `this` in name resolution). This reduces syntax noise, but perhaps obscures 'what is being called on what'.

Pixep commented 2 years ago

Definitely useful, and lacking in C++. Adding a different example from React: React uses a context provider in a hierarchy tree to expose a context implicitly, and useContext (for functional components) to explicitly "grab" one of the contexts (https://reactjs.org/docs/context.html). So a context is explicitly provided for a specific scope and stays invisible in components unless they need to use it (with context = useContext(<type>)), which limits the number of arguments/attributes that have to be explicitly carried around. Food for thought more than anything, as this doesn't exactly map to Carbon's current design.

OlaFosheimGrostad commented 2 years ago

I think this could be useful for interpolated strings, it could potentially be used for:

- localization context
- allocators
- stringwriters
- buffers

But I don't know what the overall usability would be if people start to abuse it…

c-cube commented 2 years ago

Relevant state of the art (with allocators): https://odin-lang.org/docs/overview/#implicit-context-system

BoyeGuillaume commented 2 years ago

It would be a nice improvement to C++. I am unsure whether it can be used for the source location example (as the source location changes within the local scope of the caller).

I still believe that such a feature could be really interesting to add. It has been implemented in high-level functional programming languages such as Scala, which offers a similar feature (called implicits).

timjroberts commented 2 years ago

I really like this. As you mention, this ambient context pattern can be useful for lots of use-cases and having tighter language support for it sounds like a great idea to me.

geoffromer commented 2 years ago

Another open question that I think we should consider: are these parameters visible in the function signature? In other words, should we even think of them as parameters at all?

I know of a few reasons to think that the right answer might be "no":

It seems ergonomically unacceptable to say that every function that heap-allocates (directly or through any of its callees) must declare that fact in its signature. At most, we could maybe say that every function implicitly declares the allocator as an implicit scoped function parameter, unless it opts out. However, that opt-out mechanism sounds liable to be awkward and confusing (how do you declare that a function doesn't take a certain parameter?), and it's not clear how we'd decide which parameters get to be doubly implicit in this way. So it seems like resources like the heap would still have to be globals, or something close to it.
A lot of different kinds of resources want to be implicitly propagated in this way -- just off the top of my head, there's the heap, logging and process I/O, the filesystem, cancellation tokens, executors, the current task, and probably a lot more. If each of those has to be declared as an explicit parameter of every function that uses them (directly or transitively), writing function signatures is going to be miserable. To some extent that can be mitigated by grouping related resources into larger aggregates, but it's going to be very difficult to choose how to aggregate them in a way that satisfies everyone, and it comes at a cost: if I want to replace one component of an aggregate but keep the others, I need to make a copy of the whole aggregate and then mutate the member I want to change. Even if that's acceptable from a performance standpoint, it's liable to be ergonomically awkward, and it will probably be very difficult to prevent people from accidentally (or intentionally) mutating the aggregate in-place, which would persist past the end of the scope containing the mutation.
If these things are visible in the function signature, interfaces and implementations have to agree on which ones a given method can use, as do base and derived classes. If I want my method body to do some debug logging, I have to plumb the logging resource into the interface I'm implementing, or else make sure the logging resource is a global variable. In a lot of cases, the former isn't going to be an option (I can't update Carbon.EqWith.Eq to take my weird logging framework as a parameter, even implicitly), so this approach will probably not be very effective as a way of getting rid of global variables.
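The aggregate trade-off in the second point can be sketched in C++: replacing one component means copying the whole aggregate, while the tempting in-place mutation outlives the scope that made it. `Resources`, its string-valued members, and the functions are hypothetical stand-ins for real resource handles.

```cpp
#include <cassert>
#include <string>

// Hypothetical aggregate of ambient resources.
struct Resources {
  std::string log_target = "stderr";
  std::string allocator = "default";
};

// A callee that consumes the aggregate.
std::string Callee(const Resources& res) {
  return res.log_target + "/" + res.allocator;
}

// Replacing one component means copying the whole aggregate.
std::string WithFileLogging(const Resources& res) {
  Resources copy = res;
  copy.log_target = "file";
  return Callee(copy);
}

// The tempting shortcut mutates in place, and the change outlives this
// call: the caller's aggregate is now modified too.
std::string WithFileLoggingInPlace(Resources& res) {
  res.log_target = "file";
  return Callee(res);
}
```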

The alternative I have in mind is to provide some mechanism for accessing the "ambient instance" of a given type, and a corresponding mechanism for setting the ambient instance of a type (I say "type" for simplicity, but if we want to key by name I think there are ways of doing that too). Crucially, these operations are scoped, not global: setting the ambient instance has no effect on other threads, or on any code below you on the stack. I think that might provide many of the benefits of global variables (and the Singleton pattern), and many of the benefits of implicit scoped function parameters, while avoiding many of the drawbacks.

timjroberts commented 2 years ago

While you may have many 'things' in scope, a function would only have to declare an explicit parameter for the things that it requires access to. If a function is not interested in responding to cancellation for example, then it wouldn't need to define an explicit parameter for it, but the scope for cancellation could still be present.

This ambient instance approach is common in .NET. Usually a TScope provides access to a TContext that represents the instance through a current property:

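using (new TScope()) {
  // ...
  var context = TContext.current;  // the instance provided by the enclosing TScope
  // ...
  foo();  // foo sees the same ambient instance
}

void foo() {
  var context = TContext.current;  // null when no TScope is active
}

foo();  // No TScope present for this invocation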

Importantly though, the foo function implementation has to be defensive, since it may be invoked outside of any TScope and as such the TContext.current property may be null. Perhaps Carbon could support both. The ambient instance accessor (like above) could be used where your function implementation can be defensive (and you're essentially open to being invoked without that scope), but if you declare an explicit parameter, then the compiler could ensure that the scope is present.

geoffromer commented 2 years ago

> While you may have many 'things' in scope, a function would only have to declare an explicit parameter for the things that it requires access to. If a function is not interested in responding to cancellation for example, then it wouldn't need to define an explicit parameter for it, but the scope for cancellation could still be present.

Sure, but my suspicion is that typical functions will want to access (or at least reserve the right to access) several ambient resources. The heap and the debug log are clearer examples here than cancellation tokens, but even that will probably be pretty pervasive in some contexts.

timjroberts commented 2 years ago

> Sure, but my suspicion is that typical functions will want to access (or at least reserve the right to access) several ambient resources [...]

So this proliferation of having to access many resources would be present because we might consider using this ambient context as a way of managing things like globals, for example? Forgive me if I am going over old ground.

While this issue appears focused on the function parameters themselves, I wonder if any thought has been given to how one might put something in scope? Perhaps that would help inform some choices on the function (consuming) side? I'll have a look for related issues and on Discord. I think there are a few linked items at the top of this issue too.

chandlerc commented 1 year ago

> Another open question that I think we should consider: are these parameters visible in the function signature? In other words, should we even think of them as parameters at all?
>
> I know of a few reasons to think that the right answer might be "no":

[snip, but largely agree]

> The alternative I have in mind is to provide some mechanism for accessing the "ambient instance" of a given type, and a corresponding mechanism for setting the ambient instance of a type

Something like this was maybe an implicit (sorry if so) assumption of the whole design for me, FWIW: specifically these "ambient instances" in my mind are global variables, and the scoping rules for them already (I think) give the desired fallback structure.

So my hope was that to access the heap (or other resources) via an implicit parameter, it would have to be in the signature. But a function could always just import and use a global variable instead, keeping this out of the signature.

And globals, due to their scope, would also trivially satisfy an implicit argument when calling a function that takes an implicit parameter.

> Crucially, these operations are scoped, not global: setting the ambient instance has no effect on other threads, or on any code below you on the stack. I think that might provide many of the benefits of global variables (and the Singleton pattern), and many of the benefits of implicit scoped function parameters, while avoiding many of the drawbacks.

Without these being in the signature, we need some way for the caller and callee to agree on where they can be discovered. I think the options here are a global or a thread-local. I'm not sure I see any other interesting implementation strategies? Specifically, I'm not sure how to improve on what can already be done with a thread-local to emulate what you describe in the face of separate compilation.

As a consequence, I had expected the "ambient instance" to in essence be either a global or perhaps a thread-local.

For many cases though, I feel like these are likely to be stateless and thus have no real benefit from being thread-local vs. global.

chandlerc commented 1 year ago

> Sure, but my suspicion is that typical functions will want to access (or at least reserve the right to access) several ambient resources [...]

> So this proliferation of having to access many resources would be present because we might consider using this ambient context as a way of managing things like globals, for example?

Yes, and for me at least, a mechanism to shift APIs that start off using globals to become parameterized in as incremental a fashion as possible.

> While this issue appears focused on the function parameters themselves, I wonder if any thought has been given to how one might put something in scope? Perhaps that would help inform some choices on the function (consuming) side? I'll have a look for related issues and on Discord. I think there are a few linked items at the top of this issue too.

See my reply above -- my thinking thus far was pretty centered around using the "global" scope (it's actually package scope in Carbon, but that is about naming, not lifetime, so somewhat irrelevant) to bootstrap here.

geoffromer commented 1 year ago

@chandlerc Yeah, I'm coming around to thinking of these as variables rather than keying off of the type. But what I have in mind is importantly different from global and thread-local variables, because neither of those natively gives you a way to set a new value in a way that's only visible within your scope, and not visible in outer scopes. So yes, it would probably be implemented in terms of a thread-local variable, but it wouldn't be equivalent to one. The sort of strategy I had in mind was to implement it as a thread-local pointer to the current ambient instance, with the generated code for the "set" operation effectively creating a local RAII object which will reset the pointer to its old value at the end of the scope.

The biggest problem I see is how this would work with things like lambda capture and coroutine suspend/resume. I haven't entirely thought through how we'd want those to work even in principle, but some of the plausible answers seem like they aren't feasible with that implementation strategy, and maybe with any implementation strategy that doesn't have help from the function signature.
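A minimal C++ sketch of the implementation strategy described above: a thread-local pointer to the current ambient instance, set through a local RAII object that restores the old value at the end of the scope. It also illustrates the lambda concern: a lambda created while one instance is ambient reads whatever is ambient when it eventually runs. `Logger`, `g_current_logger`, `ScopedLogger`, `CurrentLoggerName`, and `MakeDeferred` are hypothetical names, and this is only one possible design.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical ambient resource type.
struct Logger {
  std::string name;
};

// The current ambient instance for this thread (nullptr if none is set).
thread_local Logger* g_current_logger = nullptr;

// Setting the ambient instance creates a local RAII object that restores
// the previous value at the end of the scope, so the change is invisible
// to callers and to other threads.
class ScopedLogger {
 public:
  explicit ScopedLogger(Logger& logger) : previous_(g_current_logger) {
    g_current_logger = &logger;
  }
  ~ScopedLogger() { g_current_logger = previous_; }
  ScopedLogger(const ScopedLogger&) = delete;
  ScopedLogger& operator=(const ScopedLogger&) = delete;

 private:
  Logger* previous_;
};

// Accessor; callers must cope with "no ambient instance".
std::string CurrentLoggerName() {
  return g_current_logger ? g_current_logger->name : "<none>";
}

// The lambda is created while "inner" is ambient, but it reads the ambient
// state when it runs, after this scope has already restored the old value.
std::function<std::string()> MakeDeferred() {
  Logger inner{"inner"};
  ScopedLogger scope(inner);
  return [] { return CurrentLoggerName(); };
}
```

Capturing `g_current_logger` into the lambda at creation time would instead leave a dangling pointer once `inner` is destroyed, which is one reason the right semantics for closures and coroutines are unclear.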

philippeb8 commented 1 year ago

Adding implicit function parameters is essential for efficient memory management and stack traces, even for release builds. The same goes for class members and static-scope variable instances, which I have already tested.

But I'll have to scrutinize the license first and reopen the subject.

carbon-language / carbon-lang