chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 420 forks source link

Should we have inlined param procs? #18788

Open e-kayrakli opened 2 years ago

e-kayrakli commented 2 years ago

Case 1

Here's the particular use case that this came up:

// imagine this is an arkouda function that generates a message as string
proc div(x: int, y:int) {
  if y == 0 then
    return getErrorMsg("Divide by zero");
  else
    return (x/y):string; 
}

proc getErrorMsg(param msg: string) param {
  use Reflection;
  return getFilename()+":"+getLineNumber()+" "+msg;
}

This feels intuitive but doesn't work as expected today, because getErrorMsg will always return it's own file and line number. If we were to inline that function before resolving it, would that be helpful? getErrorMsg could than look like:

inline proc getErrorMsg(param msg:string) param { /* same body */ }

I think this feels nice and obvious.

Case 2

But somehow the below make me feel less confident about this:

inline proc getErrorMsg(param msg: string) param {
  use Reflection;
  return getFilename()+":"+getLineNumber()+" in function "+getRoutineName()+" "+msg;
}

For some reason, it feels more intuitive to me for getRoutineName to return getErrorMsg rather than its callsite. (And that would make this no different than where we are today)

Case 3

But doing a bit more introspection, I like the example below which relies on the same functionality:

module CheckFunctionality {
  inline proc doIHaveEverythingINeedForAnOptimization() param {
    use Reflection;
    return canResolve("foo") && canResolve("bar") && !_local;  // or more complicated
  }
}

module Main {
  use CheckFunctionality;

  proc doIt() {
    if doIHaveEverythingINeedForAnOptimization() {
      doTheFancyThing();
    }
    else {
      doTheSlowThing();
    }
  }
}

This is not much different than Case 2 in terms of its power. But this feels more intuitive for some reason that I can't explain.

In any case, inline param procs can be our way of doing preprocessor-like operations. And when I think about such things, I can never decide whether I like them, or I am scared of the power that they have. I use them in C, for sure, though :)


For implementation, I am not sure how this would work though. I think there are 3 things at play, and the order in which they happen:

I guess the earlier we inline, the more powerful the functionality is.

bradcray commented 2 years ago

At first blush, this seems attractive to me in that it (a) provides a need that I think we have and (b) seems like a reasonable composition of existing things. At this moment, I'm curious whether more time thinking about it will make me more or less enthusiastic about it.

I'm tagging @mhmerrill and @bmcdonald3 on this because I know that this is a pain-point in Arkouda that they've wrestled with (and in Mike's case, I think he was particularly annoyed by it during the initial implementation. And to be clear, it worried me a lot too in the "if I want to change this pattern, I need to touch 'n' pieces of code" sense, which seemed very problematic).

Your case 2 doesn't particularly bother me, FWIW.

I think the main thing that gives me pause at this point is uncertainty whether param is necessarily the right thing here, and what else one might or might not legally put into such a routine. I.e., normally I think of param as meaning "returns a param" which implies "is evaluated at compile-time", but these don't necessarily return anything. And what would the behavior be if code was inserted into them that was execution-time code? That's not a show-stopper, just a pause (and probably somewhat related to my general distaste for the keyword param :) ).

e-kayrakli commented 2 years ago

I.e., normally I think of param as meaning "returns a param" which implies "is evaluated at compile-time", but these don't necessarily return anything. And what would the behavior be if code was inserted into them that was execution-time code? That's not a show-stopper, just a pause (and probably somewhat related to my general distaste for the keyword param :) ).

I think they do return things, though. And further, I think we should enforce that. So, it wouldn't be like a copy/paste of code at the call site, but more like evaluating the function body and the return param at the call site.

bradcray commented 2 years ago

I think they do return things, though. And further, I think we should enforce that.

Oh you're right, I was reading too fast, and imagining my own versions of what this might look like. For example, would it be unreasonable to support:

inline proc genBradError(param msg: string) param {
  compilerError(getFilename() + ":" + getLineNumber() + " Brad's Error: " + msg);
}

I was going to argue that we could consider them to be returning param void in the case that they did not return anything.

So, it wouldn't be like a copy/paste of code at the call site, but more like evaluating the function body and the return param at the call site.

I was also thinking of it as "evaluate at the callsite", but the reason for a requirement to return isn't clear to me.

e-kayrakli commented 2 years ago

I was also thinking of it as "evaluate at the callsite", but the reason for a requirement to return isn't clear to me.

Yeah, I wasn't really considering compilerError/compilerWarning. You're right.

Just to be sure:

inline proc changeVariableAtCallSite() param: void{
  s = 10;
}

proc main() {
  var s = 5;
  changeVariableAtCallSite();
}

We are not considering allowing this, right? It feels too bizarre to me. But it is still a "feature" that are not too far off from the other examples in the OP. And if we think of the implementation of this feature as: inline->scopeResolve->resolve, that could imply that we would allow this.

bradcray commented 2 years ago

We are not considering allowing this, right?

I think that's right. I think we still require proper lexical scoping for the routine in its declared context, not the calling context, and that argument passing would be used to refer to things in the original scope if necessary.

What I'm less sure of is whether, if you passed s in by reference (say) and assigned it in this way, it would be modified by the routine, or whether the compiler would "optimize it away" since it is a param routine. That said, when I was musing on this earlier this morning, I realized that I don't think it's any different than the non-inlined param procedure case, so we can just do the same thing there (where I can never remember what we decided for cases like this... Doing a quick TIO check it looks like we do permit ourselves to optimize them away. I'm not sure whether we're committed to this, or whether it's just the state we happen to be in. It feels surprising to me, and in the inline param case, it seems like it would be more powerful / less surprising to not optimize such computations away (and probably in the non-inline case as well, for that matter...).

e-kayrakli commented 2 years ago

My knee-jerk reaction to that is both of them should be compilation errors (at the very least we should warn if we are optimizing them away). I am surprised that the code in TIO compiles. It is an execution-time operation done in a compile-time proc.

I disagree that it would be more powerful/meaningful for the inline param case. That portion of the proc that modifies a variable could very well be its own non-param helper proc. Lumping the param and non-param operations together feel confusing to me.

bradcray commented 2 years ago

at the very least we should warn if we are optimizing them away

I agree with that (especially since I can never remember until I write a test to check). I have a feeling that today it's more of a "just remove all param/type procedures" pass (or "replace calls to param/type calls with the evaluation of the call") rather than any kind of analysis and intentional removal of execution-time code within such routines. That's not to say we shouldn't do it, just that doing so would be work to determine whether such a procedure contained execution-time code.

That said, sometimes having execution-time code within a param/type procedure is useful, even if it isn't executed or doesn't need to be. For example:

proc widthOfSum(x, y) param : int {
  var z = x + y;
  return numBits(z.type);
}

so maybe the real condition should be "contains execution-time code that is side-effecting / observable, rather than just param/type routines that contain non-param/type code?

I disagree that it would be more powerful/meaningful for the inline param case. That portion of the proc that modifies a variable could very well be its own non-param helper proc.

I'm imagining a case where I want to encapsulate some chunk of code that I want to compute over and over again at execution time (so I put it into a procedure), but then want to refer to the original filename/line number position (so I need a param inline procedure for that to happen). Imagine a logging capability, say. It seems like it would be awkward / unfortunate to have to split this into two routines.

Lumping the param and non-param operations together feel confusing to me.

Tell me more: In what way?

e-kayrakli commented 2 years ago

I'm imagining a case where I want to encapsulate some chunk of code that I want to compute over and over again at execution time (so I put it into a procedure), but then want to refer to the original filename/line number position (so I need a param inline procedure for that to happen). Imagine a logging capability, say. It seems like it would be awkward / unfortunate to have to split this into two routines.

Is the following a viable solution to that to you?

inline proc genLogMsg() param {
  return "Op called at " + getLinenumber();
}

proc myExecTimeOp(ref x: int, param logMsg="") {
  x += 1;
}

proc main() {
  var a, b: int;
  myExecTimeOp(a, genLogMsg());
  myExecTimeOp(b, genLogMsg());
}

Maybe we can take a step further and rely on the interplay of inline param and default argument evaluation and just

proc myExecTimeOp(ref x: int, param logMsg=genLogMsg()) {
  x += 1;
}

proc main() {
  var a, b: int;
  myExecTimeOp(a);
  myExecTimeOp(b);
}

Tell me more: In what way?

I am trying to solidify the answer to that internally myself.

proc widthOfSum(x, y) param : int {
  var z = x + y;
  return numBits(z.type);
}

var z in a param proc feels not right to me. Even formals that are not compile-time feels wrong. Just thinking out loud, where I begin with for such a functionality would be:

proc widthOfSum(type t1, type t2) param : int {
  param x: t1;
  param y: t2;
  param z = x+y; 
  return numBits(z.type);
}

which is a pure compile-time function. Maybe I am still comfortable with

proc widthOfSum(x: ?t1, y: ?t2) param : int {
  param a: t1;
  param b: t2;
  param z = a+b; 
  return numBits(z.type);
}

and can view this as a convenient syntactic sugar as long as there is no statement in the proc body that reads x and y

I can see that this is more limiting, but enabling var z = x + y in a param proc feels like a slippery slope. Are we also going to allow var z = new unmanaged C();. Rhetorically, should that z be deleted? Can you do var z = myNonParamProc();?

I am guessing you are in favor of allowing all of these especially for hypothetical inline param functions?

Maybe I am clouded by the nuts-and-bolts of it, but seeing something that is clearly a dynamic operation (e.g., heap allocation) in a param function does not sit well with me at the moment.

bradcray commented 2 years ago

Is the following a viable solution to that to you?

Definitely viable, but seems inherently more involved and verbose, potentially unnecessarily so relative to just mixing compile-time and execution-time code in an inline param procedure.

Maybe we can take a step further and rely on the interplay of inline param and default argument evaluation and just

That one scares me a bit due to the reliance on whether argument defaults are evaluated by the caller or callee. We've changed this over the course of the project, and I don't feel very confident we'll never change it again (and it requires the user to understand the rule, which seems unfortunate... best would be for users not to even think about who's evaluating these). My intuition looking at the code would be that the place where the default argument is specified is where the line number and filename would be printed from.

I am trying to solidify the answer to that internally myself.

I think I can describe your discomfort based on your examples: It's the interpretation that putting param on a procedure means "is completely evaluated at compile-time and has no execution-time role" rather than "returns a param" (so is at least partially evaluated at compile-time to get the param value that's returned). Today, I think we work under the simplifying assumption that a param routine isn't side-effecting / has no execution-time role. But once we start talking about inlining them, relaxing that seems attractive to me (and the current behavior has often led to surprises... "why is my writeln() not called"... though the warning you propose could also help with that).

Even formals that are not compile-time feels wrong.

Oh, I love this feature (independent of whether we permit side-effects in param procedures), so we may have to arm wrestle on this one. :)

Specifically, while your workarounds are viable, I find them way more annoying to write in terms of verbosity relative to mine. And they also raise questions for me like "are the param equivalents of given routines always going to be equivalent to the execution-time versions?" E.g., what if rather than calling + I was calling a user routine foo(x,y) that had no param version and wanted to reason about what it returned? Even with +, my head starts getting into things about how we support different coercions for params based on the value they store than we do for variables when we can't know its value. E.g., param x: int; is 0 which can be passed to an int(8) argument, whereas param x: int = max(int); can't be because it's too big and var x: int; can't be because we don't know. So I'm concerned that the param/type rewrite of an arbitrary expression as you do here won't be possible in all cases, and may not give the same answer in others.

e-kayrakli commented 2 years ago

All this here makes sense to me.

Ruminating a bit more about this, here's a thought experiment. Imagine you want to modify a global param (which you can't do today). Even though I know that we can't do that, my reflex would be to write the following today:

param x = 5;

proc foo() param {
  x = 6;
}

foo();

compilerWarning(x:string);

Even though it feels intuitive to me, it doesn't make any sense if I stop and think for 2 seconds. param in foo's header is just a return intent, but the function doesn't even return anything.

So maybe, my mindset can be represented with the following hypothetical code:

param x = 5;

param proc foo() {
  x = 6;
}

foo();

compilerWarning(x:string);

This may be attractive for other reasons. But that's a separate idea than what we are discussing here. (Maybe inline param proc foo() {} is exactly what a preprocessor macro is?)


I think I am convinced and on the same page as you w.r.t. the main proposal here. I will need to spend some conscious effort to rewire my reflexes here. But I've overcome worse reflexes in the past :)

mppf commented 2 years ago

With the caveat that I have only read the OP, I wanted to mention that we do some things with tracking line numbers in inline functions differently already. I'm not sure if that makes sense in the long term. My own reaction to these things is that we want compilerWarning / compilerError / getLineNumber / etc to be able to identify the call site from a different module which I think I would rather have than special behavior for inline.

e-kayrakli commented 2 years ago

My own reaction to these things is that we want compilerWarning / compilerError / getLineNumber / etc to be able to identify the call site from a different module which I think I would rather have than special behavior for inline.

How can a user write a function like that themselves? Do you imagine some sort of an annotation support?

mppf commented 2 years ago

How can a user write a function like that themselves? Do you imagine some sort of an annotation support?

I'd expect that we can provide the building blocks that are necessary (such as, effectively, "get the call site location from not-this-module" and then getting line numbers etc there -- arguably compilerWarning / compilerError could be written with those idioms). I don't think that the user will be able to write it all themselves - we have to add something for it to work.

bradcray commented 2 years ago

but the function doesn't even return anything.

Well, we could interpret it as returning param : void (which is how I was rationalizing my "returns nothing" version of your original error routine that just calls compilerError()).

inline param proc

That occurred to me as well, though my negative reactions are: (a) it's yet another prefix keyword (others being extern, export, inline (obviously), and config for non-procs) which leads to questions like "what order should/can they be used in?" / "which can be used together?" / etc. (b) my standard complaint that param doesn't inherently suggest "at compile time" for me any more than using inline type proc would (since types also must generally be known at compile-time). Put another way, given that I'm already not super-excited about param as our way of saying "constant whose value is known at compile-time", I'm similarly unenthused about extending it to mean "a general declaration modifier that means something is evaluated at compile-time." (e.g., wouldn't that suggest that maybe we should also be declaring param const and param vars?)

I concede that interpreting a non-returning param routine as "returns param : void" is a bit more obscured / subtle / reliant on something that's a big PL-geeky, but it also has the benefit of being more orthogonal with other param routines and not requiring a new syntactic form. And, arguably, we support "returns param : void" today and have for awhile: https://tio.run/##HcsxDoAgDAXQvaf4cYKJXWYHj0EQhQRo07gZz14T3/5yTVK6mShnnMzOQ5KmgYeAEMC6IvOQ1otuqqxu2XEwJt@1zWvxkV6iP0azDw

bradcray commented 2 years ago

w.r.t. Michael's thread (oh how I wish GitHub issues had hierarchical threads): I agree that it would be nice to have the ability to refer to different depths of the callstack, or even to be able to walk and inspect the callstack, from code, but I'm not sure that it would make my desire for an equivalent to cpp's #define foo(x) ... that could refer to __LINE__ and __FILE__ evaporate. Specifically, Engin's expression of getErrorMessage() in the OP seems fairly clean and nice, with the main downside being the learning curve of why inline + param has this subtle behavior (also noted in my previous comment). Saying getFunctionName(frame=-1) (say) feels clunkier to me. And at times when we've had capabilities in the modules that have taken an optional depth argument as a means of referring to outer stack frames, they haven't held up so well in the face of composition, where the depth to an interesting stack frame differs from one callchain to the next.

Related to that, I worry about the tension between static and dynamic knowledge. I.e., __FILE__ and __LINE are fully static, as are getFilename(), getLineno() today, and the solution in the OP relies heavily on this static knowledge. But if we started having code that could refer to an arbitrary number of stack frames up from the current routine, different callchains could require different depths to reach a point of interest, and not be able to know those depths statically. So are we turning cases that only need easy static information into dynamic code unnecessarily?

Again, I'm not saying we shouldn't have such a feature, and I think it would be powerful and interesting, particularly for building tools and libraries supporting them. I'm just not sure it would be fully satisfying to me for cases where I just want the equivalent of a C macro function. But I also admit that I may not be imagining Michael's approach in the same way he is.

mppf commented 2 years ago

And at times when we've had capabilities in the modules that have taken an optional depth argument as a means of referring to outer stack frames, they haven't held up so well in the face of composition, where the depth to an interesting stack frame differs from one callchain to the next.

I agree - which is why I've been proposing that the "depth" idea is in terms of modules rather than functions.

But if we started having code that could refer to an arbitrary number of stack frames up from the current routine, different callchains could require different depths to reach a point of interest, and not be able to know those depths statically. So are we turning cases that only need easy static information into dynamic code unnecessarily?

AFAIK for the interesting use cases we only care about one of the call stacks that was used during resolution. (E.g. for compilerError, we want to issue the error at at least one of the calls). I could be wrong about that though. But anyway I wasn't imagining any dynamic information to implement these features (just as we don't have dynamic information for the clunky errorDepth argument for compilerError today). Certainly, if we only had resolution-call-stack logic, it would work just fine for a use case where you could apply an inlined param proc, wouldn't it?

e-kayrakli commented 2 years ago

inline param proc

That occurred to me as well, though my negative reactions are: (a) it's yet another prefix keyword (others being extern, export, inline (obviously), and config for non-procs) which leads to questions like "what order should/can they be used in?" / "which can be used together?" / etc.

Yes, I can see this point. But if we know we want the functionality, I can very comfortably accept this as a something that I need to learn.

(b) my standard complaint that param doesn't inherently suggest "at compile time" for me any more than using inline type proc would (since types also must generally be known at compile-time). Put another way, given that I'm already not super-excited about param as our way of saying "constant whose value is known at compile-time", I'm similarly unenthused about extending it to mean "a general declaration modifier that means something is evaluated at compile-time." (e.g., wouldn't that suggest that maybe we should also be declaring param const and param vars?)

First part of this statement doesn't bother me either. I don't have a big problem with param. If anything, I can use your argument to argue against the alternative value: inline value proc doesn't make much sense because proc is not a value.

I agree with the second part. And to me, it is probably the least ideal part of inline param proc syntax. Nonetheless, it is only inconsistent -- I don't think it can lead to any confusion that's hard to resolve.

bradcray commented 2 years ago

inline value proc doesn't make much sense because proc is not a value.

But I'm not advocating for inline value proc as the way of writing this... :D (i.e., if I successfully got us to change param to value, I'd put it back in the return intent position syntactically to say "returns a value known at compile-time" (where that value could be of the degenerate void type).