Open SamirDroubi opened 10 months ago
@yamaguchi1024 @skeqiqevian. Can you add any other propsals you had in mind?
Some thoughts:
I think there are a few semi-orthogonal sub-issues here that would be worth breaking out when thinking about the design. In particular, I am thinking of (a) what is the return value (just proc, proc + dataclass, proc + one cursor); (b) how should the "return cursors" functionality be exposed? (e.g. should this be an alternate version of a primitive or should the expected signature of a scheduling operation primitive change?); (c) slightly different is "how should this be implemented?" (e.g. purely in user-library vs. changing the primitives' implementation?)
Another option (for dimension (b)) is to support a flag (e.g. p, cs = sched_op(p, c, ..., return_cursors=True)
) that changes the returned value. If you think of this functionality as "advanced" then a little bit of extra typing might be appropriate.
I like that the dataclass idea can be combined with lvalue
pattern matching to unpack the different cursors "positionally" e.g. p, (alloc, load, store) = stage_mem(...)
or unpack the cursors by name as you suggested @SamirDroubi. In the event that a scheduling op naturally wants to return only one or zero cursors, what would we do? Still return a dataclass for uniformity of the interface? (That doesn't seem like a terrible idea...?)
The "return one cursor" variant is probably important for a combinator user library along the lines of the one we sketched in our PLDI submission. However, I don't think that option is either maximally simple (no cursor returned) or maximally useful/informative (the multiple cursor return). One could imagine e.g. a new combinator that allows one to wrap an op and specify which of the dataclass entries should be returned as "the" cursor. Bottom line is we can implement better "one cursor" convention libraries on top of the other options.
The second foo
code (post stage_mem
) has a bunch of arrows pointing to parts of it. This reminds me of the "pattern matching multiple cursor groups" idea we floated as a more self-documenting introspection mechanism. Suppose we added some such feature. We would have to return a set of cursors in that case; one for each group. Shouldn't that ultimately use the same convention that we decide on here? If so, maybe thinking about that feature could help us feel more confident about a decision here (e.g. exactly what something like the dataclass idea should look like)
p, _ = sched(...)
. On the other hand, having to specify return_cursros=True
is more verbose (and more characters to write). I guess ultimately it would depend on how frequently you would need to specify that argument. I would say the nice thing about having this option is someone who doesn't want to think about cursors, can completely operate as if they didn't exist.__iter__
. I am on the fence of the case where there is exactly one cursor because just returning the cursor wouldn't change the number of returned values, just the type of the second one. When there is no cursors to return, I think it would be nice to have uniformity at least in terms of number of returned arguments. Although it is a bit weird that every use of some op is of the form: p, _ = sched(...)
.Continuing the conversation where we're not converged:
p,_ = op(p, ...)
isn't unreasonable. I think the argument from the other side is that it's conceptually another spinning wheel. There was an old proposal which would simplify this to op(...)
at the cost of not having as much explicit control. Absent other considerations, we should try to avoid the situation where there are a bunch of extra parameters/return-values that always have to be handled because such designs are more complicated to learn and tend to produce more syntactic boilerplate. If the return_cursors
is an "advanced" feature, you can imagine it being as simple as adding op(..., rc=1)
(using "truthiness" instead of True
) or something similarly brief.dataclass
to autogenerate the __iter__
implementation.rc=1
style is that if it returns 0
cursors, then p,_ = op(..., rc=1)
makes sense even if it is unnecessary. I think the awkward case is p,(c,) = op(..., rc=1)
but maybe that's OK given that this is invoking the advanced feature?p = op(p, ...)
; Basic at location is p = op(p, c, ...)
; Combinator is p,c = op(p,c, ...)
; Full Navigation is p,(c1,...) = op(p,c, ...)
An additional thought (not to roll into this proposal but thinking further ahead) is that it would be great if there was a "standard" way to decorate some user-defined scheduling ops in order to encourage adhering to a convention and maybe generate some boilerplate functionality as a consequence. There could be different conventions (as suggested above). You could imagine such a decorator allowing us to install different kinds of cursor coercion functionality (already exists for primitives, but not user ops), maybe help with documentation, and maybe package rc=1
optional functionality etc. You could also imagine auto-coercion between some different operator conventions.
@sched_op
decorator so that it automatically adds the new default argument rc=False
.@returned_cursors("alloc", "load", "block", "store")
which then would serve as a docuemntation. If we go with the two decorators apporach, then the sched_op
decorator should likely enforce the returned_cursors
decorator.NewExprA
) where the return value is a non-API object. One option is to not export any thing that could return a non-API object for now and make the user-level argument signature be a string instead which is what you would do now anyways.Re auto-coercion, note that the Argument Processors do this. That is one concept of coercion. Another concept concerns totally standard liftings between different SchedOp signatures. eg an op with signature (p,c,…) -> p can be lifted to an op with signature (p,c,…) -> (p,c) implicitly (a coercion) whenever supplied as input to repeat
or a similar combinator.The argument processors are currently a bit tricky to disentangle from the compiler implementation into a user-level concept, which is why I’m suggesting turning that (and more general coercions, op decorators, etc) into one or more future proposals.— GilbertOn Dec 26, 2023, at 5:36 PM, Samir Droubi @.***> wrote:
Yeah, I agree with you that there should be a supported scheduling mode that allows that user to ignore anything cursors related. I think the consise arguemnt you described makes sense, it is not clear what it does from looking at it, but it is in every operation and will be common in code that users will just learn. Agreed. I agree it should be okay to expect the user to unpack the dataclass in that one off case. Agreed. I am not sure exactly what you mean by auto-coersion, it would be great if you could elaborate more on this. I was generally thinking that we would extend the @sched_op decorator so that it automatically adds the new default argument rc=1. We can potentially make the dataclass be automatically generated and associated with a scheduling op via the same decorator or a different one: @returned_cursors("alloc", "load", "block", "store") which then would serve as a docuemntation. If we go with the two decorators apporach, then the sched_op decorator should likely enforce the returned_cursors decorator. How existing and new decorators can be exported to the user world is really important discussion as well. The current decorator and argument processors can greatly simplify the user ops implementation and provide easy documentation. I think simply exporting them as is may not be a terrible idea since the almost all return API cursors. There are some exceptions (NewExprA) where the return value is a non-API object. One option is to not export any thing that could return a non-API object for now and make the user-level argument signature be a string instead which is what you would do now anyways. Another remaining point of the discussion is (1.c). How should this be implemented?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
To summarize, the design we arrived at above:
a. Relevant cursors will be returned as a dataclass.
b. Using this functionality will require opting-in by specifcying a default parameter rc=True
(return cursors) which will be the last parameter of a scheduling op.
c. This functionlity will be added to the primitives themsevles.
d. Cases where the are 0 or 1 relevant cursors will still return a dataclass when the the functionality is inovked for consisntency.
e. The per-op dataclass will be automatically defined by a decorator or explicitly.
f. The default argument will either be enforced or automatically added by a decorator.
I think (e) and (f) present a new design decision: Option 1:
**kwargs
rc
with boolean values.@returned_cursors("alloc", "load", "block", "store")
). This decorator binds the type of the dataclass in **kwargs
Option 2:
rc
similarly to how we enforce the first argument to be a procedure.@returned_cursors("alloc", "load", "block", "store")
to the function that binds the dynamically-created definition of the dataclass to the globalscope under some naming convention.I think Option 2 with (2.b) is the cleanest and most explicit. The rc
argument will be explicit in the operation defintion. It exposes the dataclass definition to the users, so they can introspect it with help()
. The information about what cursors are returned is clearly coupled with the scheduling operation via the decorator and we can add them to the information displayed when a user calls help()
on a scheduling operation.
Can we just chop off e & f from this proposal and fold those discussions into a subsequent proposal? (focused on how to better support user-defined scheduling operations using decorators)
On Dec 28, 2023, at 9:18 AM, Samir Droubi @.***> wrote:
To summarize, the design we arrived at above: a. Relevant cursors will be returned as a dataclass. b. Using this functionality will require opting-in by specifcying a default parameter rc=True (return cursors) which will be the last parameter of a scheduling op. c. This functionlity will be added to the primitives themsevles. d. Cases where the are 0 or 1 relevant cursors will still return a dataclass when the the functionality is inovked for consisntency. e. The per-op dataclass will be automatically defined by a decorator or explicitly. f. The default argument will either be enforced or automatically added by a decorator.
I think (e) and (f) present a new design decision: Option 1:
Require each scheduling operation to have an extra final argument kwargs We modify the argument processor to enforce what goes in that. Currently, we only allow rc with boolean values. We can add a decorator that dynamically defines the dataclass e.g. **@.*_cursors("alloc", "load", "block", "store")). This decorator binds the type of the dataclass in kwargs Option 2:
Require each operation to have the last argument be rc similarly to how we enforce the first argument to be a procedure. To define the dataclass: a) Statically (explicitly) define a dataclass in the globalscope with some naming convention e.g. "{op_name}_rc" b) Use a decorator @returned_cursors("alloc", "load", "block", "store") to the function that binds the dynamically-created definition of the dataclass to the globalscope under some naming convention. I think Option 2 with (2.b) is the cleanest and most explicit. The rc argument will be explicit in the operation defintion. It exposes the dataclass definition to the users, so they can introspect it with help(). The information about what cursors are returned is clearly coupled with the scheduling operation via the decorator and we can add them to the information displayed when a user calls help() on a scheduling operation.
— Reply to this email directly, view it on GitHub https://github.com/exo-lang/exo/issues/541#issuecomment-1871358471, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKZHLBUJTZ2SD2DYSFBC6TYLWSYHAVCNFSM6AAAAABA2NWFC2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZRGM2TQNBXGE. You are receiving this because you commented.
(e) and (f) were referring to the primitives here. I think we need to have something if we are to implement this feature for the primitives. The other alternative is to hold off on implementing this feature until we have the second discussion about how decorators are exported by the API.
Primitives are going to have to modify their implementation (in the API_Scheduling.py
file) to specify what the cursor values are.
Note that all primitive scheduling operations are instances of the class AtomicSchedulingOp
, so that (and the @.***_opdecorator) provides an existing centralized location to change the behavior of all primitives. The
rckwarg can be intercepted in
AtomicSchedulingOp.call. Presumably, we would then modify the functions later in that file to return a data class of cursors by default and optionally suppress that return value in
AtomicSchedulingOp.call`.
Then, we need to figure out how to specify the dataclasses in question. In general, these dataclasses could be defined above the corresponding scheduling op. They then need to be user-exposed in stdlib/scheduling.py
alongside the corresponding scheduling op. Another option could be to abbreviate these dataclass definitions as (e.g.) Op1Cursors = MakeCursorSet(‘Op1Cursors’, ‘alloc’, ‘load’, ’store’)
. This will work if we don’t want to do anything clever with the type annotations for the cursors (i.e. they can all just be generic cursors).
I think that accounts for almost all the issues. Have a look at the code and see if what I’m saying makes sense.
On Dec 28, 2023, at 1:59 PM, Samir Droubi @.***> wrote:
(e) and (f) were referring to the primitives here. I think we need to have something if we are to implement this feature for the primitives. The other alternative is to hold off on implementing this feature until we have the second discussion about how decorators are exported by the API.
— Reply to this email directly, view it on GitHub https://github.com/exo-lang/exo/issues/541#issuecomment-1871515010, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKZHLCXL56RYXVQIMOC66TYLXTTXAVCNFSM6AAAAABA2NWFC2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZRGUYTKMBRGA. You are receiving this because you commented.
Yeah that makes sense. That's a cool idea, I didn't think about making the implementation always return the value by default, then optionally supress it in the object. Thanks!
Tangential to design but wanted to note that once this gets built out, we're free to deprecate some of the awkward forwarding behavior (e.g. divide_loop
where the divided loop forwards to the outermost loop). That was only originally present because we didn't have a good way to access newly created cursors.
The last piece of this design is what should be returned by the operations. I think we need at least a few policies so that there is some consistency in what we return. Here is an attempt at a policy and what it translates to for the operations:
Let's see what this translates to for the scheduling operations: | Operation | Returned Cursors | Discussion | |
---|---|---|---|---|
simplify | - A list of cursors too all updated expressions | This is according to (3). Is this an overkill? Are these cursors really useful? I think it will also complicate the implementation of the operation. | ||
rename | No returned cursors | |||
make_instr | No returned cursors | |||
insert_pass | - Forwarded gap - Pass stmt cursor |
According to (1) and (3) | ||
delete_pass | - list of gaps after the statement previous to the pass | According to (3) and (6) | ||
reorder_stmts | - Block of the two statements - First statement - Second statement |
According to (1), (2) and (3). Here the input is a block, so we should return the block of the two. But also the most granular thing to return is a cursor to both statements. However, it seems weird to do so since the block is a block of the two statements. | ||
parallelize_loop | - cursor to the loop | According to (1) | ||
commute_expr | - list of all the commuted cursors | According to (1). | ||
rewrite_expr | - cursor to the new expression | According to (1). | ||
bind_expr | - alloc cursor - write cursor (the bind) - list of all accesses to the new buffer |
According to (1), (2), and (3) | ||
extract_subproc | - call cursor - the new subprocedure |
According to (1) and (3). Currently, the subprocedure is directly returned by the operation, so this changes that convention and adds it instead to the dataclass | ||
inline | - Block of the inline statements | According to (1) | ||
replace | - A cursor to the call | According to (1) | ||
call_eqv | - A cursor to the call | According to (1) | ||
set_precision | - A cursor to the allocation or argument | According to (1). | ||
set_window | - A cursor to the argument | According to (1). | ||
set_memory | - A cursor to the allocation or argument | According to (1). | ||
bind_config | - A cursor to the config write - A cursor to the config read |
According to (1) and (3). | ||
delete_config | - A cursor to the gap after the statement previous to the deletion | According to (1) and (6). | ||
write_config | - A cursor to the config write | According to (1) | ||
resize_dim, expand_dim, rearrange_dim, bound_alloc, divide_dim, mult_dim | - A cursor to the alloc - A cursor to the modified dimensions - A cursor to every access to the buffer - A cursor to the modified access expressions |
According to (1) and (3). Some of these seem like an overkill. Is it really useful to return any cursors to the accesses | ||
unroll_buffer | - A block of the created allocations - Cursors to the access |
According to (1) and (3) | ||
lift_alloc, sink_alloc, autolift_alloc | - A cursor to the allocation | According to (1) | ||
delete_buffer | - A cursor to the gap after the previous statement | According to (1) and (6) | ||
reuse_buffer | - A cursor to the reused allocation - A cursor to the gap after the statement previous to the eliminated allocation - A cursor to every new access to the reused buffer |
According to (1) and (3) | ||
inline_window | - A cursor to the gap after the statement before the window statement - A cursor to every inlined access to the previously windowed buffer. |
|||
stage_window | - alloc - load - cursor to expression passed as an argument - store |
According to (1), (2), (3). Should the load/store cursors be to the load/store statement themselves or to the outer most statement in the load/store nest (which could be a loop or a load/store statement | ||
stage_mem | - alloc - load - block - store |
Same discussion as stage_window | ||
divide_with_recompute | - outer loop - inner loop - a list of every modified index expression |
According to (1), (2), and (3). Is returning every modified index expression an overkill or useful? | ||
divide_loop | - outer loop - inner loop - guard - tail loop - A cursor to every new index expression |
According to (1) and (3) | - a cursor to the final loop - A cursor to every new index expression |
Same discussion as above. |
mult_loops | - A cursor to the first loop - A cursor to every new index expression |
Same discussion as above. | ||
join_loops | - A cursor to the loop | According to (1) | ||
join_loops | - A cursor to the first loop - A cursor to the second loop |
According to (1) and (2) | ||
shift_loop | - A cursor to the loop - A cursor to every modified index expression |
According to (1) and (3) | ||
reorder_loops | - Outer loop - inner loop |
According to (1) and (3) | ||
merge_writes | - A cursor to the final statement | According to (1) and (3) | ||
inline_assign | - A cursor to the gap - A cursor to every new access |
According to (3). | ||
lift_reduce_constant | - block cursor - A cursor to the new statement |
According to (1) and (3). | ||
fission, autofission | - A cursor to the first loop - A cursor to the second loop |
According to (1) and (3). | ||
fuse | - A cursor to the new loop | According to (1) | ||
remove_loop | - A cursor to the guard - A cursor the block |
According to (1) and (3) | ||
add_loop | - A cursor to the loop - A cursor to the guard - A cursor to the block |
According to (1) and (3) | ||
unroll_loop | - A cursor to the block | According to (1) and (3) | ||
lift_scope | - A cursor to the if statement - A cursor to the two loops (or one loop?) | Unclear here what to return in terms of loops since there could be one or two | ||
eliminate_dead_code | in case of if: the block, in case of for: there is nothing to return so just the gap | Unclear what to return here: a unifying datalcass for both, different dataclass depending on the operation, split the rewrite into two rewrites | ||
specialize | - A cursor to the if statement - A cursor to the if block - A cursor to the else block |
According to (1), (2), and (3). | ||
add_unsafe_guard | - A cursor to the guard - A cursor to the block |
According to (1) and (3) | ||
bound_and_guard | - A cursor to the loop - A cursor to the guard |
According to (1) and (3) |
I don’t have time to review this right now; but please keep bugging me until I look over it. It’s very important to me, I just have 4 other super high priority things to do by Wednesday.
On Jan 5, 2024, at 12:53 PM, Samir Droubi @.***> wrote:
The last piece of this design is what should be returned by the operations. I think we need at least a few policies so that there is some consistency in what we return. Here is an attempt at a policy and what it translates to for the operations:
The input cursor (forwarded) needs to be among the returned cursors. We return the most granular cursors for statements. For example, even if we could return a block cursor (of a fixed size) that encompasses the cursors, we would instead return a cursor to each statement instead. We return cursors to every statement/expressions added/modified. The cursors are ordered in the dataclass by sequential order of execution. In the absence of a returned cursor that does have a field in the dataclass, we assign it to an invalid cursor. Unless the field represents a collection of things, then we set it to the empty version of that collection. If a cursor is deleted, do we return a cursor to the gap (and with respect to what statement)? What if there is no previous or next statement? Let's see what this translates to for the scheduling operations:
Operation Returned Cursors Discussion simplify - A list of cursors too all updated expressions This is according to (3). Is this an overkill? Are these cursors really useful? I think it will also complicate the implementation of the operation. rename No returned cursors make_instr No returned cursors insert_pass - Forwarded gap
- Pass stmt cursor According to (1) and (3) delete_pass - list of gaps after the statement previous to the pass According to (3) and (6) reorder_stmts - Block of the two statements - First statement
- Second statement According to (1), (2) and (3). Here the input is a block, so we should return the block of the two. But also the most granular thing to return is a cursor to both statements. However, it seems weird to do so since the block is a block of the two statements. parallelize_loop - cursor to the loop According to (1) commute_expr - list of all the commuted cursors According to (1). rewrite_expr - cursor to the new expression According to (1). bind_expr - alloc cursor
- write cursor (the bind)
- list of all accesses to the new buffer According to (1), (2), and (3) extract_subproc - call cursor
- the new subprocedure According to (1) and (3). Currently, the subprocedure is directly returned by the operation, so this changes that convention and adds it instead to the dataclass inline - Block of the inline statements According to (1) replace - A cursor to the call According to (1) call_eqv - A cursor to the call According to (1) set_precision - A cursor to the allocation or argument According to (1). set_window - A cursor to the argument According to (1). set_memory - A cursor to the allocation or argument According to (1). bind_config - A cursor to the config write
- A cursor to the config read According to (1) and (3). delete_config - A cursor to the gap after the statement previous to the deletion According to (1) and (6). write_config - A cursor to the config write According to (1) resize_dim, expand_dim, rearrange_dim, bound_alloc, divide_dim, mult_dim - A cursor to the alloc
- A cursor to the modified dimensions
- A cursor to every access to the buffer - A cursor to the modified access expressions According to (1) and (3). Some of these seem like an overkill. Is it really useful to return any cursors to the accesses unroll_buffer - A block of the created allocations
- Cursors to the access According to (1) and (3) lift_alloc, sink_alloc, autolift_alloc - A cursor to the allocation According to (1) delete_buffer - A cursor to the gap after the previous statement According to (1) and (6) reuse_buffer - A cursor to the reused allocation
- A cursor to the gap after the statement previous to the eliminated allocation
- A cursor to every new access to the reused buffer According to (1) and (3) inline_window - A cursor to the gap after the statement before the window statement
- A cursor to every inlined access to the previously windowed buffer. stage_window - alloc
- load
- cursor to expression passed as an argument
- store According to (1), (2), (3). Should the load/store cursors be to the load/store statement themselves or to the outer most statement in the load/store nest (which could be a loop or a load/store statement stage_mem - alloc
- load
- block
- store Same discussion as stage_window divide_with_recompute - outer loop
- inner loop
- a list of every modified index expression According to (1), (2), and (3). Is returning every modified index expression an overkill or useful? divide_loop - outer loop
- inner loop
- guard
- tail loop
- A cursor to every new index expression According to (1) and (3) mult_loops - A cursor to the first loop
- A cursor to every new index expression Same discussion as above. join_loops - A cursor to the loop According to (1) join_loops - A cursor to the first loop
- A cursor to the second loop According to (1) and (2) shift_loop - A cursor to the loop
- A cursor to every modified index expression According to (1) and (3) reorder_loops - Outer loop
- inner loop According to (1) and (3) merge_writes - A cursor to the final statement According to (1) and (3) inline_assign - A cursor to the gap
- A cursor to every new access According to (3). lift_reduce_constant - block cursor
- A cursor to the new statement According to (1) and (3). fission, autofission - A cursor to the first loop
- A cursor to the second loop According to (1) and (3). fuse - A cursor to the new loop According to (1) remove_loop - A cursor to the guard
- A cursor the block According to (1) and (3) add_loop - A cursor to the loop
- A cursor to the guard
- A cursor to the block According to (1) and (3) unroll_loop - A cursor to the block According to (1) and (3) lift_scope - A cursor to the if statement - A cursor to the two loops (or one loop?) Unclear here what to return in terms of loops since there could be one or two eliminate_dead_code in case of if: the block, in case of for: there is nothing to return so just the gap Unclear what to return here: a unifying datalcass for both, different dataclass depending on the operation, split the rewrite into two rewrites specialize - A cursor to the if statement
- A cursor to the if block
- A cursor to the else block According to (1), (2), and (3). add_unsafe_guard - A cursor to the guard
- A cursor to the block According to (1) and (3) bound_and_guard - A cursor to the loop
- A cursor to the guard According to (1) and (3) — Reply to this email directly, view it on GitHub https://github.com/exo-lang/exo/issues/541#issuecomment-1879244671, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKZHLDSTOXXK732HRLCMIDYNBR5FAVCNFSM6AAAAABA2NWFC2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZZGI2DINRXGE. You are receiving this because you commented.
So far we have been thinking of this returned object as something that holds cursors. There is a question of whether it needs to be built with the intention of potentially returning types other than cursors. For example, extract_proc
above returns the extracted subprocedure; how should that be returned when you specify rc=1
: within this dataclass or should there be three arguements returned Proc, Porc, CursorsDataclass
. Another related thing that may come up is a "success" flag. Let's say an operation doesn't throw errors when it fails, but rather reports a True/False value on whether it succeeded. How should that flag be returned? Within the dataclass or as a third argument. I don't think we have a great example of why you might want to support a success flag instead of just throwing an error, but it is just an example I thought of where an operation might want to return something other than just cursors.
(sorry for any weird formatting; composed this on my phone)
After going over this list I realize there is a useful way we can approach this: documentation.
If we force ourselves to document the behavior in a standard way then (a) we solve the missing documentation problem and (b) it should be more obvious whether that documentation makes sense.
In particular, the extract_subproc primitive reminds me that not all of our primitives strictly conform to certain “conventions” (ie what signature is expected by higher order scheduling operators). The RC feature is fundamentally about adding another such signature convention. We should document which conventions we support in which ways. That will likely streamline a lot of work. We should discuss this point in a design meeting. I will try to post to Slack.
————-
The case of simplify suggests that (3) is too strong. My opinion is simplify should not support RC.
Insert Pass should probably just return a cursor to the pass statement.
On delete pass, I think I prefer No RC support. However there is a deeper issue with the convention. What happens if you delete a pass that is the only statement inside a loop? That’s not grammatical. Does it fail? This is related in a non-obvious way to the presence of n+1 “gaps” in an n statement block, and the problem of special cases for the first and last gap. eg suppose I assume that a returned gap cursor is always attached to the preceding statement when returned from delete pass. Oh shit. This is impossible to guarantee. The opposite convention is also impossible to guarantee. So a user is highly likely to infer unreliable invariants regardless of what behavior we set. As a result, I suggest we say no RC here to prevent the otherwise unavoidable ambiguity bugs.
On reorder stmts it should just be the block cursor returned. (2 shouldn’t be absolute). There are two variants of reorder statements that should be defined in a standard library: reorder before and reorder after. These should take and return single statement cursors in the obvious way.
Parallelize loop seems fine
Commute_exprs seems fine, but maybe just don’t support RC or return the input cursor?
What is rewrite-expr again? Is this an unsafe escape hatch? (Behavior seems fine)
I 100% agree with bind_expr.
extract-subproc — see discussion at the top
in-line, replace, call-eqv— agree
Set_* — is it possible to have a cursor to an argument?
bind-Config, write Config — sure
Delete config — same issue as for deleting a pass statement
Resize dim etc — just return a cursor to the allocation. The other cursors ought to be easy to retrieve using an inspection (better name than introspection?) feature of some sort. One wants to do that in other circumstances, so better to be more idiomatic about the way to do so.
Unroll buffer — drop cursors to accesses; see above
*_alloc — agreed
Delete buffer — same issue as other deletes
Reuse buffer — should just be a cursor to the alloc
In-line window — same issue as delete. More open to the list of cursors to in-line sites in this case. Might be harder to retrieve another way? But also totally fine to just not support RC on this for consistency?
Stage window — there is no expression passed as argument? You mean the statement around which one stages? (Oh I see that I mean stage-mem) I agree with that. Again no need for every use site of the new buffer.
On *_loop ops, drop the lists of cursors to modified expressions. The two cursors to inner and outer or similar are OK. I think just an outer cursor is fine too. If there are two successive loops (not nested) then multiple cursors should absolutely be returned.
Merge writes— yup
In-line assign — same comment as for deletion
Lift-reduce constant — I don’t understand what this op is or does…
Fission, auto fission — this should return a gap cursor to the gap between the two loops as well. The cursor should be attached to the loop before or after the gap based on which statement the argument cursor was attached to.
Fuse — agreed
Remove loop — should only return one cursor
Add_loop — should only return one cursor
Unroll-loop — agreed
Lift-scope — not sure about this one. Maybe this trouble is a sign that this primitive is mis-designed.
Eliminate dead code — same comment as other deletions
Specialize — if it’s always an if block then why return the other cursors? They have obvious paths from the first cursor
Last two — same comment about gratuitous cursors to sunstructures
I have thought about extract_subproc
more and I wonder if really returning the subproc by default (the way it is done now) is necessary. Let's say it wasn't returned, I would expect to be able to write code like the following:
proc = extract_subproc(proc, stmt)
subproc = proc.forward(stmt).subproc()
and with this new feature, you would also be able to do:
proc, cursors = extract_subproc(proc, stmt, rc=True)
subproc = cursors.subproc
It generally seems like the behavior of returning the subprocedure is an artifact of the pre-cursors Exo rather than being something that is necessary.
Actually, I guess if you want to be able to schedule the subprocedure without using any cursor-related features. You do need to get the subproc somehow. Perhaps, I am wrong then.
Problem:
We have been talking about having scheduling ops support the idea of "Returning relevant cursors". For example, when using
stage_mem
, we generate an allocation (which we might wanna further schedule to for example set the memory), and at least one of a load stage or store stage:It would be convenient if scheduling operations returned cursors to those relevant pieces of the procedure. Currently the alternative to this is using the argument cursor as an anchor and using navigation to get to such cursors from the scheduling code. The goal is to offload that work from the user to the tool.
Proposals: Proposal 1: Return a dataclass algonside the proc. The dataclass contains fields which are the relevant cursors.
Definition goes along with the scheduling operations definition, serves as a documentation of what can be returned:
Usage:
Advantages:
(Procedure, dataclass of relevant cursors)
Disatvantages:
cursors.load_stage
Proposal 2:
Similar to proposal 1, but we keep the return value of primitives to be just a procedure. Then, we implement a standard library wrapper around each primitive that returns a procedure and a dataclass of relative cursors.
Advantages:
Disatvantages:
Proposal 3: Instead of returning multiple relevant cursor, we can return exactly on relevant cursor along with the procedure. In the case of
stage_mem
, this could be the new block cursor including the load stage, the original block we staged around, and the store stage.Advantages:
Disatvantages: