Fidget-Spinner opened 11 months ago
Relevant paper, as we are partially evaluating CPython bytecode, which is closer to machine code than AST/CFG.
https://research.cs.wisc.edu/wpis/papers/oopsla15.pdf
(this is for further optimizations later)
Could this extend to the inputs? So if you write
```
op(_BINARY_OP_MULTIPLY_INT, (unused/1, left: &PyLong_Type, right: &PyLong_Type -- res: &PyLong_Type))
```
then the generator emits type guards?
> then the generator emits type guards?
Yep that's the idea. You can see the finite state machine we used in our previous project here https://github.com/pylbbv/pylbbv/blob/pylbbv/Python/tier2.c#L1702. That's roughly what we want to generate, without handwriting it of course.
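To make the idea concrete, here is a minimal sketch (with invented names — this is not the actual cases generator code) of how a generator could walk the annotated input stack effects and emit one guard uop per typed input:

```python
# Hypothetical sketch: turn annotated input stack effects into guard uops.
# The names here (_GUARD_TYPE, emit_type_guards) are made up for illustration.

def emit_type_guards(inputs):
    """inputs: (name, type) pairs; type is None for unannotated entries."""
    guards = []
    for name, typ in inputs:
        if typ is not None:
            # e.g. left: &PyLong_Type -> guard that 'left' really is an int
            guards.append(f"_GUARD_TYPE({name}, {typ})")
    return guards

# The inputs of _BINARY_OP_MULTIPLY_INT from the DSL example above:
inputs = [("left", "PyLong_Type"), ("right", "PyLong_Type"), ("unused", None)]
print(emit_type_guards(inputs))
# -> ['_GUARD_TYPE(left, PyLong_Type)', '_GUARD_TYPE(right, PyLong_Type)']
```

The point is that the hand-written state machine linked above becomes a mechanical function of the annotations, so it no longer needs to be maintained by hand.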
FWIW, I'd rather not start using a syntax based on `// @...` comments. If we want these things in the DSL, let's modify the DSL parser to recognize them as proper syntax. Maybe this?
```
op(_BINARY_OP_MULTIPLY_INT,
   (unused/1, left: &PyLong_Type, right: &PyLong_Type -- res: &PyLong_Type),
   flags=[autocodegen, codegen_predicate(oparg=NB_MULTIPLY), pure])
{
}
```
That LGTM too. In the parser implementation, I would probably add support for trailing commas, so we can have nice PEP8-style lists.
```
op(_BINARY_OP_MULTIPLY_INT,
   (unused/1, left: &PyLong_Type, right: &PyLong_Type -- res: &PyLong_Type),
   flags=[
       autocodegen,
       codegen_predicate(oparg=NB_MULTIPLY),
       pure,
   ]
)
{
}
```
It also makes the attributes easier to read.
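Trailing-comma support is a small change in a hand-written parser. A minimal sketch (assumed token shapes, not the real Tools/cases_generator code): after each item, accept an optional comma, and stop whenever the closing bracket follows:

```python
# Sketch of trailing-comma-tolerant list parsing (hypothetical tokenization).

def parse_flag_list(tokens):
    """tokens: e.g. ['[', 'autocodegen', ',', 'pure', ',', ']']"""
    assert tokens[0] == "["
    i, items = 1, []
    while tokens[i] != "]":
        items.append(tokens[i])
        i += 1
        if tokens[i] == ",":  # optional separator, so a trailing comma is fine
            i += 1
    return items

print(parse_flag_list(["[", "autocodegen", ",", "pure", ",", "]"]))
print(parse_flag_list(["[", "autocodegen", ",", "pure", "]"]))
# Both forms parse to ['autocodegen', 'pure']
```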
Also up for discussion: how to annotate the partition effects. Right now we have two effects: OVERWRITE (set a node) and SET (set partition's value). In our tier 2 interpreter report, this was the EBNF we proposed:
```
local_effect    := "locals" "[" expr "]" "=" local_source
local_source    := localsrc_literal | localsrc_stackinput
localsrc_literal    := identifier
localsrc_stackinput := "*" identifier

type_annotation := ( "{" type_operation ( "," type_operation )... "}" )
                 | type_operation
type_operation  := type_set | type_overwrite
type_set        := "<<=" type_source
type_overwrite  := type_source
type_source     := typesrc_literal
                 | typesrc_locals
                 | typesrc_consts
                 | typesrc_stackinput
typesrc_literal     := identifier
typesrc_locals      := "locals" "[" expr "]"
typesrc_consts      := "consts" "[" expr "]"
typesrc_stackinput  := "*" identifier
```
Here are some sample instructions from our project:
```
// Node source. Takes inspiration from dereference in C.
inst(STORE_FAST, (value --), locals[oparg] = *value) {
}

// Single partition operation.
inst(CHECK_INT, (maybe_int, unused[oparg] -- maybe_int : <<= PyLong_Type, unused[oparg])) {
}

// An example of multiple partition operations in a single instruction.
inst(CHECK_FLOAT, (maybe_float, unused[oparg] -- unboxed_float : {<<= PyFloat_Type, PyRawFloat_Type}, unused[oparg])) {
    // Checks float and also unboxes it onto the stack.
}
```
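To sanity-check the grammar above, here is a rough recognizer for just the `type_annotation` production (an illustration only — it assumes the annotation text has already been sliced out of the stack effect, and it skips `locals[...]`/`consts[...]` sources):

```python
# Toy recognizer for the type_annotation EBNF above.

def parse_type_annotation(text):
    """Return (op, source) pairs: op is 'SET' for '<<=', else 'OVERWRITE'."""
    text = text.strip()
    if text.startswith("{") and text.endswith("}"):
        parts = [p.strip() for p in text[1:-1].split(",")]
    else:
        parts = [text]  # a bare type_operation, no braces
    ops = []
    for part in parts:
        if part.startswith("<<="):
            ops.append(("SET", part[3:].strip()))
        else:
            ops.append(("OVERWRITE", part))
    return ops

# The annotations from CHECK_INT and CHECK_FLOAT above:
print(parse_type_annotation("<<= PyLong_Type"))
# -> [('SET', 'PyLong_Type')]
print(parse_type_annotation("{<<= PyFloat_Type, PyRawFloat_Type}"))
# -> [('SET', 'PyFloat_Type'), ('OVERWRITE', 'PyRawFloat_Type')]
```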
I was wondering how you feel about the partition operations being described with `*`, `<<=` and `{}`?
I'm afraid I didn't read your report in enough detail, I have no idea what partitions are or what they are used for! Sorry. :-(
In the first example, why is the `*` needed? If we think of `locals[oparg] = *value` as C code, the `*` feels wrong, since `locals` looks to me like an array of `PyObject*` and `value` is a single `PyObject*`. So in C I'd just write `locals[oparg] = value`. I guess this notation is meant to describe the semantics, at least for simple opcodes, instead of writing actual C code?

What is the `<<=` operator supposed to do? I understand it distinguishes between two different ways a type can be applied to an I/O effect, but I don't understand the difference yet. What mathematical operation is it trying to resemble?

I guess `{}` are just used for grouping multiple types -- are these unions or intersections? It seems a reasonable grouping character. I presume `{foo}` and `foo` mean the same thing -- why not `()` instead of `{}`?
> I'm afraid I didn't read your report in enough detail, I have no idea what partitions are or what they are used for! Sorry. :-(
No worries! It was admittedly quite long. Just a quick recap, our analysis algorithm partitions locals and stack entries into subsets that represent their relationship. Each partition is represented as a tree, where each stack/local entry is a node and the root node holds the overall type/constant value of the entire partition.
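A toy model of that structure (my own naming, not the pylbbv implementation): the root node carries the partition-wide type, and every stack/local entry in the partition reaches it by walking parent links:

```python
# Toy partition tree: the root holds the type/constant for the whole
# partition; the other nodes are just references chained up to it.

class Node:
    def __init__(self, typ=None, parent=None):
        self.typ, self.parent = typ, parent

    def root(self):
        n = self
        while n.parent is not None:
            n = n.parent
        return n

root = Node(typ="PyLong_Type")  # holds the partition-wide type
entry1 = Node(parent=root)      # a stack entry in the same partition
entry2 = Node(parent=entry1)    # a local, chained through entry1

print(entry2.root().typ)
# -> PyLong_Type: any member of the partition sees the root's type
```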
> In the first example, why is the `*` needed? If we think of `locals[oparg] = *value` as C code, the `*` feels wrong, since `locals` looks to me like an array of `PyObject*` and `value` is a single `PyObject*`. So in C I'd just write `locals[oparg] = value`. I guess this notation is meant to describe the semantics, at least for simple opcodes, instead of writing actual C code?
Yes, it's meant to describe that "the node at `locals[oparg]` is overwritten/assigned the value of the node `value`".
> What is the `<<=` operator supposed to do? I understand it distinguishes between two different ways a type can be applied to an I/O effect, but I don't understand the difference yet. What mathematical operation is it trying to resemble?
It is the SET operation, which sets the entire partition's type by updating the root node. In the case above of `CHECK_INT`, the partition the stack variable is in has its type set to `PyLong_Type`. The default operation for a normal type annotation is OVERWRITE. So we will need other syntax to distinguish them, since the current DSL already uses type annotations for casting.
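The difference can be shown with the same toy tree model as before (again my own naming): SET writes the type at the partition's root, so every member observes it; OVERWRITE repoints a single node at a different partition:

```python
# SET vs OVERWRITE on a toy partition tree (illustrative names only).

class Node:
    def __init__(self, typ=None, parent=None):
        self.typ, self.parent = typ, parent

    def root(self):
        return self if self.parent is None else self.parent.root()

root = Node()                  # partition with unknown type
maybe_int = Node(parent=root)  # the stack entry CHECK_INT looks at
alias = Node(parent=root)      # another entry in the same partition

# SET ('<<='): narrow the whole partition to int via the root.
maybe_int.root().typ = "PyLong_Type"
print(alias.root().typ)        # -> PyLong_Type; the alias sees it too

# OVERWRITE: move one node into a different partition entirely.
other = Node(typ="PyFloat_Type")
alias.parent = other
print(alias.root().typ)        # -> PyFloat_Type; maybe_int is unaffected
```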
> I guess `{}` are just used for grouping multiple types -- are these unions or intersections? It seems a reasonable grouping character. I presume `{foo}` and `foo` mean the same thing -- why not `()` instead of `{}`?
They are used to indicate multiple type operations. Indeed I also considered just plain parentheses, and I don't mind either way.
I still owe you a reading of your report and the OOPSLA paper you linked to. If this ever becomes urgent please ping me and I'll drop other stuff.
Thanks Guido. You don't have to read the report. I will create some slides to explain the algorithm and also the DSL changes to save everyone's time.
Having worked through your preso I think I understand a bit better, but it would be useful to work through some examples, starting with the Python code. For my own education, let me try one for each operation.
Let's start with

```python
x = 1
y = x
```

In Tier 1 bytecode, that's

```
LOAD_CONST (1)
STORE_FAST (x)
LOAD_FAST (x)
STORE_FAST (y)
```
In Tier 2 it's the same, except for added SAVE_IP uops, which we'll ignore, and there's an EXIT_TRACE at the end.
Walking through these one at a time:
LOAD_CONST (1) Creates a "root node", which I understand to be a place to hold a value (here, the integer 1). Then it makes `stack[0]` point to that node. (Are there non-root nodes? Somehow I only recall reading about root node objects.)

STORE_FAST (x) Makes the variable `x` point to that same root node. (As an intermediate step it seems it is made to point to `stack[0]` but since that is immediately popped I'm not sure what the point is.)

LOAD_FAST (x) Makes `stack[0]` point to `x` (which points to the root node `1`).

STORE_FAST (y) Makes the variable `y` point to `stack[0]`, and then (as the latter is popped) to `x` instead. So now we have a chain `y -> x -> root(1)`.
I'm not sure at what point the concrete uops are emitted from this. Possibly for each `STORE_FAST`, since that has an observable non-stack effect? Or when we encounter the EXIT_TRACE uop?
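The four-instruction walkthrough above can be played out in a tiny simulation (hypothetical structure, not the real analysis code): each instruction either creates a root node or a reference node, and the chain ends up exactly as described:

```python
# Simulating the walkthrough: x = 1; y = x as partition-tree operations.

class Node:
    def __init__(self, const=None, parent=None):
        self.const, self.parent = const, parent

    def root(self):
        return self if self.parent is None else self.parent.root()

stack, locals_ = [], {}

def run(op, arg):
    if op == "LOAD_CONST":
        stack.append(Node(const=arg))             # fresh root node
    elif op == "STORE_FAST":
        locals_[arg] = Node(parent=stack.pop())   # reference node
    elif op == "LOAD_FAST":
        stack.append(Node(parent=locals_[arg]))   # reference node

for op, arg in [("LOAD_CONST", 1), ("STORE_FAST", "x"),
                ("LOAD_FAST", "x"), ("STORE_FAST", "y")]:
    run(op, arg)

# The chain y -> x -> root(1): y's root holds the constant 1.
print(locals_["y"].root().const)
# -> 1
```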
Anyway, in this example it seems everything is considered static, since everything ends up pointing to that root node. In the next example I try to do something more interesting:
```python
global a
x = a
y = 1
z = x + y
```
Now `x` is dynamic and `y` is static, so when we get to `x + y` we have `x` on the "real" stack already but `y` is still virtual, and now we have to emit instructions to construct `y` (in this case `LOAD_CONST (1); STORE_FAST (y)`).
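That materialization step can be sketched as follows (the helper names and the environment shape are invented for illustration): before emitting a dynamic operation, any still-virtual operand gets reconstructed with the uops above:

```python
# Sketch of materializing a virtual (static) value before a dynamic op.

def materialize(name, env, emitted):
    """If 'name' is still virtual (a folded-away constant), emit uops for it."""
    if env[name].get("virtual"):
        const = env[name]["const"]
        emitted += [f"LOAD_CONST ({const})", f"STORE_FAST ({name})"]
        env[name]["virtual"] = False
    return emitted

env = {"x": {"virtual": False},              # dynamic: loaded from a global
       "y": {"virtual": True, "const": 1}}   # static: y = 1 was folded away
emitted = []
for operand in ("x", "y"):                   # operands of z = x + y
    materialize(operand, env, emitted)
emitted.append("BINARY_OP_ADD")

print(emitted)
# -> ['LOAD_CONST (1)', 'STORE_FAST (y)', 'BINARY_OP_ADD']
```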
(Hm, this is getting long, and I haven't even gotten to the point where I understand why you are adding markup for this to the DSL. I will continue later.)
> Having worked through your preso I think I understand a bit better, but it would be useful to work through some examples, starting with the Python code. For my own education, let me try one for each operation.
Oh no! That might undergo major revisions soon (mainly, adding symbolic information). Please hold off from spending too much time on it. Sorry if I wasted your time!
> LOAD_CONST (1) Creates a "root node", which I understand to be a place to hold a value (here, the integer 1). Then it makes `stack[0]` point to that node. (Are there non-root nodes? Somehow I only recall reading about root node objects.)
The rest are reference nodes, which are just the pointers on the stack/locals.
> STORE_FAST (x) Makes the variable `x` point to that same root node. (As an intermediate step it seems it is made to point to `stack[0]` but since that is immediately popped I'm not sure what the point is.)
That just shows how we wrote the OVERWRITE operation internally.
> I'm not sure at what point the concrete uops are emitted from this. Possibly for each `STORE_FAST`, since that has an observable non-stack effect? Or when we encounter the EXIT_TRACE uop?
`STORE_FAST` is always emitted. The reason is that we are tracing, not compiling method-at-a-time, and there might be external uses of the variable outside the trace.
> Now `x` is dynamic and `y` is static, so when we get to `x + y` we have `x` on the "real" stack already but `y` is still virtual, and now we have to emit instructions to construct `y` (in this case `LOAD_CONST (1); STORE_FAST (y)`).
Indeed, and also `CHECK_INT` and `BINARY_OP_ADD_INT`.
Thanks, I won't spend more time on it, but your answers are helpful, and I enjoy learning about this stuff.
The current rough proposal for optimization passes is to use abstract interpretation on tier 2 uops. This makes the interpreter DSL a natural way of describing the abstract interpretation. We could extend the interpreter DSL to do two additional things:
1. Here's the `_BINARY_OP_MULTIPLY_INT` uop, annotated with type annotation information for the output:

Which would potentially generate the following target for type propagation:
We plan to further experiment with this to see if we can feed information required for partial evaluation. E.g.
Would mark that this operation is "pure", and that it does not affect the static/dynamic split during partial evaluation.
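As a hedged guess at what such a generated type-propagation target might look like (this is an illustration, not the actual generated code), the annotated output type simply dictates what the abstract interpreter pushes:

```python
# Hypothetical abstract-interpreter case for _BINARY_OP_MULTIPLY_INT:
# pop the two input types, push the annotated result type (res: &PyLong_Type).

def abstract_binary_op_multiply_int(type_stack):
    type_stack.pop()                  # right
    type_stack.pop()                  # left
    type_stack.append("PyLong_Type")  # res, straight from the annotation
    return type_stack

print(abstract_binary_op_multiply_int(["PyLong_Type", "PyLong_Type"]))
# -> ['PyLong_Type']
```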
2. The conversion from `BINARY_OP` to `_BINARY_OP_MULTIPLY_INT` is fairly simple. The predicate is that `oparg=NB_MULTIPLY` and `type(left)=type(right)=int`. This could be how we can annotate that in the DSL:

Note that we have type annotated the input types. This tells the type propagator what types we expect the stack variables to be. Note that the code generator already knows that `BINARY_OP` maps to `BINARY_OP_ADD_INT` thanks to the `family` tag in the DSL!
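The predicate itself is a small pure function of the oparg and the abstract operand types. A sketch (invented names; the `NB_MULTIPLY` value here is a placeholder — the real constant lives in CPython's opcode headers):

```python
# Sketch of the specialization predicate for BINARY_OP -> _BINARY_OP_MULTIPLY_INT.

NB_MULTIPLY = 5  # placeholder; use the value from CPython's headers

def specialize_binary_op(oparg, left_type, right_type):
    """Pick the specialized uop when the predicate holds, else fall back."""
    if (oparg == NB_MULTIPLY
            and left_type == "PyLong_Type"
            and right_type == "PyLong_Type"):
        return "_BINARY_OP_MULTIPLY_INT"
    return "BINARY_OP"  # generic uop when the predicate fails

print(specialize_binary_op(NB_MULTIPLY, "PyLong_Type", "PyLong_Type"))
# -> _BINARY_OP_MULTIPLY_INT
print(specialize_binary_op(NB_MULTIPLY, "PyFloat_Type", "PyLong_Type"))
# -> BINARY_OP
```

With `codegen_predicate(oparg=NB_MULTIPLY)` plus the input type annotations in the DSL, a function of exactly this shape could be generated per family member instead of being written by hand.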