translatenix opened 2 weeks ago

> Using the amends chain affects the semantics of `super`.

Does it? Note that a typecast results in an object with no members. (In many cases, the extra object could be optimized away, but that's a separate concern.)
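
A minimal example of what I mean (property names are mine):

```pkl
xs = new Listing { "a"; "b" }

// `ys` is a separate object that delegates to `xs` and carries the
// `Listing<String>` type check; it defines no members of its own
ys = xs as Listing<String>
```
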
I also don't think this is great for perf. Using the amends chain means that every child receives its own complete set of cached members every time, and also means computing the same value multiple times (which you've seen with the now-deleted `listing7.pkl`).

I think this PR could implement the same optimization that exists in the current code (share `cachedMembers` and the `checkedMembers`/`cachedValues` map). However, it's not clear to me that this optimization is beneficial in the real world:

```pkl
// foo has type `Listing<String>`
foo { "element" }
```

In the above code, `foo { "element" }` is never shared, because accessing `foo` returns `foo { "element" } as Listing<String>`. Hence caching values for `foo { "element" }` instead of `foo { "element" } as Listing<String>` has no benefit (only a cost).
I think that my proposed optimization "avoid creating an intermediate untyped listing/mapping", which isn't possible with the current data structure, is more likely to result in real-world gains. Unfortunately, I don't have access to large real-world Pkl programs to test my performance hypotheses. I've considered generating large programs, but such programs may not be indicative of real-world performance.
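
To make the target of that optimization concrete, here's a sketch of the pattern I mean (class and property names are made up):

```pkl
class Config {
  names: Listing<String> = new {}
}

config: Config = new {
  // amending `names` in place currently builds an untyped listing first,
  // which is then cast to Listing<String> when the property is read;
  // the proposed optimization would construct the typed listing directly
  names {
    "alice"
  }
}
```
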
If we don't share cached members, every node on `m1` gets executed twice, assuming that both `m1` and `m2` are in the path of evaluation. It also means that we double the number of cached members and double the allocations (more `VmObject`s, `VmValue`s, etc.).

```pkl
m1: Mapping = new {
  ["foo"] {
    bar {
      baz = doSomething()
    }
  }
}

m2: Mapping<String, Dynamic> = m1
```

Here's a "real-world" ish piece of code that ends up being much more costly (run with pkl eval -m <output_dir>
and see two traces):
import "package://pkg.pkl-lang.org/pkl-k8s/k8s@1.1.1#/api/core/v1/Pod.pkl"
import "package://pkg.pkl-lang.org/pkl-k8s/k8s@1.1.1#/K8sResource.pkl"
local pods: Listing<Pod> = new {
new {
metadata {
name = trace("my-cool-pod")
namespace = "dev"
}
}
}
local allPodNames = gatherNames(pods)
function gatherNames(resources: Listing<K8sResource>) =
resources.toList().map((k) -> k.metadata.name)
output {
files {
for (p in pods) {
["\(p.metadata.name).yaml"] = p.output
}
["all-pods-names.yaml"] {
value = allPodNames
renderer = new YamlRenderer {}
}
}
}
I think the current approach probably makes the best trade-off here; we have extra fields for every `VmMapping` and `VmListing`, but computing every node twice is much more expensive, and, on average, it's a big perf gain.

BTW, quick note: your proposed changes introduce this regression, so we definitely have some corner cases not caught by our language snippet tests:

Given:

```pkl
class Person { name: String }

people1: Listing<Person> = new {
  new {
    name = "Sandy"
  }
}

people2: Listing<Person(name.endsWith("Smith"))> = (people1) {
  [[true]] {
    name = super.name + " Smith"
  }
}
```

The changes in this PR produce:

```
–– Pkl Error ––
Type constraint `name.endsWith("Smith")` violated.
Value: new Person { name = "Sandy" }

9 | people2: Listing<Person(name.endsWith("Smith"))> = (people1) {
                            ^^^^^^^^^^^^^^^^^^^^^^
at test#people2 (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:9)

4 | new {
    ^^^^^
at test#people1[#1] (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:4)

106 | text = renderer.renderDocument(value)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at pkl.base#Module.output.text (https://github.com/apple/pkl/blob/6a793b7c5/stdlib/base.pkl#L106)
```

> BTW, quick note: your proposed changes introduce this regression, so we definitely have some corner cases not caught by our language snippet tests:

Fixed here: https://github.com/apple/pkl/pull/789/commits/724482dc799dfbaf19ffb992502fd1363d98ff2c

> I think the current approach probably makes the best trade-off here; we have extra fields for every `VmMapping` and `VmListing`, but computing every node twice is much more expensive, and, on average, it's a big perf gain.

Your first example is a downcast caused by partial typing. Intuitively, it feels OK to me that a downcast, which shouldn't be the norm, incurs some overhead. Note that if `m2` amends `m1`, the current optimization won't help.

Your second example is an upcast. This case feels more relevant to me because it cannot be easily avoided. Here, a better optimization is to recognize (and cache) the fact that no type cast is required, because `Listing<Pod>` is a subtype of `Listing<K8sResource>`.
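
For example (a made-up snippet; `Listing` is covariant in its element type):

```pkl
ints: Listing<Int> = new { 1; 2; 3 }

// Listing<Int> is a subtype of Listing<Number>, so this upcast can never
// fail; once that relation is recognized (and cached), the runtime cast
// could be skipped entirely
nums: Listing<Number> = ints
```
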
The number of root node calls made in a limited set of circumstances (*) shouldn't be the only metric an implementation is judged by. I see the following problems with the current implementation, some of which are easier to fix than others:

- adds a lot of complexity
- adds a lot of state (5 fields and 2 maps per `VmListing` instance)
- makes many `VmObjectLike`/`VmListingOrMapping` virtual method calls, which are best avoided in Truffle interpreters
- makes accessing cached values more expensive
- makes iterating cached values more expensive
- Can't avoid an intermediate untyped `VmListing` in the common case that a property of type `Listing<X>` is amended in place. That's because the current `VmListing` can't have both a type cast node and new members.
- `typeNodeFrame` seems unnecessary. EDIT: It may be necessary in the current implementation, but isn't in my implementation.
- `checkedMembers` seems unnecessary. I think `cachedValues` could be used instead.
- skips required type casts. Executing type casts in `receiver`'s and `owner`'s delegate chain isn't sufficient.
- uses `Objects.requireNonNull` as assertion in interpreter code. This is a small but unnecessary overhead. `requireNonNull` is primarily intended for validating method arguments in public APIs and throws `NullPointerException`.

(*) direct assignment without amendment, type cast cannot be elided, many computed members, computations are expensive

> Your first example is a downcast caused by partial typing. Intuitively, it feels OK to me that a downcast, which shouldn't be the norm, incurs some overhead.

I think it might be fine if it were a little bit of overhead, but, in this case, it's twice the overhead (in both space and time). That's quite a lot, and can make it hard for users to reason about the performance of a program. It means that changing or adding a type annotation, which seems quite innocent, might double the cost of something. In large codebases, this can become quite problematic.
Responding to the rest of the concerns:

> - adds a lot of complexity
> - adds a lot of state (5 fields and 2 maps per `VmListing` instance)

Indeed; it would be good to reduce complexity, but I think we should be sharing cached members whenever possible.

> - makes many `VmObjectLike`/`VmListingOrMapping` virtual method calls, which are best avoided in Truffle interpreters

Sorry; can you elaborate on "virtual method call"? Do you mean dynamic dispatch?

> - makes accessing cached values more expensive
> - makes iterating cached values more expensive

That's true, although, in the common case, I think this is pretty marginal. In the following code, all of the cached values are stored directly on the delegated (`new Listing { 1; 2; 3 } as Listing<Int>`), rather than on the delegate (`new Listing { 1; 2; 3 }`):

```pkl
foo = new Listing { 1; 2; 3 } as Listing<Int>
```

As long as cached values are stored on the delegated, member lookups are just as cheap as they used to be.
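
For example (just an illustration of the lookup path; `first` is a name I made up):

```pkl
foo = new Listing { 1; 2; 3 } as Listing<Int>

// this read goes through the outer (delegated) object, so a value cached
// there is found without consulting the underlying untyped listing
first = foo[0]
```
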

> - Can't avoid an intermediate untyped `VmListing` in the common case that a property of type `Listing<X>` is amended in place. That's because the current `VmListing` can't have both a type cast node and new members.

We can probably optimize the case of `new Listing<Int> { 1; 2; 3 }` even in the current implementation.

> - `typeNodeFrame` seems unnecessary. EDIT: It may be necessary in the current implementation, but isn't in my implementation.

Might be unnecessary in the current implementation too; need to investigate this (thanks for pointing this out!)

> - `checkedMembers` seems unnecessary. I think `cachedValues` could be used instead.

It's indeed not necessary; it's there as an optimization, although it might not be saving that much.

> - skips required type casts. Executing type casts in `receiver`'s and `owner`'s delegate chain isn't sufficient.

Are you referring to https://github.com/apple/pkl/issues/785? If so, this is fixed in https://github.com/apple/pkl/pull/822.

> - uses `Objects.requireNonNull` as assertion in interpreter code. This is a small but unnecessary overhead. `requireNonNull` is primarily intended for validating method arguments in public APIs and throws `NullPointerException`.

Good point! We should switch to `assert` for these.

> I think it might be fine if it were a little bit of overhead, but, in this case, it's twice the overhead (in both space and time). That's quite a lot, and can make it hard for users to reason about the performance of a program. It means that changing or adding a type annotation, which seems quite innocent, might double the cost of something. In large codebases, this can become quite problematic.

I think you may be focusing too much on this particular optimization instead of the bigger optimization picture. Anyway, I brought back the optimization and corresponding test here: 2ec4124d

Another flaw of this optimization, or at least its current implementation, is that it only works if the member is, coincidentally, first evaluated for the parent/delegate. This can be verified by changing the order of `listing1` and `listing2` in `listing7.pkl`.
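
Roughly this shape (a hypothetical reconstruction; the real `listing7.pkl` was deleted from the PR):

```pkl
listing1: Listing<String> = new {
  "element"
}

// direct assignment with a type annotation: `listing2` delegates to `listing1`
listing2: Listing<String> = listing1

// evaluated in this order, `listing1` caches its members first, so they can
// be shared with `listing2`; swap the order of evaluation and `listing2`
// computes its members before `listing1` has cached anything
res1 = listing1
res2 = listing2
```
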
> That's true, although, in the common case, I think this is pretty marginal. In the following code, all of the cached values are stored directly on the delegated (`new Listing { 1; 2; 3 } as Listing<Int>`), rather than on the delegate (`new Listing { 1; 2; 3 }`).

I think `checkedMembers` prevents values from being cached in a single place. Hence cached values can no longer be iterated by iterating an (economic) map, which is much faster than doing a map lookup for each value.

> Sorry; can you elaborate on "virtual method call"? Do you mean dynamic dispatch?

Yes. Several `VmObject` methods are no longer `final`.

> We can probably optimize the case of `new Listing<Int> { 1; 2; 3 }` even in the current implementation.

Yes, but not the amendment case.
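
In other words, something like this (an illustration, not code from the PR):

```pkl
// direct construction: the element type is known up front, so the
// implementation could build a typed listing right away
xs: Listing<Int> = new { 1; 2; 3 }

// amendment: `(xs) { ... }` first yields an untyped listing that must
// then be cast back to Listing<Int> when `ys` is read
ys: Listing<Int> = (xs) { 4 }
```
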
> It's indeed not necessary; it's there as an optimization, although it might not be saving that much.

What makes `checkedMembers` a (small) improvement over `cachedValues`?

> I think you may be focusing too much on this particular optimization instead of the bigger optimization picture. Anyway, I brought back the optimization and corresponding test here: https://github.com/apple/pkl/commit/2ec4124dd8fcefd2d98c1a5a5384d795bcf31496

> Another flaw of this optimization, or at least its current implementation, is that it only works if the member is, coincidentally, first evaluated for the parent/delegate. This can be verified by changing the order of `listing1` and `listing2` in `listing7.pkl`.

We're certainly coming at this from two sides here. You are right that this optimization is only relevant for certain cases of Pkl code, but running into those cases without it would mean a significant perf penalty; doing extra evaluation is expensive and should be avoided where possible. But thanks for adding this optimization back! I will take a look.

> What makes `checkedMembers` a (small) improvement over `cachedValues`?

Going through the existing code again, I realized that I introduced `checkedMembers` at first because the delegatee was not receiving its own cached members. It's actually a result of iterating on the implementation and not revisiting old assumptions.

I marked this PR as a draft because my initial goal is to start a conversation. For details, see the commit messages.