*Closed — stereotype441 closed this issue 1 year ago.*
In a sense what's going on here is that the static analysis of map patterns only matches the runtime semantics under the assumption that `containsKey` and `operator[]` behave consistently with one another, and if they don't, unsound behaviour can result. I think we need to fix this. Even if we decide not to precisely define how map patterns should interact with classes that break those assumptions, I think it's crucial that the runtime behaviour doesn't produce results that violate static type guarantees.
I can see two ways to fix this: either change the static analysis so it no longer assumes the two operations agree, or add a runtime check so that a disagreement cannot violate soundness.

I think the first alternative is a bad approach. It would probably require changing the static analysis of map patterns so that when recursing into subpatterns, the matched value type is always considered nullable. Which means that code like this would be erroneous:
```dart
f(Map<String, int> m) {
  switch (m) {
    case {'foo': var x}:
      print(x + 1); // ERROR: `x` has type `int?` so it needs to be null checked
  }
}
```
Which really would be a shame.
For the second alternative, it seems like our only choice is that if the value type is non-nullable, the implementation must null-check the result of evaluating `m['foo']`. In which case the `containsKey` check is redundant.

Nasty! I think the second alternative is the right one. There's no way for the type system to tell that a `Map` implementation implements the protocol correctly, so we'll just have to check at runtime.
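As an illustration (this sketch is mine, not from the issue), here is a map whose `containsKey` and `operator[]` disagree — exactly the kind of implementation that breaks the assumption the static analysis relies on:

```dart
import 'dart:collection';

// An illustrative map whose `containsKey` and `operator []` disagree:
// `containsKey` claims every key is present, but `operator []` never
// returns a value.
class LyingMap extends MapBase<String, int> {
  @override
  int? operator [](Object? key) => null; // claims there is no value...

  @override
  bool containsKey(Object? key) => true; // ...but claims every key exists

  @override
  Iterable<String> get keys => const <String>[];

  @override
  void operator []=(String key, int value) {}

  @override
  int? remove(Object? key) => null;

  @override
  void clear() {}
}

void main() {
  final Map<String, int> m = LyingMap();
  // A naive desugaring that trusts `containsKey` would now read `m['foo']`
  // and treat the `null` result as an `int`, which is unsound.
  print(m.containsKey('foo')); // true
  print(m['foo']); // null
}
```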
I think it's worth discussing a bit what the right resolution is here. As I mentioned in comments on #2687, I don't think this can be specified in terms of non-nullability, because you might have something of indeterminate nullability.
```dart
class A<T> {
  void foo(Map<String, T> m) {
    switch (m) {
      case {'foo': var x}:
        print(x);
      default:
        print("No match");
    }
  }
}
```
Given certain maps (where `['foo']` returns `null`), this should print "null", and given certain other maps, it should not. I think you need to specify it either as a type test against the type of the element pattern, or a cast to the type of the element pattern.
There's a secondary question, I think, as to whether the match should fail or should throw, and if the former, what to do in declaration patterns. That is, given something of type `Map<String, num>` for which `containsKey("a")` returns `true`, `["a"]` returns `1.5`, `containsKey("b")` returns `true`, and `["b"]` returns `null`, what should the following instances do?
```dart
case <String, num>{'a' : var x}:          // Clearly should match
case <String, num>{'a' : Object x}:       // Clearly should match
case <String, num>{'a' : int x}:          // Clearly should not match
var <String, num>{'a' : var x} = ...;     // Clearly should match
var <String, num>{'a' : Object x} = ...;  // Clearly should match
var <String, num>{'a' : int x} = ...;     // Clearly a static error
case <String, num>{'b' : var x}:          // No match? Or throw?
case <String, num>{'b' : Object x}:       // No match? Or throw?
case <String, num>{'b' : Object? x}:      // Match?
case <String, num>{'b' : int x}:          // No match?
var <String, num>{'b' : var x} = ...;     // Must throw
var <String, num>{'b' : Object x} = ...;  // Must throw
var <String, num>{'b' : Object? x} = ...; // Match?
var <String, num>{'b' : int x} = ...;     // Clearly a static error.
```
It seems like, given a map pattern element `'foo' : T x` where `T` may be inferred or explicit and where the enclosing map element type is `V`, we could do one of the following:

In a declaration context:
- Treat this as an `x as T` pattern (this is a no-op if `T` is a super-type of `V` and is non-nullable). This allows the `Object?` example above.
- Treat this as a `(x as V) as T` pattern. This makes the `Object?` example above throw an exception.

In a matching context:
- Treat this as an `x as T` pattern. This allows the `Object?` example above.
- Treat this as a `(var x as V) as T` pattern. This makes the `Object?` example above throw an exception.
- Treat this as a `T x` pattern (which is a no-op only if `T` is a super-type of `V` and is non-nullable). All of the matches above fail to match except the `Object?` case which matches.
- Treat this as a `(_ as V) && T x` pattern. All of the matches above throw a cast failure.

I think the most consistent solution to me seems to be to treat this as an implicit cast in the declaration case (`var x as T`) and as an implicit type test in the matching case (`T x`). Does this sound right?
I think we need to decide on a model (a set of assumptions or laws) about maps:

- `m` uses a specific notion of equality (`==` or identity).
- `m.length` is equal to the number of elements in `m.entries`, `m.keys`, and `m.values` (in particular, those iterables are finite).
- The elements of `m.keys` are pairwise unequal according to the given equality (that is, they are all different).
- For each `entry` in `m.entries`, `entry.key` is identical to an element in `m.keys` and `entry.value` is identical to at least one element in `m.values`.
- For each `key` in `m.keys`, `m.containsKey(key)` is true; for a `key2` which is not equal to any element in `keys`, `m.containsKey(key2)` is false.
- For each `key` in `m.keys`, `m[key]` returns a `value` such that `m.entries` contains an object created as `MapEntry(key, value)`.

For soundness, we would of course need to assume that any or all of these laws may be violated by any particular map (which isn't known to be an instance of a platform type), but we'll have some clear answers to questions which are otherwise a bit fuzzy:

- what to do with infinite maps (`length` is negative, `keys` does not terminate, etc.)
- what to do with maps that lie (e.g., `containsKey` always returns true)
- ... and possibly more.
The point is that if we can find a good, concise, sufficient set of laws to convince ourselves that the specified semantics will yield the intuitively correct results with "law-abiding maps", then the semantics is on a more firm ground, and developers will know (explicitly) that an infinite map is simply not supported.
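To make the laws concrete, here is an illustrative sketch (the helper name and the selection of laws are mine, not from the comment) of a debug check for a few of them:

```dart
// Hypothetical debug helper: checks a few of the proposed map laws
// for a given (finite) map.
bool obeysBasicMapLaws<K, V>(Map<K, V> m) {
  // Law: length agrees with the number of entries/keys/values.
  if (m.length != m.entries.length ||
      m.length != m.keys.length ||
      m.length != m.values.length) {
    return false;
  }
  for (final key in m.keys) {
    // Law: every listed key is reported present by containsKey.
    if (!m.containsKey(key)) return false;
    // Law: for a present key, m[key] yields a value of the value type
    // (for a nullable V, a null value is fine).
    if (m[key] is! V) return false;
  }
  return true;
}
```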
@leafpetersen wrote:

> That is, given something of type `Map<String, num>` for which `containsKey("a")` returns `true`, `["a"]` returns `1.5`, `containsKey("b")` returns true, and `["b"]` returns `null`, what should the following instances do?

This map breaks some of the laws: `["b"]` can't return null, because `containsKey("b")` is true.
Rather than trying heroically to do meaningful things with an ill-behaving map, I'd recommend that we just decide on something that will work well with law-abiding maps, and then the match/no-match behavior of this map follows mechanically. It doesn't need to make sense what we do with this map, because it's a bad map. ;-)
But it does need to make sense what we do with law-abiding maps. I think that's already true for all proposals.
So, for example: we trust `containsKey` to deliver the correct status (key present, key not present), and then we insist that `[]` yields an object of the value type for a key which is present (that just means: not null):
```dart
case <String, num>{'a' : var x, ...}:          // Match, `x` bound to 1.5.
case <String, num>{'a' : Object x, ...}:       // Ditto.
case <String, num>{'a' : int x, ...}:          // No match, `int x` is a type test that fails.
var <String, num>{'a' : var x, ...} = ...;     // Completes normally, binds `x` to 1.5.
var <String, num>{'a' : Object x, ...} = ...;  // Ditto.
var <String, num>{'a' : int x, ...} = ...;     // Static error, type test not allowed in declaration.
case <String, num>{'b' : var x, ...}:          // Throws because `['b']` must return a `num`.
case <String, num>{'b' : Object x, ...}:       // Ditto.
case <String, num>{'b' : Object? x, ...}:      // Ditto.
case <String, num>{'b' : int x, ...}:          // Ditto.
var <String, num>{'b' : var x, ...} = ...;     // Ditto.
var <String, num>{'b' : Object x, ...} = ...;  // Ditto.
var <String, num>{'b' : Object? x, ...} = ...; // Ditto.
var <String, num>{'b' : int x, ...} = ...;     // Static error, type test not allowed in declaration.
```
@leafpetersen

> There's a secondary question I think as to whether the match should fail, or should throw, and if the former, what to do in declaration patterns. That is, given something of type `Map<String, num>` for which `containsKey("a")` returns true, `["a"]` returns `1.5`, `containsKey("b")` returns true, and `["b"]` returns `null`, what should the following instances do?

Nit: your examples should all contain `, ...` to avoid a length check that would fail. Assuming that, I agree with all the ones you marked "clearly" or "must". My personal opinion about the others:
```dart
case <String, num>{'a' : var x, ...}:          // Clearly should match
case <String, num>{'a' : Object x, ...}:       // Clearly should match
case <String, num>{'a' : int x, ...}:          // Clearly should not match
var <String, num>{'a' : var x, ...} = ...;     // Clearly should match
var <String, num>{'a' : Object x, ...} = ...;  // Clearly should match
var <String, num>{'a' : int x, ...} = ...;     // Clearly a static error
case <String, num>{'b' : var x, ...}:          // No match (1)
case <String, num>{'b' : Object x, ...}:       // No match (2)
case <String, num>{'b' : Object? x, ...}:      // No match (3)
case <String, num>{'b' : int x, ...}:          // No match (4)
var <String, num>{'b' : var x, ...} = ...;     // Must throw
var <String, num>{'b' : Object x, ...} = ...;  // Must throw
var <String, num>{'b' : Object? x, ...} = ...; // Throw (5)
var <String, num>{'b' : int x, ...} = ...;     // Clearly a static error.
```
The principles I'm using to answer the question are:

- Soundness: if `m.length` satisfies the appropriate constraint, and for each key `k`, `m.containsKey(k)` is `true` and `m[k] is m.runtimeType.value`, then matching must proceed to check the subpattern against the value `m[k]`. Conversely, if there is a key for which `m.containsKey(k)` is `false` and `m[k] == null`, then matching must fail. (By `m[k] is m.runtimeType.value` I mean "the runtime type of the value returned by `operator[]` is a subtype of `V`, where the runtime type of the map implements `Map<K, V>`".)
- `<K, V>{k: subpattern, ...}` should cause `subpattern` to be analyzed with a matched value type of `V`. At runtime, if `<K, V>{k: subpattern, ...}` matches, the value passed into the subpattern matcher should have a runtime type that is a subtype of `V`.
- Efficiency: we shouldn't call both `containsKey` and `operator[]` if calling just one of them would be sufficient for well-behaved maps. I'm ok if poorly-behaved maps do weird things as a result, provided that soundness is preserved.

My reasoning is:

- For the `'b' : var x` case, the inferred type of `x` should be `num`. So we can't match, because that would assign `x = null`, which would be unsound. Therefore it's insufficient to rely on `containsKey` to determine what to do; we must check the value returned by `['b']`. Furthermore, checking the value returned by `['b']` is sufficient to ensure correct behaviour for well-behaved maps, so for efficiency we should not also check `containsKey`. Therefore, we can't tell whether the key `'b'` is absent or the map is poorly behaved, so the match should fail; it should not throw.
- For the `'b' : Object? x` case, we could in principle match and bind `x = null`; however, that would violate my requirement that "if `<K, V>{k: subpattern, ...}` matches, the value passed into the subpattern matcher should have a runtime type that is a subtype of `V`" (which I think is an important principle to hang on to). Besides, even if we decided that passing `null` to the subpattern matcher was ok, we would still have to do both `operator[]` and `containsKey` to distinguish the poorly-behaved map from a map that simply didn't contain key `'b'`, so for efficiency it's still better for the match to fail.
- The remaining cases differ in what happens after `operator[]`, however to me the conclusion is the same: for efficiency, we don't want to also check `containsKey`, therefore we can't distinguish the poorly-behaved map from a map that simply didn't contain key `'b'`, so the match should fail.

> It seems like, given a map pattern element `'foo' : T x` where `T` may be inferred or explicit and where the enclosing map element type is `V`, we could do one of the following:
>
> In a declaration context:
>
> - Treat this as an `x as T` pattern (this is a no-op if `T` is a super-type of `V` and is non-nullable). This allows the `Object?` example above.
> - Treat this as a `(x as V) as T` pattern. This makes the `Object?` example above throw an exception.
>
> In a matching context:
>
> - Treat this as an `x as T` pattern. This allows the `Object?` example above.
> - Treat this as a `(var x as V) as T` pattern. This makes the `Object?` example above throw an exception.
> - Treat this as a `T x` pattern (which is a no-op only if `T` is a super-type of `V` and is non-nullable). All of the matches above fail to match except the `Object?` case which matches.
> - Treat this as a `(_ as V) && T x` pattern. All of the matches above throw a cast failure.
>
> I think the most consistent solution to me seems to be to treat this as an implicit cast in the declaration case (`var x as T`) and as an implicit type test in the matching case (`T x`). Does this sound right?
Personally I'm not a fan of the casting approach. I feel like we have a very consistent and intuitive notion that in a declaration context, what happens at runtime is that you try to match the pattern, and if the match fails you throw an exception. That doesn't require any casting per se (although in some cases it behaves equivalently to a cast; in others not). The additional exceptions that you get from the casting approach don't carry user benefit IMHO, because they carry an efficiency cost (in the need to call both `containsKey` and `operator[]`), and they only affect poorly-behaved maps, and I don't think we should be paying an efficiency cost in order to get any particular behaviour from poorly-behaved maps.
> Personally I'm not a fan of the casting approach. I feel like we have a very consistent and intuitive notion that in a declaration context, what happens at runtime is that you try to match the pattern, and if the match fails you throw an exception. That doesn't require any casting per se (although in some cases it behaves equivalently to a cast; in others not). The additional exceptions that you get from the casting approach don't carry user benefit IMHO, because they carry an efficiency cost (in the need to call both `containsKey` and `operator[]`), and they only affect poorly-behaved maps, and I don't think we should be paying an efficiency cost in order to get any particular behaviour from poorly-behaved maps.
@stereotype441 I don't think I understand what you're saying here. The only alternative to some kind of casting in the general case for declarations (that I see) is to require the static analysis to infer a nullable type, which I think we all agreed we don't want. The two choices that I've shown above are to either cast to the map element type (and then separately to the individual field type), or to cast directly to the individual field type - with the difference between the two being visible when the individual field type admits null but the map element type does not. (You can, of course, strength reduce that cast to a simpler operation in some cases, but in general it has to be a cast). Am I missing something?
I'm pretty much in agreement with Paul. My priorities for the behavior here are:

1. It must be sound. We can't match if `v[k]` returns a value whose runtime type isn't an instance of `V`.
2. Whenever the language says a value has some static type, at runtime the value must be an instance of that type even if allowing another type wouldn't cause problems. In particular, a map's subpattern has an incoming matched value type of `V`. If the map can't produce a value of type `V`, then it shouldn't even get to the subpattern. This is true even if the subpattern itself doesn't care about the value's type, as in the `<String, num>{'b' : Object? x}` example. This should not match.
3. It should be efficient. Almost all maps users will actually use are well-behaved and we shouldn't pointlessly waste CPU cycles detecting misbehaving ones.
4. If the map is misbehaving, we should let the user know that fact eagerly by throwing if possible. A misbehaving map is a programmatic error and we want to tell them that instead of having its misbehavior masked as either matching or not matching. But note that this is a lower priority than the previous point.
Based on those, I think the behavior should be:

1. If the map is known to have a non-nullable value type, then we match when `v[k]` returns a non-null value (which will thus have type `V`) and not otherwise. In a declaration context, that match failure becomes an exception as usual. We make no promises to detect a misbehaving map.
2. If the map does or may have a nullable value type, then we:
   i. Fail the match if `containsKey()` returns false. This failure becomes an exception in a declaration context as usual.
   ii. Match if `v[k]` returns an instance of `V`.
   iii. Throw otherwise. Note that we always throw and don't treat this as match failure. We can only get here if the map is misbehaving so we can throw an exception specific to that failure.
I'm fine with specifying this set of method invocations explicitly instead of the current spec which allows implementations leeway.
I talked this over with @leafpetersen and @stereotype441 and I think we have a proposal that we all like. It's mostly the behavior in my last comment except that instead of throwing in 2.iii, we fail the match. I'll try to explain and they can tell me what I get wrong.
We tell if a key is present and the subpattern should be matched by using:

```
map[k] is V && map.containsKey(k)
```

The `is V` is necessary for soundness even with poorly behaved `Map` implementations. Bundling both map protocol operations into a single expression has an appealing simplicity. Based on the type of `V` and the expected behavior between `[]` and `containsKey()` on well-behaved maps, we can then potentially simplify the expression.
If you work through the combinations, here's how they behave:

- If `V` is a non-nullable type and `map[k]` returns `null`, then the `is V` evaluates to `false`. According to the map protocol that means the key must be absent, so there's no need to call `containsKey()` at all.
- If `V` is a non-nullable type and `map[k]` returns something other than `null`, then the `is V` must evaluate to `true`. According to the map protocol that means the key must be present, so there's no need to call `containsKey()` at all.
- If `V` is a nullable or potentially nullable type then `is V` will always evaluate to `true` since `V?` includes all values of `V` and `Null`, and `V` accepts exactly those. Therefore we always need to call `containsKey()` since `map[k]` doesn't let us distinguish between absent and present keys.
- If the map is poorly behaved and `map[k]` returns a non-`null` (and thus `V`) value and `containsKey()` returns `false`, it will consider `containsKey()` the source of truth and discard the value.
- If the map is poorly behaved, `V` is non-nullable, `map[k]` returns `null`, and `containsKey()` would return `true`, it won't detect that since it short-circuits when `is V` evaluates to `false`. In the interest of performance, it doesn't try to detect the misbehavior.
- If the map is poorly behaved, `V` is nullable or potentially nullable, `map[k]` returns `null`, and `containsKey()` would return `true`, then there's no way to tell if the map is poorly behaved since a `map[k]` result of `null` and a `containsKey()` of `true` is a valid behavior when the key is present and the value is `null`.
That behavior is efficient for well-behaved maps and still sound for poorly behaved ones, so I think it's a good choice. Concretely, that means the runtime behavior for matching an entry in `map` is:

1. Evaluate the key expression to `k`.
2. If `map` is known to have a non-nullable value type `V`:
   i. Evaluate `map[k]` to `v`.
   ii. If `v` is not an instance of `V` (i.e. it is `null`), the match fails.
   iii. Otherwise, match `v` against this entry's value subpattern.
3. Else, `V` is nullable or potentially nullable:
   i. If `map.containsKey(k)` returns `false`, the match fails.
   ii. Otherwise, evaluate `map[k]` to `v` and match `v` against this entry's value subpattern.

Note that we no longer give the compiler discretion over whether and in which order to call the methods. Based on the static type of `V` exactly one of two explicit desugarings is chosen.

For map patterns in declaration contexts, if the match fails for any reason, that becomes a runtime exception.
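The two desugarings above can be sketched as a helper (all names here are hypothetical; `valueTypeIsNonNullable` stands in for what the compiler knows statically about `V`, and this is an illustration of the proposal, not a real API):

```dart
// A sketch of the two desugarings for matching one map-pattern entry.
bool matchMapEntry<K, V>(
  Map<K, V> map,
  K key,
  bool valueTypeIsNonNullable,
  bool Function(V value) matchSubpattern,
) {
  if (valueTypeIsNonNullable) {
    // Non-nullable V: `operator []` alone suffices; null means "absent".
    final v = map[key];
    if (v is! V) return false; // i.e. v == null, so the match fails
    return matchSubpattern(v);
  } else {
    // Nullable or potentially nullable V: `containsKey` is needed to
    // distinguish an absent key from a present key mapped to null.
    if (!map.containsKey(key)) return false;
    // For a genuinely nullable V this cast always succeeds; the
    // potentially-nullable case is exactly the subtlety debated later
    // in the thread.
    return matchSubpattern(map[key] as V);
  }
}
```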
> We tell if a key is present and the subpattern should be matched by using:
>
> `map[k] is V && map.containsKey(k)`

Just to be clear, this is not the normative semantics, since this requires that `containsKey` be called even when `V` is non-nullable and `map[k]` returns a non-null value. But I agree that it gives a good motivation for understanding the semantics at the bottom in terms of optimizations on well-behaved maps.

The semantics at the bottom in the numbered steps gives the normative semantics I would expect.
> Just to be clear, this is not the normative semantics, since this requires that `containsKey` be called even when `V` is non-nullable and `map[k]` returns a non-null value.

That's right. Sorry I wasn't clearer.
@munificent

> I talked this over with @leafpetersen and @stereotype441 and I think we have a proposal that we all like. It's mostly the behavior in my last comment except that instead of throwing in 2.iii, we fail the match. I'll try to explain and they can tell me what I get wrong.
>
> We tell if a key is present and the subpattern should be matched by using:
>
> `map[k] is V && map.containsKey(k)`
>
> The `is V` is necessary for soundness even with poorly behaved Map implementations. Bundling both map protocol operations into a single expression has an appealing simplicity. Based on the type of `V` and the expected behavior between `[]` and `containsKey()` on well-behaved maps, we can then potentially simplify the expression. If you work through the combinations, here's how they behave:
>
> - If `V` is a non-nullable type and `map[k]` returns `null`, then the `is V` evaluates to `false`. According to the map protocol that means the key must be absent, so there's no need to call `containsKey()` at all.
> - If `V` is a non-nullable type and `map[k]` returns something other than `null`, then the `is V` must evaluate to `true`. According to the map protocol that means the key must be present, so there's no need to call `containsKey()` at all.
> - If `V` is a nullable or potentially nullable type then `is V` will always evaluate to `true` since `V?` includes all values of `V` and `Null`, and `V` accepts exactly those. Therefore we always need to call `containsKey()` since `map[k]` doesn't let us distinguish between absent and present keys.
I believe the first half of this is not correct ("If `V` is a nullable or potentially nullable type then `is V` will always evaluate to `true`..."). For example:

```dart
void f<K, V>(Map<K, V> m) {
  // `V` is potentially nullable here
  print(m[0] is V);
}

main() {
  f<String, int>({});
}
```

prints `false`.
Perhaps what you meant was something like this instead? (Changes in bold)

> If `V` is a nullable or potentially nullable type, `is V` **might evaluate to `true` even if the key is absent, since the actual reified type of `V` might permit `null`**. Therefore we always need to call `containsKey()` since `map[k]` doesn't let us distinguish between absent and present keys.
> - If the map is poorly behaved and `map[k]` returns a non-`null` (and thus `V`) value and `containsKey()` returns `false`, it will consider `containsKey()` the source of truth and discard the value.

I assume we're still talking about the case where `V` is a nullable or potentially nullable type, right? Because earlier you said that if `V` is a non-nullable type, and `map[k]` returns something other than `null`, then there is no need to call `containsKey()`.
> - If the map is poorly behaved, `V` is non-nullable, `map[k]` returns `null`, and `containsKey()` would return `true`, it won't detect that since it short-circuits when `is V` evaluates to `false`. In the interest of performance, it doesn't try to detect the misbehavior.
> - If the map is poorly behaved, `V` is nullable or potentially nullable, `map[k]` returns `null`, and `containsKey()` would return `true`, then there's no way to tell if the map is poorly behaved since a `map[k]` result of `null` and a `containsKey()` of `true` is a valid behavior when the key is present and the value is `null`.

I don't think this is exactly right. If the map is poorly behaved, `V` is potentially nullable, `map[k]` returns `null`, and `containsKey()` would return `true`, we can still get some information by testing `map[k] is V`, since this test makes use of the reified type of `V`. That is, if `map[k] is V` is false, we know that the map is poorly behaved. If `map[k] is V` is true, then the map is well-behaved.
> That behavior is efficient for well-behaved maps and still sound for poorly behaved ones, so I think it's a good choice. Concretely, that means the runtime behavior for matching an entry in `map` is:
>
> 1. Evaluate the key expression to `k`.
> 2. If `map` is known to have a non-nullable value type `V`:
>    i. Evaluate `map[k]` to `v`.
>    ii. If `v` is not an instance of `V` (i.e. it is `null`), the match fails.
>    iii. Otherwise, match `v` against this entry's value subpattern.
> 3. Else, `V` is nullable or potentially nullable:
>    i. If `map.containsKey(k)` returns `false`, the match fails.
>    ii. Otherwise, evaluate `map[k]` to `v` and match `v` against this entry's value subpattern.
This is unsound. If `V` is potentially nullable, and its reified type is non-nullable, and the map is poorly-behaved, then `map.containsKey(k)` might return `true` and `map[k]` might return `null`, in which case it is not ok to match `v` against this entry's value subpattern, because that subpattern might be `V x`, which would result in `x` being assigned a value of `null` even though the reified type of `V` doesn't accept `null`.
Here's an example of how that unsoundness could be leveraged to cause a crash:
```dart
V f<K, V>(Map<K, V> m) {
  // `V` is potentially nullable here, so under your proposal we don't do an `is` check
  var {'k': V x, ...} = m;
  return x;
}

class PoorlyBehavedMap implements Map<String, int> {
  operator [](Object? k) => null;
  containsKey(Object? k) => true;
  // (Other `Map` members omitted for brevity.)
}

main() {
  int i = f<String, int>(PoorlyBehavedMap());
  // `i` is now `null` despite having a non-nullable type
  print(i.isEven); // crash
}
```
I think what we need to do instead (and what I thought we agreed to in our meeting, but maybe I was projecting my own understanding of the world) is something more like this:

1. Evaluate the key expression to `k`.
2. If `map` is known to have a non-nullable value type `V` (we can elide the `containsKey` check):
   i. Evaluate `map[k]` to `v`.
   ii. If `v` is not an instance of `V` (i.e. it is `null`), the match fails.
   iii. Otherwise, match `v` against this entry's value subpattern.
3. Else, if `V` is nullable (we can elide the `is` check):
   i. If `map.containsKey(k)` returns `false`, the match fails.
   ii. Otherwise, evaluate `map[k]` to `v` and match `v` against this entry's value subpattern.
4. Else, `V` is potentially nullable (we can't elide either check):
   i. Evaluate `map[k]` to `v`.
   ii. If `v` is not an instance of `V` (i.e. it is `null` and the reified type of `V` is non-nullable), the match fails.
   iii. Otherwise, if `map.containsKey(k)` returns `false`, the match fails.
   iv. Otherwise, match `v` against this entry's value subpattern.

Side note: in https://github.com/dart-lang/language/issues/2671 we discussed the fact that if the subpattern is a wildcard pattern, this is still not as efficient as it could be. If we wanted to fix both issues in one fell swoop, I believe we could do so by inserting, before step 2., the following:

- If the subpattern is a wildcard pattern with some type `T`, and `V <: T` (we don't need the value at all):
  i. If `map.containsKey(k)` returns `false`, the match fails.
  ii. Otherwise, the match succeeds (the value is never read).

Agree that "potentially nullable" needs to be treated as the worst case: it is neither definitely nullable nor definitely non-nullable, both of which can be used for optimizations.
The failure case we are dealing with is `map[k] == null && map.containsKey(k) == true && null is! V`, which is a semantic error in a misbehaving map, but one that the type system doesn't prevent.

The other possible misbehavior/inconsistency is `map[k] != null && map.containsKey(k) == false`. There is no null-soundness issue with that, whether we decide to process the value or not. It's a logical error, but it's not clear whether `[]` or `containsKey` should be the source of truth. The map has a value, but not a key, which is just not consistent with being a `Map`, which is defined as a finite set of key/value pairs.
Using `map[k] is V` as a check looks nice. It's very optimizable, but still guarantees soundness if we choose to use the value. My problem with it is that it doesn't distinguish `nonNullValue is V` from `null is V` in the cases where `V` is nullable. That's why we still need to call `containsKey` afterwards, because the value might still be `null`.
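That indistinguishability is easy to demonstrate (the helper name `looksPresent` is made up for illustration):

```dart
// When V is nullable, `map[k] is V` cannot tell a present-but-null
// value apart from an absent key.
bool looksPresent<K, V>(Map<K, V> map, K key) => map[key] is V;

void main() {
  final m = <String, int?>{'present': null};
  print(looksPresent(m, 'present')); // true (key present, value null)
  print(looksPresent(m, 'absent')); // true as well, although the key is absent
  // Only containsKey distinguishes the two:
  print(m.containsKey('present')); // true
  print(m.containsKey('absent')); // false
}
```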
If we try to dodge type tests on non-`null` values, which are bound to succeed, then we get:

```
let var v = map[k] in
  (v != null || v is V) && map.containsKey(k)
```

I would still prefer to avoid calling `containsKey` if the value is non-`null`. For speed! That suggests moving the `containsKey` check into only the `null` branch:

```
let var v = map[key] in
  (v != null || (v is V) && map.containsKey(key))
```

It's still the same three tests as in the dangerous failure case, and it prevents unsoundly using a `null` value which is not a `V`, but it allows the sound misbehavior of `map[k] != null && map.containsKey(k) == false`, by ignoring the `containsKey` and assuming that the map does have a value for that key.
It treats `[]` as the source of truth, instead of using `containsKey`, when the two are inconsistent. (That's also what we do for the `map[k] is! V && map.containsKey(k) == true` case, so we are being consistent, and that consistency is why we can do just one check in most cases. Although it would also be a kind of consistency to treat any inconsistency as there not being an entry, which is what the `map[k] is V && map.containsKey(k)` check does.)
The (large!) benefit is that we should almost never call `containsKey` for an actual value, because almost all map values are non-`null`. We do need the `containsKey` check on missing entries (unless `V` is non-nullable, which is actually quite likely), and that `containsKey` will return false, which will then fail the match. We do that at most once per map pattern (and very likely zero times in assignment/declaration patterns).

I think that's a valid win. And why I think optimizing for accessing existing entries is likely to be the most efficient overall. Doing an extra `containsKey` check for a non-`null` value only makes a difference for misbehaving maps, which won't exist in real programs. We get zero benefit from it, neither soundness, nor any difference in behavior. It's pure overhead.
Optimization cases for `v is V` (where `v` is definitely `null`) would still be the same:

- If `V` is definitely non-nullable, the `v is V` is definitely false, and the test reduces to `v != null`.
- If `V` is definitely nullable, the `v is V` is definitely true, and the test reduces to `v != null || map.containsKey(k)`.
- If `V` is potentially nullable/non-nullable, the `v is V` check always checks whether `V` is nullable (`null is V`). Implementations might be able to optimize that (say, have an "is-nullable" bit on every type representation).

We can allow the implementations to do the `(v is V) && map.containsKey(key)` in any order, if they think calling `containsKey` is more efficient than checking `v is V`, but I really hope the latter can be made efficient (and we cannot predict the performance of maps in general, so it'd be a tough call).
Optimizing for wildcard patterns is possible. Don't know how important it is, but it should be doable. Maybe have sub-patterns expose what they need, and if they don't need a value at all, we can choose to generate a different `late mapContainsX = ...`. We can also avoid generating a `late fooBar = ...` for `Foo(bar: _)`, but that declaration should be tree-shakeable anyway. The `late bool mapContainsX => map.containsKey(X);` rather than `late bool mapContainsX => valueX != null || valueX is V && map.containsKey(X);` is not as tree-shakeable.

The optimization only works if we don't read the same map entry in another pattern of the same `switch`.
@lrhn

> let var v = map[key] in (v != null || (v is V) && map.containsKey(key))

I like it! In fact I would suggest going one step further: due to short-circuiting, `v is V` is only executed when `v == null`, so we can change it to:

```
map[key] != null || (null is V) && map.containsKey(key)
```

(This is basically the same as what you reason about later in your comment, but it makes it explicit in the desugared code in a way that I think will be more compatible with existing back-end optimizations.)
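That refined check can be sketched as a Dart helper (the name `keyPresent` is made up for illustration):

```dart
// Sketch of the refined presence check: `[]` is the source of truth, and
// `containsKey` is consulted only when `[]` returns null and the reified
// V admits null.
bool keyPresent<K, V>(Map<K, V> map, K key) =>
    map[key] != null || (null is V) && map.containsKey(key);

void main() {
  // Nullable V: containsKey is needed to recognize a present null value.
  final a = <String, int?>{'x': null};
  print(keyPresent(a, 'x')); // true
  print(keyPresent(a, 'y')); // false
  // Non-nullable V: `null is V` is false, so the whole check reduces to
  // `map[key] != null` and containsKey is never reached.
  final b = <String, int>{'x': 1};
  print(keyPresent(b, 'x')); // true
  print(keyPresent(b, 'y')); // false
}
```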
It's still the same three tests as in the dangerous failure case, and it prevents unsoundly using a `null` value which is not a `V`, but it allows the sound misbehavior of `map[k] != null && map.containsKey(k) == false`, by ignoring the `containsKey` and assuming that the map does have a value for that key. It treats `[]` as the source of truth, instead of using `containsKey`, when the two are inconsistent. (That's also what we do for the `map[k] is! V && map.containsKey(k) == true` case, so we are being consistent, and that consistency is why we can do just one check in most cases. Although it would also be a kind of consistency to treat any inconsistency as there not being an entry, which is what the `map[k] is V && map.containsKey(k)` check does.)
Agreed. I like it. And I love how it's easy to explain to users (even though probably not very many users will care). We just say "`[]` is the source of truth about whether a key is present; we only use `containsKey` when `[]` gives an ambiguous answer."
The (large!) benefit is that we should almost never call `containsKey` for an actual value, because almost all map values are non-`null`. We do need the `containsKey` check on missing entries (unless `V` is non-nullable, which is actually quite likely), and that `containsKey` will return false, which will then fail the match. We do that at most once per map pattern (and very likely zero times in assignment/declaration patterns).

I think that's a valid win, and why I think optimizing for accessing existing entries is likely to be the most efficient overall. Doing an extra `containsKey` check for a non-`null` value only makes a difference for misbehaving maps, which won't exist in real programs. We get zero benefit from it, neither soundness nor any difference in behavior. It's pure overhead.

Optimization cases for `v is V` (where `v` is definitely `null`) would still be the same:

- If `V` is definitely non-nullable, the `v is V` is definitely false, and the test reduces to `v != null`.
- If `V` is definitely nullable, the `v is V` is definitely true, and the test reduces to `v != null || map.containsKey(k)`.
- If `V` is potentially nullable/non-nullable, the `v is V` check always checks whether `V` is nullable (`null is V`).
With my suggestion to change from `v is V` to `null is V`, this becomes:

- If `V` is definitely non-nullable, the `null is V` is definitely false, and the test reduces to `v != null`.
- If `V` is definitely nullable, the `null is V` is definitely true, and the test reduces to `v != null || map.containsKey(k)`.
- If `V` is potentially nullable/non-nullable, the `null is V` check always checks whether `V` is nullable.

This is great. It's got the performance characteristics I was hoping for, and I love how all of these optimizations are now based simply on the standard short-circuiting behaviour of `&&` and `||`. This really reassures me that there aren't going to be any subtle soundness issues lurking, and that compiler implementation details won't have unintended user-visible consequences.
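The three cases above can be sketched as compile-time specializations in Python (the function names are invented; `None` models Dart's `null`, `dict.get` models `operator[]`, and `null_is_v` models the runtime `null is V` test):

```python
# Sketch of how `v != null || (null is V) && map.containsKey(k)`
# specializes once V's nullability is known at compile time.

def check_nonnullable_v(m, k):
    # `null is V` is statically false, so `&&` makes the containsKey
    # branch dead code; one map lookup is the whole test.
    return m.get(k) is not None

def check_nullable_v(m, k):
    # `null is V` is statically true, so only containsKey remains
    # on the null branch.
    return m.get(k) is not None or k in m

def check_unknown_v(m, k, null_is_v):
    # V's nullability is unknown (e.g. V is a type parameter), so
    # `null is V` is a runtime fact, passed in here precomputed.
    return m.get(k) is not None or (null_is_v and k in m)
```

All three are the same expression with statically-known subterms folded away, which is the point: the "optimizations" are just constant folding plus short-circuiting, not special-cased compiler behavior.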
> Implementations might be able to optimize that (say, have an "is-nullable" bit on every type representation).

True, and even if they can't, rewriting the check to explicitly say `null is V` means that there's still a good chance that this `is` check will get the benefit of other optimizations like common subexpression elimination and loop hoisting.
> We can allow the implementations to do the `(v is V) && map.containsKey(key)` in any order, if they think calling `containsKey` is more efficient than checking `v is V`, but I really hope the latter can be made efficient (and we cannot predict the performance of maps in general, so it'd be a tough call).

Personally I would rather not allow the implementation to swap the order of the `&&`. I love that we've got a formulation now that is consistent, well-defined, doesn't use unnecessary extra map operations, and doesn't leak any compiler implementation details to the user. That's more than I'd previously thought possible, and I don't think it's worth introducing the possibility of leaking compiler implementation details for such a small potential efficiency benefit.
> Optimizing for wildcard patterns is possible. Don't know how important it is, but it should be doable. Maybe have sub-patterns expose what they need, and if they don't need a value at all, we can choose to generate a different `late mapContainsX = ...`. We can also avoid generating a `late fooBar = ...` for `Foo(bar: _)`, but that declaration should be tree-shakeable anyway. The `late bool mapContainsX => map.containsKey(X);` rather than `late bool mapContainsX => valueX != null || valueX is V && map.containsKey(X);` is not as tree-shakeable. The optimization only works if we don't read the same map entry in another pattern of the same `switch`.

I suppose we could, but honestly I don't think it's worth it anymore. We've already got the number of map operations down to one in virtually every case (the only case where we do two operations is if `[]` returns `null` and it's not clear from the types what that means). Optimizing further for wildcard patterns would not gain us very much, and we would lose the benefit of being able to explain the behaviour simply as "`operator[]` is the primary source of truth".
> This is great. It's got the performance characteristics I was hoping for, and I love how all of these optimizations are now based simply on the standard short-circuiting behaviour of `&&` and `||`. This really reassures me that there aren't going to be any subtle soundness issues lurking, and that compiler implementation details won't have unintended user-visible consequences.

One further thought: assuming we go with this approach, since all the optimizations just follow straightforwardly from the standard short-circuiting semantics of `&&` and `||`, there's no longer any need to spell the optimizations out in great detail in the spec. We could just replace step 4 of the runtime semantics for Map with something like:
For each entry in `p`, in source order:

1. Evaluate the key expression to `k`.
2. Evaluate `v[k]` to `v2`.
3. Evaluate `v2 != null || (null is V) && map.containsKey(k)`. If this evaluates to `false`, the map does not match. (Note that this check should be fairly efficient in practice, due to the short-circuiting semantics of `||` and `&&`. In particular, if `V` is known at compile-time to be non-nullable, this simplifies to `v2 != null`, and if `V` is known at compile-time to be nullable, this simplifies to `v2 != null || map.containsKey(k)`.)
4. Match `v2` against this entry's value subpattern. If it does not match, the map does not match.

The one "advantage" of keeping it as `v2 != null || (v2 is V) && map.containsKey(k)`, instead of replacing `v2` with `null` in the test, is that this expression promotes `v2` to `V` as part of normal semantics.

If we change it to `null is V`, we have to magically do an "unsafe" cast of the `v2` variable to `V` when `v2` contains `null`. It is safe, because we know we checked that the same value could be promoted, but we need to step outside of the language to explain how it works. So, just for ease of specification, I'd keep the `v2 is V`.

(Implementers can obviously do whatever is easiest for them; they don't have to specify why it works.)
> The one "advantage" of keeping it as `v2 != null || (v2 is V) && map.containsKey(k)`, instead of replacing `v2` with `null` in the test, is that this expression promotes `v2` to `V` as part of normal semantics.
>
> If we change it to `null is V`, we have to magically do an "unsafe" cast of the `v2` variable to `V` when `v2` contains `null`. It is safe, because we know we checked that the same value could be promoted, but we need to step outside of the language to explain how it works. So, just for ease of specification, I'd keep the `v2 is V`. (Implementers can obviously do whatever is easiest for them; they don't have to specify why it works.)
I see the benefit you're talking about in terms of intuitive explanatory power. As a person reading the spec and trying to understand why it is sound to assume that `v2` has type `V`, it does aid my intuition to see `v2 is V` rather than `null is V`. In fairness, though, I think it's a stretch to say that this expression promotes `v2` to `V` as part of "normal semantics", because Dart's flow analysis semantics actually aren't that clever. For example this program:
```dart
f<K, V>(Map<K, V> map, K k) {
  var v2 = map[k];
  if (v2 != null || (v2 is V) && map.containsKey(k)) {
    V v3 = v2;
  }
}

main() {}
```
produces output:

```
../../tmp/proj/test.dart:4:12: Error: A value of type 'V?' can't be assigned to a variable of type 'V' because 'V?' is nullable and 'V' isn't.
    V v3 = v2;
           ^
```
So regardless of whether we say `v2 is V` or `null is V`, I think the spec should include an explanatory comment pointing out that there's no soundness hole here, because `v2 != null || (v2 is V) && map.containsKey(k)` guarantees that `v2` has type `V`.
On the flip side of things, if we specify it as `null is V`, it's much more intuitively obvious how to implement it in an efficient way, because it's clear that:

- If `V` is known at compile time to be a non-nullable type, `null is V` is equivalent to `false` (and thus the `containsKey` call can be elided).
- If `V` is known at compile time to be a nullable type, `null is V` is equivalent to `true` (and thus the `is` check can be elided).
- If `V` has indeterminate nullability, it's obvious that:
  - `null is V` can be computed as a single bit lookup on the type `V` rather than a general purpose `is` check,
  - repeated `null is V` checks can be coalesced into a single check by common subexpression elimination,
  - and the `null is V` check can be hoisted out of the loop.

Whereas if we specify it as `v2 is V`, none of that is obvious unless you think carefully about the fact that, since `v2 is V` appears to the right of `v2 != null ||`, at the point where the `is` check happens, `v2` is known to be `null`.
Now, @chloestefantsova is almost certainly going to be the person doing the implementation, and she's very smart. So she will probably remember this conversation and implement it in a suitably optimized fashion regardless of how we specify it. But I don't want to make her job any harder than we have to. And I don't want to make things more difficult for any other programmers that follow in her footsteps years from now, trying to track down some subtle compiler bug, when they expect to see `v2 is V` in the generated code and don't find it.

So IMHO, on balance, it's better to specify it as `null is V` and have an explanatory comment explaining why this choice is sound.
OK, I think I mostly followed that. I took another stab at the PR: https://github.com/dart-lang/language/pull/2687
Possibly related to https://github.com/dart-lang/language/issues/2671.
Consider the following code:
Under the current patterns spec, the static analysis of `case {'foo': Object x}` operates as follows:

- `M` is `Map<String, int>`.
- `K = String`, `V = int`, and `C = String`. So the required type of the map pattern is `Map<String, int>`.
- `Object x` is type checked using `V = int` as the matched value type.
- `Object x` has a type annotation, so its required type is that type, namely `Object`.

And the runtime semantics of the match are:
1. No type test of `m` is needed, since the matched value's static type is a subtype of the pattern's required type (see "Pointless type tests and legacy types").
2. Let `l` be the length of the map `m`, determined by calling `m.length`.
3. If `l != 1` then the match fails.
4. Let `k` be the result of evaluating the key `'foo'`.
5. Call `m.containsKey(k)`. If this returns `false`, the map does not match.
6. Let `v = m[k]` and proceed to match `Object x` against `v`.
7. No type test of `v` is needed, since the matched value's static type (`int`) is a subtype of the variable pattern's required type (`Object`).
8. Bind `x = v`.

Note that at no point was `x` checked to see if it was `null`. Which means that if we inject the following nefarious class into `f`:

Then `g` will be called with an argument of `null`, in violation of soundness.

Side note: as discussed in https://github.com/dart-lang/language/issues/2671, the spec says that the compiler may perform steps 5 and 6 in any order, and may elide either if it determines that doing so will produce the same result. It is also allowed to assume that `containsKey` and `operator[]` behave consistently with one another.
So the steps above are allowed (but not required) to be optimized to this (differences in **bold**):

1. No type test of `m` is needed, since the matched value's static type is a subtype of the pattern's required type.
2. Let `l` be the length of the map `m`, determined by calling `m.length`.
3. If `l != 1` then the match fails.
4. Let `k` be the result of evaluating the key `'foo'`.
5. **Let `v = m[k]`. If `v == null`, the map does not match.** Proceed to match `Object x` against `v`.
6. No type test of `v` is needed, since the matched value's static type (`int`) is a subtype of the variable pattern's required type (`Object`).
7. Bind `x = v`.

Interestingly, this "optimized" set of steps doesn't have the same soundness problem. That seems bad. We shouldn't rely on optional optimizations to keep programs sound.
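The contrast between the two step orders can be sketched in Python (not Dart; the class and function names are invented, `None` models Dart's `null`, and `contains_key`/`lookup` model `containsKey`/`operator[]`):

```python
class NefariousMap:
    """Models a Map<String, int> that breaks the usual contract:
    containsKey claims every key is present, but the indexing
    operator returns None (modeling null) anyway."""

    def contains_key(self, key):  # models containsKey
        return True

    def lookup(self, key):        # models operator []
        return None

def match_spec_order(m, key):
    """Steps 5-6 as specified: containsKey first, then []."""
    if not m.contains_key(key):
        return "no match"
    v = m.lookup(key)
    return v  # bound to `Object x` with no null check: None leaks out

def match_optimized_order(m, key):
    """The 'optimized' order: [] first, treating null as absence."""
    v = m.lookup(key)
    if v is None:
        return "no match"
    return v

m = NefariousMap()
print(match_spec_order(m, "foo"))       # None (the unsound binding)
print(match_optimized_order(m, "foo"))  # no match
```

Under the specified order the match "succeeds" and binds a null to a non-nullable pattern variable; under the optional optimization the same map simply fails to match, which is the sound behavior.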