
Consider treating legacy types as nullable instead of non-nullable in opted in libraries #992

Closed: jakemac53 closed this issue 4 years ago

jakemac53 commented 4 years ago

From an opted in library today all legacy types are effectively treated as non-nullable. While convenient, this is incorrect in all cases where the legacy api is intended to be nullable, and a huge amount of real world code will be incorrect as a result.

This gets particularly bad when type inference erases the legacy type during assignment:

// x is inferred as non-nullable
var x = someLegacyApiThatMightReturnNull(); 
// Compiler warning about checking for null on a non-nullable type
if (x == null) ... 

This ends up explicitly telling users to remove their correct code and replace it with broken code. See https://github.com/dart-lang/sdk/issues/42007 for some context and a real world example where this has already happened.

Given that legacy libraries have no way to indicate their intent (re: nullability), I would argue that opted in libraries should treat all legacy types as nullable, instead of non-nullable.

While this originally came up only in the context of type inference (because of the dangerous and incorrect warning), I would argue this extends far beyond that. All legacy types within opted in libraries should be converted to their nullable equivalents. For example:

// Today this is allowed within an opted in library:
someLegacyApi().foo;
// But instead I think this should be the following, note the extra `!`:
someLegacyApi()!.foo; 

Pros

Cons

cc @leafpetersen

leafpetersen commented 4 years ago
  • It makes it significantly safer to depend on non-migrated code from migrated code (possibly even completely sound?).

Not even remotely so, sorry! :)

From an opted in library today all legacy types are effectively treated as non-nullable.

This isn't really true. The only cases you're really talking about are places where a legacy type is used to infer a type in an opted in library. For the purposes of interacting with legacy libraries, the types are both nullable and non-nullable. This is the most significant difference between our approach and others. This is what allows you to call a legacy API with null, and also call a method on the result of a legacy API.

import 'legacy.dart' as l;  // l has an API foo of type List<int> Function(int)

void test() {
  l.foo(null)[0].isEven; // No errors
}

The reason that the code above is accepted is that legacy types both accept null and allow methods to be called on them as if they were non-nullable. This is really important. If we chose to treat them as strictly nullable or strictly non-nullable, out of order migration would essentially be a non-starter.
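
To see why, here is a sketch, reusing the hypothetical l.foo above, of what each strict treatment would cost:

import 'legacy.dart' as l;  // the same hypothetical legacy API as above

void test() {
  // If legacy types were treated as strictly non-nullable, passing
  // null here would become a compile-time error:
  l.foo(null);
  // If they were treated as strictly nullable, the index operation
  // and the member access here would both require `!`:
  l.foo(1)[0].isEven;
}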

Despite this, I did have a goal that legacy types should not leak into the types of non-legacy APIs. So for example, if I had a top level variable declared as var x = l.foo(null) given the above example, I didn't want x to end up being a symbol in an opted-in library that had a legacy type. Maybe I should have gone that way, but I didn't, and I still feel pretty good about that choice. At a minimum, someone who relies on an "opted in" library, should be able to trust that the APIs that they see actually mean what they say, and don't have some tricky legacy magic sneaking around in them.
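
Concretely, with the same hypothetical legacy.dart, the legacy type is rewritten during inference instead of leaking through:

import 'legacy.dart' as l;  // the same hypothetical legacy API as above

// x is inferred as List<int>, not the legacy List<int*>*, so this
// opted-in library's API surface contains no legacy types.
var x = l.foo(null);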

If you accept that goal, then you either have to make it an error for a legacy type to be inferred, or you have to rewrite the legacy type in some way. We chose not to make it an error, and so we had to choose: do you rewrite it to make everything nullable, everything non-nullable, or do you try to do something weird and tricky, like making all of the covariant occurrences non-nullable and the contra-variant occurrences nullable, or vice versa?
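
Spelled out for a simple function type (a sketch of the options just listed):

// Legacy type as seen from an opted-in library: int* Function(int*)
//
// "Everything nullable":                          int? Function(int?)
// "Everything non-nullable":                      int  Function(int)
// Covariant (return) positions non-nullable,
// contravariant (parameter) positions nullable:   int  Function(int?)
// ...or vice versa:                               int? Function(int)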

Note: whatever choice we make there, you can always override the choice by providing an explicit type. That said, the choice mostly boils down to convenience vs safety. Most APIs are in fact not nullable, and so it's convenient to be able to "just use" the legacy APIs as they are intended to be used. Our original design highly prioritized lowering the overhead of migrating: the idea was (and to a certain extent still is) to make it painless for each package to migrate in isolation, so that collectively all the packages would quickly just migrate organically. For various reasons though, we've been moving away from encouraging out of order migration, since it doesn't play well with the pub ecosystem.

Given that, there are two different ways I can look at it. The first is that since we're discouraging out of order migration, it's not clear that making this "safer" should be a priority - it's basically something we don't want our users to do anyway. The second is that since we're discouraging out of order migration, maybe it's ok to make it inconvenient to use (by rewriting to nullable types) since we don't expect many people to use it, and the people who do are probably power-users who want to be very careful with what they're doing.

@lrhn @munificent @eernstg thoughts on this?

jakemac53 commented 4 years ago

For the purposes of interacting with legacy libraries, the types are both nullable and non-nullable.

Yes, I was not clear - this issue is really only meant to address the out types from legacy apis (return types, field types, etc). I agree that the in types (parameters, etc) on legacy apis need to be effectively nullable from an opted in library, since they may accept null by design and you can't know.
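
In other words, a sketch of the proposal with a hypothetical legacy API:

// Hypothetical legacy API, declared in a legacy library:
//   int f(int x)    // both types are legacy int*
//
// Under this proposal an opted-in library would see it roughly as:
//   int? f(int? x)
// i.e. null must be handled on the way out (the return type is an
// "out" type), while null may still be passed in, since the legacy
// parameter might accept it by design.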

I do realize this is probably more complicated than I make it sound, especially when it comes to function types and the like.

That said, the choice mostly boils down to convenience vs safety. Most APIs are in fact not nullable, and so it's convenient to be able to "just use" the legacy APIs as they are intended to be used.

This is the part I personally disagree with. While most apis are in fact non-nullable, there are still a ton of nullable apis out there. The compilers are now going to tell people, by default, to delete all their handling of these null values, indicating that they no longer need to handle that case. That seems wrong to me, and I would prefer the default was more pessimistic, forcing them to add explicit null handling, which in most cases is a single character: !.

leafpetersen commented 4 years ago
// Today this is allowed within an opted in library:
someLegacyApi().foo;
// But instead I think this should be the following, note the extra `!`:
someLegacyApi()!.foo; 

I think I'm just going to have to disagree with this. You currently can call someLegacyApi().foo just fine, whether it's safe or not. When you migrate (out of order) you get no additional safety when you call into legacy APIs, but you get no less.

// x is inferred as non-nullable
var x = someLegacyApiThatMightReturnNull(); 
// Compiler warning about checking for null on a non-nullable type
if (x == null) ... 

This is definitely worrisome though. I don't want the compiler actively discouraging you from doing the correct thing. I can't actually reproduce this exact example (and there are no specified errors or warnings here, I think). Are you using a lint? Regardless, there are specified warnings around using ?. on a non-nullable type.

@lrhn @munificent @mit-mit any thoughts on this?

jakemac53 commented 4 years ago

I think I'm just going to have to disagree with this. You currently can call someLegacyApi().foo just fine, whether it's safe or not. When you migrate (out of order) you get no additional safety when you call into legacy APIs, but you get no less.

Isn't the entire point of NNBD to fix this problem? Once you have opted in to nnbd I think you are expecting this to now be safer. It is totally transparent to you whether a library you are interacting with is opted in or out, so it seems to me like we should try to uphold that promise of safety where possible, even if it is more verbose when interacting with legacy apis.

Essentially, I think people will be a whole lot less likely to check whether an api might return null now (via its docs), because they will expect the compiler to be doing this for them.

eernstg commented 4 years ago

Here's a similar example that causes the warning, and this is a problem that I've mentioned before as well:

// ---------- Library 'lib2.dart'.
// @dart = 2.7

int someLegacyApiThatMightReturnNull() => null;

// ---------- Library 'lib.dart'.
import 'lib2.dart';

void main() {
  // x is inferred as non-nullable
  var x = someLegacyApiThatMightReturnNull(); 
  // Compiler warning about checking for null on a non-nullable type
  x?.isEven;
}

// ---------- Library 'main.dart'
// @dart = 2.7
import 'lib.dart';

dartanalyzer (24c50fb8b2bc0c07f738b6f9e995b992afb6e6ca) responds with this message:

WARNING|STATIC_WARNING|INVALID_NULL_AWARE_OPERATOR|/usr/local/google/home/eernst/lang/dart/scratch/202006/lib.dart|9|4|2|The target expression can't be null, so the null-aware operator '?.' can't be used.

So it is indeed possible to make the compiler/analyzer complain about "unnecessary" checks, if we go through an expression whose type is legacy via a variable with an inferred (now non-nullable) type, even though those checks are indeed necessary.

We could consider whether it's possible to use a different type than simply the erased one (e.g., int* Function(int*) is erased to int Function(int)) during inference in a library with null safety.

However, the fact that we may have * on type arguments in a legacy type makes it harder to choose which types should be "erased to non-nullable" and which ones should be "erased to nullable", because a class type parameter may be used in covariant, contravariant, and invariant positions in the class body.

Even if we inferred x above to have type int?, we wouldn't have a similarly safe erasure if the return type had been List<int*>*. If we erase this to List<int?>? and use that as the type of the variable, the tear-off of its add method would statically have the signature void add(int?), which would be misleading if the actual value was created as a List<int>. (Yes, there are special rules about the run-time type of that tear-off, but that doesn't affect the static analysis here.)
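
Spelled out as code (a sketch of the scenario above, with a hypothetical legacy API):

// Hypothetical legacy API whose return type is List<int*>*, but
// whose actual value was created as a List<int>:
var xs = someLegacyListApi(); // List<int?>? under nullable erasure
var add = xs!.add;            // static signature: void Function(int?)
add(null);                    // statically fine, but the underlying
                              // list is a List<int>, so this is unsound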

So we can't really hope to "just tell the truth" about whether a given legacy type should be considered as being the nullable or the non-nullable variant of the type with null-safety.

It's not obvious to me that we can improve much on this situation: Any solution will have to handle the inherent unsoundness of mixed code and weak checking.

jakemac53 commented 4 years ago

Ah yes, generics do present a problem here. But it is equally a problem if the list is treated as a List<int> when it is intended to be a List<int?>, right? Either way, fixing the problem requires a cast (or constructing a new list) with the actual correct type.
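
For example, with a hypothetical legacy API, either direction of mismatch ends up being fixed the same way:

// The inferred List<int> was really meant to be a List<int?>:
var xs = someLegacyListApi().cast<int?>();   // view it at the right type
var ys = List<int?>.of(someLegacyListApi()); // or construct a new list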

leafpetersen commented 4 years ago

Isn't the entire point of NNBD to fix this problem?

Yes, by getting everyone opted in as quickly as possible. Interacting with legacy APIs is not intended to be a long term stable state. Moreover, the goal is that if you correctly predict the nullability of upstream APIs, you should mostly not have to re-migrate. This is not the case if we treat everything as nullable. A lot of this reasoning is discussed in the roadmap doc.

eernstg commented 4 years ago

@jakemac53 wrote:

It is equally a problem if the list is treated as a List<int> but it is intended to be a List<int?>

If xs is a List<int?> or List<int*> which is typed as List<int> then xs[0] may be a direct soundness violation (we have type int and get null), and that's actually what we may have if we get it from a legacy function returning List<int*>*. But I think the point is that the soundness issues will go away with a pure null-safety world and strong checking, and it won't go away before we get to that point, no matter how much we try to fiddle with the rules.
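
As a concrete sketch of that violation (hypothetical names, weak checking):

// Hypothetical legacy function whose actual result contains null,
// but whose type is seen here as List<int>:
List<int> xs = someLegacyListWithNull();
int x = xs[0]; // static type int, actual value null: a direct
               // soundness violation, possible only under weak checking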

But it could be a helpful property of the inference that produces non-nullable types that it tends to err on the side of code which is "deeply migrated" (in the sense that it uses non-nullable types frequently, and nullable types only when an API genuinely wishes to indicate that some entity is "optional"; the opposite, "shallow migration", corresponds in the extreme to adding ? to all types and ! to a large number of expressions).

munificent commented 4 years ago

I think I'm just going to have to disagree with this. You currently can call someLegacyApi().foo just fine, whether it's safe or not. When you migrate (out of order) you get no additional safety when you call into legacy APIs, but you get no less.

Isn't the entire point of NNBD to fix this problem? Once you have opted in to nnbd I think you are expecting this to now be safer.

Sure, but you're a little fuzzy about "this" in that sentence. We don't have a magic wand that can make unmigrated code become spontaneously safe. We don't know what "safe" means for that code.

Empirically, the last time I looked, about 90% of APIs (parameters and return types) were effectively non-nullable. So I think that's a good default.

It does mean that when migrated code calls into unmigrated code, null can sneak out where it's not supposed to go. But that's equally true in Kotlin with JS interop and TypeScript with, uh, lots of things. We can give you an incremental migration story, and we can give you safety, but we can't give you both at the same time. If you want safety, the best way to get there is to fully migrate your code.

It is totally transparent to you whether a library you are interacting with is opted in or out,

True. This is something where maybe we should push harder on the tooling to make it clear when you are talking to a legacy library. Fortunately, this is something we can iterate on relatively easily, since tool UX changes like this aren't breaking.

jakemac53 commented 4 years ago

We don't know what "safe" means for that code.

Function types and generic types definitely make this more complicated, though I wouldn't think those are the common cases people are dealing with. Maybe there are more corner cases as well.

Outside of those special cases, though, always converting * to ? for "simple" types seems to me like it would always be safe. Or at least I don't understand the scenarios in which it isn't.
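
A sketch of what that would look like, using the earlier hypothetical API:

// Hypothetical legacy API: int* someLegacyApiThatMightReturnNull()
// With * erased to ? rather than dropped, inference produces:
var x = someLegacyApiThatMightReturnNull(); // inferred as int?
if (x == null) print('was null'); // meaningful again, and no warning
x!.isEven; // non-nullability must now be asserted explicitly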

Empirically, the last time I looked, about 90% of APIs (parameters and return types) were effectively non-nullable. So I think that's a good default.

I would flip that argument around and say that this means 10% of all calls into a legacy api will get out the wrong type. They will end up with something that in the best case doesn't require a null check where it should, and in the worst case actively tells you to remove a necessary null check.

This is similar to the semantic search results problem - if a search engine is going to directly answer a question you type in then those answers have to hit a quality bar that is far above 50%. A single wrong answer is more damaging than a hundred right answers. I think the same thing applies here, and the bar would need to be a lot higher than 90%.

To me the current situation is trading off safety for convenience, which doesn't jibe with the direction of the language as a whole.

lrhn commented 4 years ago

It doesn't matter whether choosing a ?-type over a non-? type is "safer" because that's not what we are optimizing for in mixed code. We are optimizing for getting people to migrate as soon as reasonably possible.

As Leaf said above, the interaction between null safe and non-migrated code is not aiming for safety; it is aiming to ease migration. That is convenience in the short term, in order to reach the end goal of safety more quickly. This problem only occurs when migrating out-of-order, where your code is being migrated but a dependency hasn't migrated yet. That reduces the impact of any choice we make here.

Picking the non-nullable type might be wrong in 10% of the cases. Picking the nullable type would be wrong in 90% of the cases. No amount of being short-term safer outweighs that. And you are not safe as long as you are running in unsound mode, which you are as long as any code is unmigrated.

If we make all legacy code types be treated as nullable, then you need to add ?. or ! when dealing with them. If we make them non-nullable then you don't, but you risk null-errors. Or, more realistically, you write the correct type when the default type is wrong, which will be the case 10%/90% of the time.

So the choice is whether you can write var x = legacyFunction(); 90% of the time, or you need to write int x = legacyFunction(); every time. When that legacy code is migrated, and the return type of legacyFunction becomes int (as you knew it would; that's why you accepted var x to begin with), your code still works, and you won't get any lint warnings about unnecessary types on variables.
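
Concretely (a sketch of the two styles; legacyFunction is the hypothetical API above):

// With the non-nullable default, `var` is right in the ~90% case:
var x = legacyFunction();  // inferred as int
int? y = legacyFunction(); // explicit override in the nullable 10%
// With a nullable default you would write the type every time the
// API really is non-nullable; the explicit type overrides inference,
// and the legacy int* is assignable to int without a `!`:
int z = legacyFunction();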

jakemac53 commented 4 years ago

We are optimizing for getting people to migrate as soon as reasonably possible.

I suppose I just fundamentally disagree that this should be the primary goal. I would weight correctness much higher as a goal. Especially when the inconvenience we are talking about is a single character.

This is imo the implicit downcasts mistake all over again, but much more significant and widespread. It would also be significantly less breaking to fix, because it doesn't break existing code, only migrated code.

Did implicit downcasts make it easier to migrate to Dart 2? They likely did. Are we still paying the costs of that choice many years later? Absolutely yes. Every single developer in the world has to explicitly disable this with an analysis option. It wasn't ultimately the correct choice, IMO, as it has caused far more pain in the long term than it saved.

Picking the nullable type would be wrong in 90% of the cases.

What matters is the consequence of making the wrong choice. The consequence of making it nullable when it isn't (for types you get out of legacy apis) is extremely minimal. The code is still totally correct, it just has extra ! characters.

The safest choice is to make everything nullable (except for generics/function types as I mentioned above, those are fundamentally more problematic). As far as I know there is no soundness impact to making this choice, it is explicitly more safe.

If we make all legacy code types be treated as nullable, then you need to add ?. or ! when dealing with them.

This is exactly the point though, you should have to effectively assert the non-nullability of a type coming out of a legacy api. This should be explicit, and not implicit.

And you are not safe as long as you are running in unsound mode, which you are as long as any code is unmigrated.

You are not fully safe, no, but that doesn't mean you shouldn't be more safe.

Today it feels like we are taking a huge step backwards in terms of safety. I would argue that partially migrated code is actually significantly less safe than completely legacy code. This is because the migrated libraries have removed their null checks, and are relying on the compiler to do the right thing.

The best way to fight against that is to make the compiler be as pessimistic as possible with regards to legacy apis. We already think this use case should be somewhat rare, so why not make it a bit more verbose but more safe?

When that legacy code is migrated, and the return type of legacyFunction becomes int (as you knew it would, that's why you accepted var x to begin with), your code still works, and you won't get any lint warnings about unnecessary types on variables.

This problem should be solved by packages releasing a breaking version change when they opt in to NNBD. Any migration related to this should be completely trivial.

natebosch commented 4 years ago

Picking the non-nullable type might be wrong in 10% of the cases. Picking the nullable type would be wrong in 90% of the cases.

The syntax to narrow a type from nullable to non-nullable is small. The syntax to widen a type from non-nullable to nullable is larger, less intuitive, less idiomatic, and not checked statically.
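
For instance (a sketch contrasting the two directions, with a hypothetical legacy API):

// Narrowing nullable to non-nullable: one character, and the implied
// assertion is at least checked at run time:
int a = someLegacyApi()!;
// Widening non-nullable to nullable: a full annotation or a cast, and
// nothing verifies that the widening was actually needed:
int? b = someLegacyApi();
var c = someLegacyApi() as int?;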

When that legacy code is migrated, and the return type of legacyFunction becomes int (as you knew it would, that's why you accepted var x to begin with), your code still works, and you won't get any lint warnings about unnecessary types on variables.

Users don't write var x because they know anything specific about the return type of a function. They most often write it because they trust the compiler will tell them if they did something wrong afterwards.

The ! is a syntactic indicator that you are conveying knowledge about an API. I don't think it's great that we are using the absence of syntax, or the absence of any difference from other idiomatic Dart code, to convey knowledge (or an assumption) about an API.

leafpetersen commented 4 years ago

We don't have any consensus to change this, closing as not planned.

jakemac53 commented 4 years ago

Note that I hit another pretty bad case of this today:

while (someLegacyApiThatMightReturnNull() != null) {
  ...
}
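// The inferred non-nullable return type makes the condition above
// always true, so everything after the loop is flagged: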
return ...; // Gives a dead code warning

I do really strongly disagree with the approach here but I agree the ship has sailed. So I am fine with logging my disagreement and moving on.

derolf commented 3 years ago
extension ObjectExt<T> on T {
  T? get $ => this;
}

foo.$ will widen foo to be nullable.
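
For example, with the hypothetical legacy API from earlier in the thread:

var x = someLegacyApiThatMightReturnNull().$; // inferred as int?
if (x == null) ... // the null check is meaningful again, no warning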