fusionlanguage / fut

Fusion programming language. Transpiling to C, C++, C#, D, Java, JavaScript, Python, Swift, TypeScript and OpenCL C.

https://fusion-lang.org

GNU General Public License v3.0


Null safety syntax #31

Open hovi opened 2 years ago

hovi commented 2 years ago

Hi! Any plans to support null safety?

I get that the default syntax implies a nullable variable, as in C#, but it would be nice to have a syntax for non-nullable variable types where applicable. Many languages support it natively and others support it through an interface. In the worst case, they can be translated to normal nullables if there is no implementation or the implementation would be too complicated (for example, TypeScript supports it, but when run as JavaScript, every variable is in reality nullable anyway).

Having code that works nicely on all platforms loses a lot of its charm without null safety :-)

pfusik commented 2 years ago

I considered it, but have mixed feelings. It adds expressiveness at the cost of complicating the language. Let's discuss it.

Syntax

I suppose the type system should be extended with:

string? nullable string reference
C? nullable read-only object reference
C!? nullable read-write object reference
T[]? nullable read-only array reference
T[]!? nullable read-write array reference

Not sure if dynamic references should be also distinguished between nullable and non-nullable:

C#? nullable dynamic object reference
T[]#? nullable dynamic array reference

The order of ? vs ! or # is to be agreed. I prefer spelling the nullability last, because it affects the reference value while ! says what you can do after dereferencing.

A non-nullable type coerces to nullable. What about the other way? Shall we extend the syntax with an explicit de-nullize operator?

Foo? nr;
Foo nnr;
nr = nnr; // valid
nnr = nr; // or nnr = nr! ?
nr.Foo(); // or nr!.Foo() ?

Note that ! already has two meanings: the logical negation prefix operator and the read-write reference type suffix. While the above introduces no grammar ambiguities, it might make the code harder to read and edit.
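For comparison, TypeScript under `strictNullChecks` applies the same coercion rule, and it happens to spell the explicit de-nullize as the postfix `!` considered above. The class `Foo` and the functions `describeRef` and `forceRef` are hypothetical; the method is renamed `greet` so it does not clash with the class name.

```typescript
// Hypothetical class standing in for Foo in the Ć snippet above.
class Foo {
  greet(): string {
    return "hello";
  }
}

// Widening: a non-nullable Foo is accepted wherever Foo | null is expected.
function describeRef(nr: Foo | null): string {
  return nr === null ? "none" : nr.greet();
}

// Narrowing must be explicit: nothing is known about nr inside this function.
function forceRef(nr: Foo | null): Foo {
  // return nr; // error under strictNullChecks: Foo | null is not assignable to Foo
  return nr!;   // compiles; the ! is erased from the emitted JavaScript, so a null passes through
}

const nnr: Foo = new Foo();
const viaNullable = describeRef(nnr); // "hello"
const forced = forceRef(nnr).greet(); // "hello"
```

Because TypeScript's `!` performs no runtime check, `forceRef(null)` simply returns `null`; whether Ć's version should assert is exactly the open question here.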

Target language mapping

All the targets except for OpenCL C support assert. It would be possible to emit asserts in method prologs that check for null arguments. I don't like this idea, because it makes the output much different from the Ć source.

Losing nullability information is also an option for all the targets. But it makes this feature questionable.

C

There is no standard support for null safety. There are GNU Extensions:

__attribute__((nonnull)) following the function signature says all pointer parameters are non-nullable
__attribute__((nonnull (1, 2))) (for example) following the function signature says the first and second parameter are non-nullable
__attribute__((returns_nonnull)) following the function signature says the function doesn't return NULL

C++

C++ references (T&) are non-nullable, but they also cannot be reseated (rebound to another object) once initialized. There is no built-in non-nullable pointer or reference that can be reassigned.

C#

C# has distinguished between nullable and non-nullable references since C# 8.0, released in September 2019. It's an opt-in feature.

Java

There are multiple ways to mark the type nullability using Java annotations. I'm totally confused!

JavaScript

It's a dynamic language, hence types are not specified explicitly.

Python

Ditto, but I plan to emit type annotations. @jellonek Is it possible to express non-nullability in Python?

Swift

It has built-in support.

TypeScript

It has built-in support. @jedwards1211, could you comment on whether it differs from C# and Swift?

OpenCL C

Not supported.

pfusik commented 2 years ago

I'd also like to understand what's the support in languages that are to be supported in cito. For example D #34 @Teashrock

I assume this request is for reference types, which are all implicitly nullable now.

Nullable value types are a different feature that would require adding mappings to all the targets:

C++17 has std::optional
C# supports them since 2.0 (int?)
Java has boxed types (Integer) and Optional
JavaScript allows for passing null and undefined
Python allows for passing None
Swift supports them
TypeScript supports them
Unclear how to map to C and OpenCL C. Arguments could be passed by pointer.
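As an illustration of the TypeScript entry, a nullable value type is simply a union with `null`. The helper `indexOrNull` is hypothetical; it encodes "not found" in the type instead of returning `-1` as `Array.prototype.indexOf` does.

```typescript
// Hypothetical helper: index of the first element equal to target, or null if absent.
function indexOrNull(values: readonly number[], target: number): number | null {
  for (let i = 0; i < values.length; i++) {
    if (values[i] === target) {
      return i;
    }
  }
  return null;
}

const found = indexOrNull([10, 20, 30], 20);   // 1
const missing = indexOrNull([10, 20, 30], 40); // null

// With strictNullChecks, found + 1 is rejected until null has been ruled out.
const next = found === null ? -1 : found + 1;  // 2
```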

hovi commented 2 years ago

Cool, you took quite some time thinking about it!

I believe you are putting too many things together; there are a few aspects:

Static compile checks.

That means compiler warnings/errors when there's a possibility of using a variable that can be null at that point. This usually requires built-in support in the language or some sort of pseudo-support (e.g. Java annotations with IDEs that support those annotations, and you can see how messy that is). Even when the code compiles, that doesn't guarantee that null cannot occur at runtime.

This is the main thing that I value. If cito is to create multiplatform libraries, I want a clear interface in my library. I want to clearly state whether a function accepts nulls or not. I want to clearly document what passing nulls means if I allow it.

Null checks in the implementation, where null is not allowed, are another thing. I see that as an implementation detail, or as undefined behaviour if it cannot be enforced by the compiler or language. This is how it works when running transpiled JavaScript; the same goes for Kotlin on the JVM and Kotlin compiled to JavaScript.

Runtime checks.

The only place to check for null at runtime is when casting from nullable to non-nullable (e.g. a ! operator). But this is basically optional functionality that ensures fail-fast behaviour, so that it is easier to trace the origin of the null. If the check is not there, the program will most likely fail (very soon) on a null pointer/reference. Sometimes it may even be undesirable to add assert-style checks in languages that do not support null safety.
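A minimal TypeScript sketch of such a fail-fast conversion. The helper `unwrap` is hypothetical (neither TypeScript nor Fusion provides it); it mimics Kotlin's `!!` by throwing at the point of conversion instead of letting the null travel further. The `TypeError` and the message format are arbitrary choices.

```typescript
// Hypothetical helper: converts T | null | undefined to T, failing fast at the
// point of conversion (similar in spirit to Kotlin's !! operator).
function unwrap<T>(value: T | null | undefined, what: string = "value"): T {
  if (value === null || value === undefined) {
    throw new TypeError(`${what} is unexpectedly ${value}`);
  }
  return value;
}

const config: Map<string, string> = new Map([["mode", "fast"]]);

const mode: string = unwrap(config.get("mode"), "mode"); // "fast"

let failure = "";
try {
  unwrap(config.get("level"), "level"); // Map.get returns undefined for a missing key
} catch (e) {
  failure = (e as Error).message; // "level is unexpectedly undefined"
}
```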

Syntactic sugar

Some interfaces exist to work more nicely with "nulls", or with something that represents "nothing" or "empty" but is different from the typical "null" at the language level.

An example is Optional in Java. Optional is basically just an ordinary class that tells you whether a value is present and gives you some convenience methods to work with it, like of, ofNullable, orElse, orElseThrow, etc. But it does not ensure anything at the compiler level, so it doesn't really solve the problem.

And the worst part is, the whole optional variable itself can be null and compiler doesn't check that (correct me if I am wrong, been a while since I worked with java, maybe things changed) so overall it gives little to no value (opinionated I guess).

Other, smaller points:

Since cito was designed nullable-first (the "default" is nullable), it makes more sense to have a syntax to mark non-nullable variables; otherwise existing code would break. It would be good to keep the typical ? syntax, but that would probably only work if you could configure the compiler to tell it how to treat "default" variable declarations.
Syntax: in Kotlin, casting nullable to non-nullable is written !!. It may be more readable and it's less ambiguous.
Be careful with asserts in Java: assert statements are compiled into the bytecode, but they are disabled by default and only execute when the JVM is started with -ea (-enableassertions).
It's convenient to define a null-coalescing operator, typically ?? (C#, JS) or ?: (Kotlin).
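To illustrate the last point, this is how `??` and the related optional-chaining operator `?.` behave in TypeScript; the `User` interface is made up for the example. Note that `??` only replaces `null` and `undefined`, unlike `||`.

```typescript
interface User {
  name: string;
  nickname: string | null; // nullable field
  manager: User | null;    // nullable reference to another object
}

// ?? falls back only on null or undefined (|| would also replace "" and 0).
function displayName(u: User): string {
  return u.nickname ?? u.name;
}

// ?. short-circuits to undefined when the left-hand side is null or undefined.
function managerName(u: User): string {
  return u.manager?.name ?? "(no manager)";
}

const boss: User = { name: "Ada", nickname: null, manager: null };
const dev: User = { name: "Bob", nickname: "", manager: boss };

const n1 = displayName(boss); // "Ada"
const n2 = displayName(dev);  // "" (an empty string is not null, so ?? keeps it)
const m1 = managerName(dev);  // "Ada"
const m2 = managerName(boss); // "(no manager)"
```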

jedwards1211 commented 2 years ago

TypeScript works the same as C#, if you've enabled strict null checking, it will emit an error if it deduces that you could dereference null, or assign a value that could be null to a non-nullable variable. Though TypeScript doesn't have a null-forgiving operator.
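A small example of the flow analysis described above, assuming `strictNullChecks` (or `strict`) is enabled; `findUser` and `greeting` are made-up names. For the record, TypeScript does also offer a postfix `!` non-null assertion operator that suppresses such errors without emitting a runtime check.

```typescript
// Hypothetical lookup that may fail.
function findUser(users: ReadonlyMap<number, string>, id: number): string | null {
  return users.get(id) ?? null;
}

function greeting(users: ReadonlyMap<number, string>, id: number): string {
  const userName = findUser(users, id); // type: string | null
  // return userName.toUpperCase();     // error under strictNullChecks: userName may be null
  if (userName === null) {
    return "Hello, stranger";
  }
  // After the check, the compiler narrows userName to string, so this compiles.
  return `Hello, ${userName.toUpperCase()}`;
}

const users: ReadonlyMap<number, string> = new Map([[1, "ada"]]);
const g1 = greeting(users, 1); // "Hello, ADA"
const g2 = greeting(users, 2); // "Hello, stranger"
```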

pfusik commented 1 year ago

This feature is now implemented. The doc is not updated yet, but basically all reference types are now explicitly marked nullable by appending a question mark:

string? s // nullable string reference
C? o // nullable read-only object reference
C!? o // nullable read-write object reference
C#? o // nullable dynamic object reference
T[]? a // nullable read-only array reference
T[]!? a // nullable read-write array reference
T[]#? a // nullable dynamic array reference
C!?[]? a // nullable read-only reference to an array of read-write object references
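For readers who know TypeScript, the declarations above correspond roughly to the following. This is an illustrative approximation, not necessarily what cito emits: TypeScript's `Readonly<C>` is only a shallow, compile-time view, and there is no counterpart to the dynamic (`#`) ownership distinction. `C` is a placeholder class and `T` is taken to be `number`.

```typescript
// Placeholder class; T is taken to be number below.
class C {
  value = 0;
}

const c: C = new C();

const s: string | null = null;                 // string?
let ro: Readonly<C> | null = null;             // C?   (shallow read-only view)
let rw: C | null = null;                       // C!?  (also C#?: no ownership distinction)
const a: readonly number[] | null = [1, 2, 3]; // T[]?
const aw: number[] | null = null;              // T[]!? (also T[]#?)
let refs: readonly (C | null)[] | null = null; // C!?[]?

ro = c;          // a read-write object can be viewed through a read-only reference
// ro.value = 1; // error: value is read-only through Readonly<C>
rw = c;
rw.value = 42;   // allowed through the read-write reference
refs = [rw, null];
```

The non-nullable forms simply drop `| null` from each type.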

There are no null-forgiving or coalesce operators.

This is an incompatible change in the language, but the sooner we make it, the better. Note that C# introduced such a change in 8.0 in 2019, configurable at project scope or file scope.

Updating my own projects turned out to be very easy, because cito emits errors if you attempt to assign null to a non-nullable reference or compare it with null. This is where you add the question marks to the types. I expect that most reference types are meant to be non-nullable, so it's a better default for the language.

Here's the update for a 21 KLOC Ć project: https://sourceforge.net/p/recoil/code/ci/734ffa6193f90d6cf7c8840d41e33bdd07967767/ This is mostly adding question marks. In a few places, a method parameter was reassigned: the parameter was non-nullable but reassigned with a nullable value. The solution was to introduce a new local variable instead of reassigning the parameter.
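The parameter fix described above, sketched in TypeScript terms (the names `lookupOrNull`, `normalize` and the spelling table are hypothetical): instead of overwriting a non-nullable parameter with a possibly-null result, the result goes into a new nullable local.

```typescript
// Hypothetical lookup that returns null when the key is absent.
function lookupOrNull(table: ReadonlyMap<string, string>, key: string): string | null {
  const found = table.get(key);
  return found !== undefined ? found : null;
}

function normalize(table: ReadonlyMap<string, string>, text: string): string {
  // text = lookupOrNull(table, text); // error: string | null is not assignable to the non-nullable parameter
  const replacement: string | null = lookupOrNull(table, text); // new nullable local instead
  return replacement !== null ? replacement : text;
}

const spelling: ReadonlyMap<string, string> = new Map([["colour", "color"]]);
```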

Currently the nullability affects the Swift and TypeScript outputs only. References are transpiled to nullable only if specified as such. This is good for the library interface and makes the code look more native to Swift and TypeScript. Here's an example: https://github.com/pfusik/qoa-ci/commit/b788ce88d4fdbe6db77a81d8057f7d02b1b63f1a

In Swift the null-forgiving operator (called "unwrapping" in Swift speak) is now emitted only for nullable references. This improves readability.

Now about the problems I see with this feature:

1. Once you mark the reference as non-nullable, there's no default value for it. I don't consider this alone to be a problem, but Swift emits errors for non-nullable reference fields with no initializer. C# would emit warnings. @hovi Does it mean that reference fields should be defined as nullable, even if never assigned null ? What's your experience? Unfortunately, this also means that a null-forgiving operator is needed every time you read such a field. My own preference would be to be able to use non-nullable reference fields, meaning it's a compile-time error when you assign them a null. Do C# and Swift allow that?
2. It is unclear how to do null validation for arguments passed to Ć code from the outside. On one hand it's good to define the parameters as non-nullable. On the other hand, shouldn't we be doing at least runtime validation for languages with no non-nullable types? For example, see https://github.com/pfusik/qoi-ci/commit/8dd7f54e85a77efc07585d3767e94b3cbb6edc41 pixels and encoded were previously null-checked at runtime. When they are non-nullable, cito doesn't allow that. @hovi Any ideas how to deal with that?

The C# backend hasn't been updated yet and continues to emit old-style references with no question marks. Because of problem 1, I'd need to mark many references in the AST as nullable or get many warnings when compiling cito.

The Python backend would also take advantage of explicit nullability once it gets type annotations. These take the form Optional[C] or, in Python 3.10+, C | None.

Waiting for your feedback!

hovi commented 1 year ago

Hi, thanks for implementing this!

> Now about the problems I see with this feature:
>
> 1. Once you mark the reference as non-nullable, there's no default value for it. I don't consider this alone to be a problem, but Swift emits errors for non-nullable reference _fields_ with no initializer. C# would emit warnings.
>    @hovi Does it mean that reference fields should be defined as nullable, even if never assigned `null` ? What's your experience?
>    Unfortunately, this also means that a null-forgiving operator is needed every time you read such a field.
>    My own preference would be to be able to use non-nullable reference fields, meaning it's a compile-time error when you assign them a null. Do C# and Swift allow that?

This is completely fine. I would advocate for not allowing non-nullable fields to be assigned or initialized with null, instead of attempting to demote them to nullable types. The errors in Swift and C# are valid, and from my perspective, this behavior is a good thing. In fact, I believe it's the best approach!

For developers, this means they must find a valid way to initialize non-nullable fields through constructors or provide appropriate default values. This approach reduces the likelihood of encountering unexpected runtime errors later due to null values. Those who prefer not to have this "restriction" can use nullable types.
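A TypeScript illustration of this approach, assuming `strict` (which enables `strictPropertyInitialization`): non-nullable fields must be definitely assigned in the constructor, while a field whose absence is a valid state is declared nullable. `Track` and `Album` are invented for the example.

```typescript
class Track {
  readonly title: string;

  constructor(title: string) {
    this.title = title;
  }
}

class Album {
  // Non-nullable fields: strictPropertyInitialization requires them to be
  // definitely assigned in the constructor.
  readonly title: string;
  readonly tracks: Track[];
  // Nullable field: "no cover" is a legitimate state, so null is its default.
  coverUrl: string | null = null;

  // missing: Track; // error: no initializer and not definitely assigned in the constructor

  constructor(title: string, tracks: Track[]) {
    this.title = title;
    this.tracks = tracks;
  }

  firstTrackTitle(): string | null {
    // tracks itself is never null, so only the possibly-missing element needs a check.
    const first = this.tracks[0];
    return first !== undefined ? first.title : null;
  }
}
```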

> 2. It is unclear how to do null validation for arguments passed to Ć code from the outside. On one hand it's good to define the parameters as non-nullable. On the other hand, shouldn't we be doing at least runtime validation for languages with no non-nullable types?
>    For example, see [pfusik/qoi-ci@8dd7f54](https://github.com/pfusik/qoi-ci/commit/8dd7f54e85a77efc07585d3767e94b3cbb6edc41)
>    `pixels` and `encoded` were previously null-checked at runtime. When they are non-nullable, `cito` doesn't allow that.
>    @hovi Any ideas how to deal with that?

There is a trade-off between failing fast and speed. I personally prefer forced checks, because safety matters more to me than speed most of the time. However, others might prioritize speed over safety.

One possible solution is to use asserts:

Both Java and Python have assert, which can be used for this purpose. In both cases, they throw an exception if the assert fails, and the assert behavior can be toggled (it's enabled by default in Python and disabled in Java). Many other languages have similar mechanics (according to ChatGPT, but no guarantees). Using asserts is probably the simplest approach, but it might be worth considering creating your own assert function to have full control over the behavior, exceptions thrown, messages, etc. Either way, making this functionality optional is, in my opinion, important.
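A sketch of the "own assert function" idea in TypeScript. The helper `requireNonNull`, the switch `nullChecksEnabled` and the entry point `rowCount` are all hypothetical; throwing a `TypeError` is an arbitrary choice, not an established convention.

```typescript
// Hypothetical global switch: set to false where speed matters more than failing fast.
let nullChecksEnabled = true;

// Hypothetical helper: when checks are enabled, throws if value is null or undefined;
// either way it returns the value typed as non-nullable.
function requireNonNull<T>(value: T | null | undefined, paramName: string): T {
  if (nullChecksEnabled && (value === null || value === undefined)) {
    throw new TypeError(`${paramName} must not be null`);
  }
  return value as T; // unchecked cast when checks are disabled
}

// Hypothetical public entry point: callers written in plain JavaScript can still
// pass null, even though the parameter type says it is non-nullable.
function rowCount(pixels: Uint8Array, width: number): number {
  requireNonNull(pixels, "pixels");
  return Math.floor(pixels.length / width);
}
```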

Some other related thoughts:

In Java there's, for example, Guava's Preconditions.checkNotNull, or annotations that provide IDE hints and do nothing at runtime (though there are multiple annotations for different IDEs; the JetBrains one is the most widely used, I think). Kotlin, when compiling to JavaScript, does not perform runtime nullability checks.

pfusik commented 1 year ago

Regarding 1:

I think that initializing all references in the constructor is generally hard. Either you need to:

* Delay construction until you have all the data. You work with multiple local variables and then pass them to a constructor with many arguments. This is okay if you have 1-3 child objects. Not good if you have a dozen.
* Create "null object" classes which I consider an anti-pattern.

Regarding 2:

Ć has assert which transpiles to all the targets except for OpenCL. This means cito could inject null argument checking. For Java, I'd prefer to have one standard way. It seems there isn't even a consensus whether to throw NullPointerException or IllegalArgumentException. In .NET it's clear: ArgumentNullException.

hovi commented 1 year ago

Regarding 1:

> I think that initializing all references in the constructor is generally hard. Either you need to:
>
> * Delay construction until you have all the data. You work with multiple local variables and then pass them to a constructor with many arguments. This is okay if you have 1-3 child objects. Not good if you have a dozen.
>
> * Create "null object" classes which I consider an anti-pattern.

I agree with you. But isn't that the responsibility of the programmer? This restrictive constructor is simply the single way of creating an object that guarantees only valid objects are created. How you get to the point of calling this constructor is another matter. It's IMHO better for it to be hard and safe rather than easy and error-prone. Scripting-language programmers may disagree (but then why would they use non-nullable types?).

The points you mentioned are typically solved by the builder pattern, which is not your problem as a language designer, although you could add library or language support that makes builders easier.
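A minimal builder sketch in TypeScript, with invented `Connection` and `ConnectionBuilder` classes: the builder keeps nullable fields while data is being collected, and `build()` is the single place where they become non-nullable. It throws if a required value was never set; the optional port falling back to 443 is an arbitrary choice for illustration.

```typescript
class Connection {
  readonly host: string;
  readonly port: number;

  constructor(host: string, port: number) {
    this.host = host;
    this.port = port;
  }
}

class ConnectionBuilder {
  // Nullable while the object is being assembled.
  private host: string | null = null;
  private port: number | null = null;

  setHost(host: string): this {
    this.host = host;
    return this;
  }

  setPort(port: number): this {
    this.port = port;
    return this;
  }

  // The only place where nullable state becomes non-nullable; fails fast if a
  // required value is missing, falls back to a default for an optional one.
  build(): Connection {
    if (this.host === null) {
      throw new Error("host is required");
    }
    return new Connection(this.host, this.port ?? 443);
  }
}
```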

Regarding 2:

> Ć has assert which transpiles to all the targets except for OpenCL. This means cito could inject null argument checking. For Java, I'd prefer to have one standard way. It seems there isn't even a consensus whether to throw NullPointerException or IllegalArgumentException. In .NET it's clear: ArgumentNullException.

I think in the end it doesn't matter that much. This is a kind of exception that you never catch and never branch on; it "shouldn't ever happen". It is only there to crash your program safely and fast rather than keep it running in an invalid state.

pfusik commented 1 year ago

> It's IMHO better for it to be hard and safe rather than easy and error-prone.

I don't believe in compilers catching all programming errors. It's rather easy to spot an accidentally null field, especially if you have tests. Reasoning about a constructor taking 20 arguments is worse (I've seen that). That's not easy to refactor or write tests for.

> The points you mentioned are typically solved by the builder pattern

Good point. I assume the builder throws a runtime exception if it has no way to initialize a reference?

> I think in the end it doesn't matter that much. This is a kind of exception that you never catch and never branch on; it "shouldn't ever happen". It is only there to crash your program safely and fast rather than keep it running in an invalid state.

Good point, I absolutely agree!

hovi commented 1 year ago

> > It's IMHO better for it to be hard and safe rather than easy and error-prone.
>
> I don't believe in compilers catching all programming errors. It's rather easy to spot an accidentally null field, especially if you have tests. Reasoning about a constructor taking 20 arguments is worse (I've seen that). That's not easy to refactor or write tests for.

I agree that compilers can't catch all programming errors. I'm not sure what you are getting at now, though; I'm confused. I still see this as part of the language's nullability feature, which solves this for me: "what I mark as never null is never null". I would like this to be fully enforced by the compiler. I don't see why I should write tests and hunt for nulls in my domain-level code if this can be solved at the language syntax level. The whole purpose for me is not having to worry about nulls at all (at least in the ideal case; there are always crazy edge cases).

> > The points you mentioned are typically solved by the builder pattern
>
> Good point. I assume the builder throws a runtime exception if it has no way to initialize a reference?

Yeah, typically when calling the build method.

al1-ce commented 1 month ago

> I'd also like to understand what's the support in languages that are to be supported in cito. For example D #34 @Teashrock
>
> I assume this request is for reference types, which are all implicitly nullable now.
>
> Nullable value types are a different feature that would require adding mappings to all the targets:
>
> C++17 has std::optional
> C# supports them since 2.0 (int?)
> Java has boxed types (Integer) and Optional
> JavaScript allows for passing null and undefined
> Python allows for passing None
> Swift supports them
> TypeScript supports them
> Unclear how to map to C and OpenCL C. Arguments could be passed by pointer.

In D, basic types (int, bool) and struct instances cannot be null. Nullable types in D (to my knowledge):

Class instance (MyClass c = null;)
Pointers (int* p = null;)
Arrays/Associative Arrays (int[] a = null; and int[bool] aa = null;)
Function pointers (void function(int) f = null;)
Delegates (void delegate(int) d = null;)

D has an analogue of C++'s std::optional: std.typecons.Nullable.