fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features
345 stars 21 forks source link

The `UnknownEnum` pattern #822

Open Happypig375 opened 4 years ago

Happypig375 commented 4 years ago

I propose we add an UnknownEnum pattern that matches undefined enum values.

For example, consider:

type IsWaterFlowing = WaterFlowing = 0 | WaterNotFlowing = 1

match someWebAPI arguments with
| IsWaterFlowing.WaterFlowing -> "Water is flowing"
| IsWaterFlowing.WaterNotFlowing -> "Water is not flowing"
// FS0104: Enums may take values outside known cases. For example, the value 'enum<_> (0)' may indicate a case not covered by the pattern(s).

The existing way of approaching this problem in F# is adding a wildcard case, but we lose the ability to track new additions:

// New case added!
type IsWaterFlowing = WaterFlowing = 0 | WaterNotFlowing = 1 | WaterPartiallyFlowing = 2

match someWebAPI arguments with
| IsWaterFlowing.WaterFlowing -> "Water is flowing"
| IsWaterFlowing.WaterNotFlowing -> "Water is not flowing"
| _ -> "Unknown" // But no warning is reported here...

By having an UnknownEnum pattern, we can preserve the ability to track new cases while handling other undefined cases:

// New case added!
type IsWaterFlowing = WaterFlowing = 0 | WaterNotFlowing = 1 | WaterPartiallyFlowing = 2

match someWebAPI arguments with
| IsWaterFlowing.WaterFlowing -> "Water is flowing"
| IsWaterFlowing.WaterNotFlowing -> "Water is not flowing"
| UnknownEnum x -> sprintf "Unknown enum value: %d" x
// UnknownEnum deconstructs the underlying value to enable better logging
// FS0025: Incomplete pattern matches on this expression. For example, the value 'IsWaterFlowing.WaterPartiallyFlowing' may indicate a case not covered by the pattern(s).
match someWebAPI arguments with
| IsWaterFlowing.WaterFlowing -> "Water is flowing"
| IsWaterFlowing.WaterNotFlowing -> "Water is not flowing"
| IsWaterFlowing.WaterPartiallyFlowing -> "Water is partially flowing"
| UnknownEnum x -> sprintf "Unknown enum value: %d" x

Pros and Cons

The advantages of making this adjustment to F# are

  1. Much easier discovery of bugs by encouraging both matching all existing cases of an enum and matching unknown cases of enums.
  2. Development speed - everywhere the change enum is used, a warning is reported. Developers do not have to look through lots of lines of code to handle new cases.
  3. Less crashes due to match failures.

The disadvantage of making this adjustment to F# is the introduction of an hardcoded built-in pattern that influences flow analysis which needs to be learnt. However, the upcoming NotNull pattern also falls into this bucket, so this category of patterns is already being introduced into F#.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): M

Related suggestions: No

Affidavit (please submit!)

Please tick this by placing a cross in the box:

Please tick all that apply:

charlesroddie commented 4 years ago

I proposed changing "FS0025 Incomplete pattern matches" to the current FS0104 (https://github.com/fsharp/fslang-suggestions/issues/652) to allow disabling FS0104. With FS0104 disabled, you get all the advantages you mentioned, with no need to add another language construct or an extra match case.

The disadvantage of that approach, as this thread shows, is discoverability, as few people know that you should disable FS0104 to make enums safe and usable. Perhaps it's time to do this properly and allow enums to be usable for the whole community, not just a few people in the know. So I suggest:

Disable FS0104: Enums may take values outside known cases

This will cause a little controversy, as people are likely to bring up the fact that you can cast out-of-range ints to enums (irrelevantly, because they would be invalid data and should fail in a match), or that enums are occasionally used not as enumerations but as a sequence of bits describing flags.

baronfel commented 4 years ago

@charlesroddie perhaps as protection against that controversy, when the old warning is disabled the typechecker can be expanded to warm the user at places where the arbitrary int is cast to the enum, instead at places where the enum is used.

matthid commented 4 years ago

warn the user at places where the arbitrary int is cast to the enum

I don't think that is realistic because I'd assume that the sources of this are reflection and serialization or mixing of versions

baronfel commented 4 years ago

That's fair, I was mainly thinking of usages of the enum operator when I wrote that.

Happypig375 commented 4 years ago

FS0104 protects against cases such as JsonConvert.DeserializeObject<{|IsWaterFlowing:IsWaterFlowing|} """{"IsWaterFlowing":3}""". The UnknownEnum pattern protects against this case while also protecting against new cases for the enum unlike the wildcard pattern.

abelbraaksma commented 4 years ago

because I'd assume that the sources of this are reflection and serialization or mixing of versions

And anything that's exposed. If you expose a method or function, nothing stops a consuming F# or C# project from passing in any value.

Just saying, at the moment you should always test for invalid values. And the way to do that is the catch-all pattern. It would be nice to have some way to have the best of both worlds (warning on missing cases, and treating out of range values specially), and this proposal goes a long way.

Still, there are some drawbacks and considerations, like Flags, and perhaps the observation that this may lead to beginners to use enums more than DU's. In fact, enums should be relatively rare in F#. As an aside, some people may not be aware that matching over enums is worst case O(n), where matching over DU's is constant O(1) (translates to table lookup).

Happypig375 commented 4 years ago

As an aside, some people may not be aware that matching over enums is worst case O(n), where matching over DU's is constant O(1) (translates to table lookup).

That's an implementation detail and can change over time.

abelbraaksma commented 4 years ago

That's an implementation detail and can change over time.

Yes, it's an implementation detail, no I don't think can change over time, it's inherent to how enums are defined. Though it would be nice (perhaps in trivial cases, translate the same way C# switch is translated). Regardless, it's the current operation.

Enums just shouldn't appear where a DU is more appropriate. But having a good way to deal with unmatched values would be very nice to have.

charlesroddie commented 4 years ago

If you expose a method or function, nothing stops a consuming F# or C# project from passing in any value. Just saying, at the moment you should always test for invalid values... And the way to do that is the catch-all pattern.

For any type A, consuming code can pass in Unchecked.defaultof<A>, so if you apply this suspiciousness consistently you should check for invalid data all the time and 1/2 of F# code would consist of checks for invalid data.

In addition, if you do this check you need to specify what to do after encountering invalid data.

You could fail, but then the result is the same as not doing the check, which will also fail when encountering invalid data.

Or you could return a Result type, and then we get a situation where 1/3 of F# code is dealing with unnecessary checks for invalid data, 1/3 of it is dealing with the generated Result types, and the remaining 1/3 is useful.

In fact, enums should be relatively rare in F#.

You are proposing a method of using enums that makes them very difficult, and a solution would be to avoid using them. But enums are the simplest object describing an enumeration and so should be used in preference to DUs when there is no data involved. Large advantages are that they serialize easily by casting to and from int, and they interoperate easily with other .Net code.

Happypig375 commented 4 years ago

From the above discussion, it can be seen that two types of enum usage are present:

At boundaries Everywhere
Description Enums are only used at boundaries, e.g. serialization for web requests and public-facing APIs. DUs are used in application logic instead as they provide better type safety. Enums are also used heavily inside application logic because they are lightweight.
Should disable FS0104? No, as crashing the whole application due to external invalid data is undesirable and can be avoided. Yes, as the amount of invalid data checking will be massive. Exceptions will be more desirable.
Will UnknownEnum help? Yes, as enums are only used at boundaries which make up a minority of all code. No, as enums make up the majority of the code.
Effect of UnknownEnum on code Positive effect of the ability to track new enum cases while handling other undefined enum cases. No effect as it won't be used.

The addition of UnknownEnum has an overall positive effect and so it should be added.

abelbraaksma commented 4 years ago

so if you apply this suspiciousness consistently you should check for invalid data all the time and 1/2 of F# code would consist of checks for invalid data

@charlesroddie Why? Only on the boundaries of your interfaces. And you have to, see also the F# source. But within your code you have guarantees. But not on enums.

I was merely giving an (other) example of the many that already exist: enums are not safe. That's a fact of life and we can make life easier, but they will never hold the guarantees a DU has, and shouldn't be used willy nilly.

You are proposing a method of using enums that makes them very difficult, and a solution would be to avoid using them.

I'm sorry if I created confusion, but I was proposing nothing. If anything, I'm in favor of this proposal, and think working with enums should be made simpler.

But we should not lose sight of the fact that enums are not a good choice when you want to match over it (we can improve, sure, but it's not native Ave it currently feels clumsy, costly etc), and your comment suggests the same, I think.

For interop, sure. And for that you must range check, just like you need to null check external data.

Or you could return a Result type, and then we get a situation where 1/3 of F# code is dealing with unnecessary checks for invalid data, 1/3 of it is dealing with the generated Result types, and the remaining 1/3 is useful.

I don't understand this and I don't see the relation to this subject. If anything, using Result or its cousin Option tends to be ubiquitous in F#, and if anything, it makes code simpler. You have your combiners and your don't have to think of anything anymore, and can focus on the things that matter. If you don't, something's not right....

charlesroddie commented 4 years ago

@charlesroddie Why? Only on the boundaries of your interfaces. And you have to, see also the F# source. But within your code you have guarantees. But not on enums.

What I said was that no type has a guarantee because you can use Unchecked.defaultOf<A> to create an invalid object, just as you can cast a bad int to an enum to create an invalid enum.

The only slight difference is that casting is done more often in practice. So along the lines that @baronfel suggested it would be good to have a safe int->enum creator which generates code at compile-time which fails if the int is not a valid case.

But we should not lose sight of the fact that enums are not a good choice when you want to match over it

This is the thing we are debating. Do you have an argument for this?

abelbraaksma commented 4 years ago

So along the lines that @baronfel suggested it would be good to have a safe enum creator which generates code at compile-time which fails if the int is not a valid case.

I think then, that we're in agreement. Though I thought this proposal was for a runtime catch (and compile time closing over the set of available defined enum fields), not a pure compile time check. If that's possible, it would be great, but perhaps we need something like 'guarded enums'.

And yes, it's certainly possible to screw up by using unsafe constructs ;)

charlesroddie commented 4 years ago

By compile-time I meant generating efficient code to check at runtime. To clarify with a concrete analogy:

type A = | A0 | A1 // (strangely you need the A1 case for the match to fail)
let a = Unchecked.defaultof<A>
match a with // Runtime error: Object reference not set to an instance of an object
| A0 -> ()
| A1 -> ()

Where is the bug? It's the definition of a that is at fault here, not the match code.

type B = | B0 = 0
let getSerializedEnumUnreliably() = 1
let b = enum<B>(getSerializedEnumUnreliably())
match b with // Runtime error: The match cases were incomplete
| B.B0 -> ()

Where is the bug? Certainly not in the match code. In getSerializedEnumUnreliably(), but perhaps also the definition of b was too trusting. We could create a validatedEnum<B>:int->B function which fails on runtime: better to fail asap.

abelbraaksma commented 4 years ago

Where is the bug? It's the definition of a that is at fault here, not the match code.

Agreed, but using a DU with Unchecked.defaultOf is really, really bad practice. I think it would be a good idea to have a warning of some sort here, though arguably, only experienced programmers ever use that construct, and when they do, they probably know afar they're doing (i.e., I've used it as the return value of generic functions that never return).

Where is the bug? Certainly not in the match code. In getSerializedEnumUnreliably(), but perhaps also the definition of b was too trusting

I understand you use a contrived example to show what could happen, but the thing is, this can always happen with enums (hence the requirement to have an explicit 'catch the rest'), but never with DU (you'll hit an NRE before you can match, as you showed).

Which brings us back to the suggestion of the OP, to make this explicit. Instead of 'catch the rest', it would be 'compile check all legal values, then catch the illegal ones'.

realparadyne commented 4 years ago

This feature would be so useful! I currently maintain code which has interop with C#, involves serialisation and a third party API with lots of enums that they add to and at times may send me values I don't yet know about or don't have the latest API dll for etc. etc. I don't like having to use the wildcard pattern match and then losing the warnings when new enum members get added, which is one of the great advantages of pattern matching in the first place.

It looks a very elegant to use solution, particularly the decomposing of the unknown value to immediately use in logging / throwing as needed.

NinoFloris commented 4 years ago

Why not extend this to DUs?

Swift handles this nicely https://github.com/apple/swift-evolution/blob/master/proposals/0192-non-exhaustive-enums.md or the TLDR version https://medium.com/swiftify/enum-in-swift-5-19b324b25f94

And rust has the #[non_exhaustive] attribute to signal to the compiler exhaustive matches outside the defining assembly should be warned on https://github.com/rust-lang/rfcs/blob/master/text/2008-non-exhaustive.md

Having the two features together makes a lot of sense. Most times it's better to have an unknown arm for non exhaustive dus/enums than a _ catch-all arm. Pushing users in that direction by default would mean that they'll still be notified when new cases are added, which they may want to react to specifically.

Adding unknown to the language is something I'm supportive of.

Happypig375 commented 4 years ago

Why not extend this to DUs?

There is no such concept of an "undefined numeric value" for DUs while enums have it.

NinoFloris commented 4 years ago

I would suggest you read up on the feature for swift and rust first. New cases are practically the undefined numeric value you talk about (DUs always have an int Tag) and today adding cases breaks all downstream users.

Happypig375 commented 4 years ago

today adding cases breaks all downstream users.

There. Today, adding a case is already considered a breaking change. This has been embedded in the F# culture for a while and is not easy to change.

Enums are there when you want to have "extensible DUs".

NinoFloris commented 4 years ago

I hadn't expected you to use 'status quo' as an argument, considering you have 27 language suggestions open, why add anything to the language?

Happypig375 commented 4 years ago

Do any of my suggestions blatantly invalidate

This is not a breaking change to the F# language design

?

NinoFloris commented 4 years ago

Which part in my suggestion is breaking?

To have the unknown pattern support not just enums but also DUs isn't breaking, you have to opt in by changing your match. My other part was that additionally in a later RFC we could add an attribute to FSharp.Core like #[non_exhaustive] that can be used on DUs to notify/warn devs to use unknown. That to my knowledge also isn't breaking.

Happypig375 commented 4 years ago

Are you proposing another type of DUs instead of changing the current DUs? If so, forget what I have said. Sorry for misinterpretation.

abelbraaksma commented 4 years ago

@NinoFloris, I also don't understand the DU suggestion. Currently, if you add a case, any usages in matches will already issue a warning, which proves extremely valuable in refactoring.

What situation are you referring to where this doesn't happen? I assume you mean replacing _ wildcard matches, but let's say we replaced this with UnknownCase, what would that mean? As during compiling, there's no such thing as an unknown DU case, even if you add a case, where there's a previous wildcard match, how would the compiler distinguish between earlier 'everything else' and the newer 'everything else'? (in contrast, with classic enums, the majority of values is 'unknown', and values can overlap, be flags etc, it's a very different thing, and here the OP assumes all known valid cases are matched against, leading to the UnknownEnum raising a warning of there's still known values).

abelbraaksma commented 4 years ago

Thinking a second longer about this idea, perhaps the inverse is easier. Suppose we allow a new attribute on a DU case, say RequiresExplicitMatch. You add that to tell the compiler it is not allowed to be caught in a 'catch all' type of match and must be explicitly case-matched.

Then you'd recompile, compiler issues warnings where you don't explicitly match over the newly added case. You check and/or fix these usages, and once done, you remove the attribute again.

(just an idea, not necessarily a good one ;)

Happypig375 commented 4 years ago

@dsyme Would this idea be an approved-in-principle?

Happypig375 commented 4 years ago

RFC proposed: https://github.com/fsharp/fslang-design/pull/442

dsyme commented 4 years ago

Yes, I will mark the suggestion approved in principle.

As part of the implementation please make sure that the warning message for incomplete matches on an enum value explcitly guided the user to add an UnknownEnum case. Otherwise no one will ever know to use this.