awakesecurity / proto3-suite

Haskell Protobuf Implementation
https://hackage.haskell.org/package/proto3-suite
Other
81 stars 56 forks source link

Tutorial should provide more guidance around idiomatic enum use #235

Closed robinp closed 4 months ago

robinp commented 1 year ago

Edit: take this blob with grain of salt, see https://github.com/awakesecurity/proto3-suite/issues/235#issuecomment-1697864872 where the conclusion is that 0 = UNKNOWN or some other non-harmful default is still advisable.

In proto3 (and generally) it is recemmended to have an UNKNOWN enum value with value 0. The reason is, if the enum is missing or contains unknown value, the user will observe the default value 0 (big note: this is only true for closed enums, which proto3-suite doesn't implement it seems. See later below). If that is not UNKNOWN but some other specific value, the code interpreting the message might mistake the actually missing enum to an enum with the first possible value.

The tutorial in https://github.com/awakesecurity/proto3-suite/blob/1f2c156b1178599d3853dac941beb3f29e2bdf5e/src/Proto3/Suite/Tutorial.hs#L122 defines Circle = 0 which seems dangerous.

Sidenote (not sure why, but I wrote it): since proto3-suite's Enumerated a type seems to be a newtype for Either Int32 a, I assume the left value is an int that was unknown at parsing time. So, when I'm reading protobuf from Haskell code, I'm kind of forced to double-check the value - is it Right? Is it UNKNOWN? Well, that's just a small annoyance, and I think is good in spirit of proto intention (that is, as long as I don't care to inspect the field, I can reencode it and pass it along to other systems just fine, with the content unchanged, in the hope they can deal with the value if they need to).

So bottom line is, using non-UNKNOWN enum value as zero would kind of surely cause problem in other languages eventually.

Sadly, this forces us to add an extra not-really-valid constructor to the Haskell enum, making its nice pure use bit more painful (error checking creeps in).

I wonder if one could eat the cake and have it too - have some option for the codegen, that: 1) enum constructor with value = 0 is omitted from the Haskell-generated type, and 2) a value=0 is parsed into a Left int value (maybe that would happen after 1) automatically).

Wondering if there's any downside / semantic difference of that? Why don't other languages do that?

Actually take above with grain (or, hm, shovel) of salt: https://protobuf.dev/programming-guides/enum/ defines the madness of interpretation across proto2/3. Oh my. Based on that, it seems proto3's Enumerated type, when interpreted directly, implements the "open" enum. And for that usecase, what I said about needing to have UNKNOWN = 0 sounds not valid (the open enum avoids the default-returning problem of "closed" enums).

(Don't ask if the closed-enum case is well-supported by proto3-suite then... I also don't care much about that case since I'm pure-proto3, but others might do).

Recap:

So a bit of clarification of the documentation around this would be helpful. Also, could the compiler help and determine which is the case here, and emit warning / error if the recommended approach is violated?

Or, I'm completely missing maybe some helpers that already exist along these lines?

Slightly related #28 . Sorry for the braindump format.

riz0id commented 1 year ago

The left hand side of Enumerated is only used when decoding the wire format parser successfully decodes a varint for an enum field, but the decoded varint does not correspond to any known enum specified in the protobuf file. The wire parser for enums is here if you're interested.

Representing enums as Either Int32 a enables users to handle unrecognized enum values however they see fit. One way to handle Enumerated is as proto2 would and treat unrecognized Int32 values as some canonical unknown value. This can be done easily with Bounded:

toEnumClosed :: Bounded a => Enumerated a -> a
toEnumClosed x = case enumerated x of 
  Left  {} -> minBound
  Right e -> e 

Perhaps the compiler should automatically unpack Either Int32 a into just a and send all unknown values to the first enum if syntax = "proto2"; was stated at the top of the enclosing file. I've kinda had this compliant myself because handling Enumerated everywhere can become tedious. Implementing this would be a lot of work though, which is probably why many protobuf compilers for other major languages don't support proto3 and proto2 -specific compilation pipelines for enums.

I hope that helps? Maybe some clarity on what exactly you're asking for or think should be changed would be helpful for me.

robinp commented 1 year ago

Hello - thank you for the comments, and sorry for the confusion.

Indeed my post started out as reporting some issue, but as I dig in, I realized that the current behavior is fine and desirable from my point.

So in my two recap points I think I wanted to include something along which I miss from the documentation - pointer for the unaware user which proto enum practice to follow. But you can validly argue that that is generic protobuf-lore, and not specific to the proto3-suite implementation.

(Background: I came back to proto(3) after a bit of absence, and the old proto2 best practice was to always include UNKNOWN=0 for reasons written above. Was a bit surprising but also conforting to read that a more transparent practice is recommended for proto3).

So maybe a small note in the user docstrings / tutorial around enums that points to either the official proto3 docs, or this issue on whether one should use an UNKNOWN ctor in the enum or not, would be helpful.

robinp commented 1 year ago

Revisiting this thread:

In answer to your comment, I agree that not falling back to some default value is the saner option, even if leads to more case handling. It is too much of a lurking footgun.

Revising my own suggestion, it seems that even with proto3's open-enums approach, it is a beneficial practice to include UNKNOWN as the 0-value: the 0 value is the one that a missing enum will default to

For enums, the default value is the first defined enum value, which must be 0.

Actually this is more or less hinted by the proto3 guide, but they just say you should include a 0-value, and don't explicitly say that the 0-value should mean UNKNOWN or UNSPECIFIED. The examples they give hint it, but not very strongly.