Non optional objects should be created with default values

bufbuild / protobuf-es

Protocol Buffers for ECMAScript. The only JavaScript Protobuf library that is fully-compliant with Protobuf conformance tests.

Apache License 2.0

1.15k stars 68 forks source link

Non optional objects should be created with default values #532

Closed GauBen closed 1 year ago

GauBen commented 1 year ago

I noticed an inconsistency regarding primitive and object optionality. I'm not sure to understand the proto3 spec correctly, so please close this issue if everything is working as intended.

Given this proto file:

message Bar {
  string prop = 1;
}

message Foo {
  string str = 1;
  optional string opt_str = 2;
  Bar bar = 3;
  optional Bar opt_bar = 4;
}

The generated typescript file is:

export class Bar extends Message<Bar> {
  prop = "";
}

export class Foo extends Message<Foo> {
  str = "";
  optStr?: string;
  bar?: Bar; // This seems wrong
  optBar?: Bar;
}

Shouldn't the generated Foo class be something like this:

export class Foo extends Message<Foo> {
  str = "";
  optStr?: string;
  bar = new Bar(); // Create Bar with default values
  optBar?: Bar;
}

Creating objects directly in field initializers is ok per MDN

The field initializer expression is evaluated each time a new instance is created. (Because the this value is different for each instance, the initializer expression can access instance-specific properties.)

GauBen commented 1 year ago

From what I see here, optional is set to true regardless of field.optional

https://github.com/bufbuild/protobuf-es/blob/d3f19c10075fb0dc6015e98ca4079a8d45c3367e/packages/protoplugin/src/ecmascript/gencommon.ts#L178-L187

jcready commented 1 year ago

This is intentional. The default field value for a message field is undefined.

GauBen commented 1 year ago

Thanks for your answer, what seems to lack from the generated TS code is a way to get default values from undefined objects

https://protobuf.dev/reference/go/go-generated/#singular-message

var m *Concert // defaults to nil
log.Infof("GetFoundingYear() = %d (no panic!)", m.GetHeadliner().GetFoundingYear())

Is there a way to have a similar feature in TS/JS?

const foo = new Foo();
console.log(foo.bar.prop);      // Crashes at runtime
console.log(foo.getBar().prop); // Currently missing API

timostamm commented 1 year ago

Gautier, in proto3, message fields are always optional. There is actually no difference between those two fields:

  Bar bar = 3;
  optional Bar opt_bar = 4;

In Go, foo.GetBar() returns nil, but it is a typed nil and it can have methods. ECMAScript has null, but it cannot have methods or other properties.

Different languages have different idioms. ECMAScript has optional chaining, and accessing the field would look like the following:

foo.bar?.prop; // string | undefined

If necessary, you can add in nullish coalescing. This behavior is pretty much identical to the C# implementation. It is a deliberate design choice and working as intended.

nullswan commented 1 year ago

Hey @timostamm, @jcready

You make a fair point - even though the Golang field is nil, it does not returns associated object methods, as you mentioned. The Python implementation demonstrates the behavior @GauBen proposed.

But yours does not seem to follow idiomatic practices, as every field is being set to null while tightly coupling the data access layer with the serialization layer. Read more about field presence. Actually, the problem we face is no way to know if a field has been explicitly marked as optional. Here is some additional examples in other languages:

Python betterproto using Optional

@dataclass(eq=False, repr=False)
class Bar(betterproto.Message):
    prop: str = betterproto.string_field(1)

@dataclass(eq=False, repr=False)
class Foo(betterproto.Message):
    str: builtins.str = betterproto.string_field(1)
    opt_str: Optional[builtins.str] = betterproto.string_field(2, optional=True, group="_opt_str")
    bar: "Bar" = betterproto.message_field(3)
    opt_bar: Optional["Bar"] = betterproto.message_field(4, optional=True, group="_opt_bar")

a = Foo()
print(a.bar.prop, len(a.bar.prop), type(a.bar.prop))
>>> '' 0 <class 'str'>

Golang using oneof and pointers

type Foo struct {
    state         protoimpl.MessageState
    sizeCache     protoimpl.SizeCache
    unknownFields protoimpl.UnknownFields

    Str    string  `protobuf:"bytes,1,opt,name=str,proto3" json:"str"`
    OptStr *string `protobuf:"bytes,2,opt,name=opt_str,json=optStr,proto3,oneof" json:"opt_str"`
    Bar    *Bar    `protobuf:"bytes,3,opt,name=bar,proto3" json:"bar"`
    OptBar *Bar    `protobuf:"bytes,4,opt,name=opt_bar,json=optBar,proto3,oneof" json:"opt_bar"`
}

func main() {
    a := v1.Foo{}
    r := a.GetBar().GetProp()
    fmt.Printf("%+v %T %d\n", r, r, len(r))
}
>>> '' string 0

To provide more context, we are seriously considering switching our TS implementation over your library but this is an adoption friction. What would you recommend to access such object in way that do not requires to verify that "non-explictly marked as optional objects" are present? Maybe an initial runtime validation ? Do you have any other "best-practices" in mind we should take care of when dealing with TypeScript and your library in a large codebase?

jcready commented 1 year ago

According to your link singular message fields use explicit presence regardless of the optional modifier being present or not.

What @GauBen proposed would lead to runtime errors when two messages referenced each other:

message Bar {
  Foo foo = 1;
}

message Foo {
  Bar bar = 1;
}

The generated code would then look like this according to the proposal:

export class Bar extends Message<Bar> {
  foo = new Foo();
}

export class Foo extends Message<Foo> {
  bar = new Bar();
}

Attempting to instantiate either of these classes leads to a RangeError: Maximum call stack size exceeded.

Actually, the problem we face is no way to know if a field has been explicitly marked as optional.

How would you use this information?

GauBen commented 1 year ago

Unnecessary intro

Protobuf is weird, this non-optional mutually dependent structure wouldn't work in most languages ```ts // No way to instantiate a JS object matching any of those interface interface Foo { bar: Bar } interface Bar { foo: Foo } ``` `const a = {}; a.foo = {bar: a};` but the object wasn't created as a valid `Bar` You may do it in some functional languages like OCaml, but it relies on recursive declaration which is not something really common: ```ocaml type foo = { bar: bar } and bar = { foo: foo };; let rec a = {bar = { foo = a}};; ``` ---

Because the naive instantiation doesn't make the cut, I'd love a lazy class instantiation on access:

message Bar {
  string prop = 1;
}

message Foo {
  Bar bar = 3;
  optional Bar opt_bar = 4;
}

Would be compiled to:

class Bar {
  prop = '';
}

class Foo {
  #bar?: Bar;
  get bar(): Bar {
    return this.#bar ?? new Bar(); // Lazy instantiation à la Python
  }
  set bar(value: Bar | undefined) {
    this.#bar = value;
  }
  // Might need an additional method to distinguish #bar = undefined from being a default value object
  // the same way foo.Bar == nil in go

  optBar?: Bar;
}

Our use-case is the following: we send messages containing reports, one being the current and one being the previous

message Notification {
  message Values {
    uint32 foo = 1;
    uint32 bar = 2;
  }

  message Report {
    Values values = 1;
  }

  string id = 1;
  Report current = 2;           // The current report always exists
  optional Report previous = 3; // The previous report does not exist on first notification
}

With zod we'd do it like that:

const Report = z.object({
  values: z.object({ foo: z.number(), bar: z.number() })
});
const Notification = z.object({
  id: z.string(),
  current: Report,
  previous: z.optional(Report)
});

// z.infer<typeof Notification> yields the following type:
interface Notification {
  id: string;
  current: { values: { foo: number; bar: number } };
  previous?: { values: { foo: number; bar: number } };
}

Then we would send a notification as follows:

function sendNotification(n: Notification) {
  console.log('Current foo:', n.current.values.foo);
  if (n.previous)
    console.log('Evolution:', n.current.values.foo - n.previous.values.foo);
}

With protobuf-es's current implementation, n.previous!.values is potentially undefined, making reading nested objects cumbersome; we don't know how to do it properly.

Anyway, thank you for taking the time to reply, we really appreciate it.

jcready commented 1 year ago

Full disclosure: I'm just a user of protobuf-es and do not speak for the maintainers.

With protobuf-es's current implementation, n.previous!.values is potentially undefined, making reading nested objects cumbersome; we don't know how to do it properly.

In case you are interested this is how I would do it (assuming you always want the first console log to output a number):

function sendNotification(n: Notification) {
  const currentFoo = n.current?.values?.foo ?? 0;
  console.log('Current foo:', currentFoo);
  if (n.previous?.values)
    console.log('Evolution:', currentFoo - n.previous.values.foo);
}

Regarding your new lazy initialization proposal:

class Bar {
  prop = '';
}

class Foo {
  #bar?: Bar;
  get bar(): Bar {
    return this.#bar ?? new Bar(); // Lazy instantiation à la Python
  }
  set bar(value: Bar | undefined) {
    this.#bar = value;
  }
  // Might need an additional method to distinguish #bar = undefined from being a default value object
  // the same way foo.Bar == nil in go

  optBar?: Bar;
}

I think you can appreciate how much this complicates the generated code, not to mention that private fields are only available in es2022 and I believe protobuf-es targets es2017 (there are workarounds like using WeakMaps, but that complicates the generated code even more).

Might need an additional method to distinguish #bar = undefined from being a default value object

Yes, this is where the "hazzers" would need to be introduced. The generated code would need to add something like:

get hasBar(): boolean {
  return this.#bar !== undefined
}

I'm not sure the above is even sufficient, right? Because with my naive hasBar getter this can happen:

const foo = new Foo();
assert(foo.hasBar === false);
const prop = foo.bar.prop;
assert(foo.hasBar === false); // Oops, throws

It seems like betterproto for python handles this by having the message class hold onto a private _serialized_on_wire attribute and having setters for every single field which will set that to true.

This proposal would complicate things like the toPlainMessage() method as, like you said, it is impossible to satisfy a non-nullable self-referencing interface using plain JS objects (unless we're counting getters/setters + WeakMaps as plain).

GauBen commented 1 year ago

In case you are interested this is how I would do it (assuming you always want the first console log to output a number):

We started doing this with assert for non optional messages, and conditions for optional ones.

function sendNotification(n: Notification) {
  assert(n.current?.values);
  console.log('Current foo:', n.current.values.foo);
  if (n.previous?.values) {
    console.log('Evolution:', n.current.values.foo - n.previous.values.foo);
  }
}

It's less elegant than what we used to have, especially when several fields are to be tested instead of one (e.g. n.previous -> n.previous?.foo && n.previous?.bar)

there are workarounds like using WeakMaps, but that complicates the generated code even more

As we generate TypeScript code, we can produce private fields instead, which get compiled to usual fields but raise compile errors

Your naive hasBar appears to work: ts playground

Didn't know about toPlainMessage, which definitely buries this issue.

I still don't get why the optional keyword is useless in protobuf, but that's a hill I'm not dying on.

Thanks again @timostamm and @jcready for answering my newbie questions

timostamm commented 1 year ago

Great discussion! ❤️

The use of plain properties was a deliberate design choice. The alternative boils down to getters and hassers that protobuf-javascript use, and plain properties are one of the most requested features for it.

In our code base, we have been using assertions as well for message fields that we expect to always be present. In C#, it's the same limitation, but it's otherwise rather pleasant to use.

Meemaw commented 9 months ago

Hey all. Skimmed over this thread, and I'm wondering if there is a world where message type generation could be tuned to return non optional fields. Doing assets/checks all over the place is imo not a scalable way to write software, and in cases where type is known to be always present just adds unnecessary complexity.

timostamm commented 4 months ago

I'm wondering if there is a world where message type generation could be tuned to return non optional [message] fields.

There are two main approaches: 1) Getters could return a new empty instance on access for unset message fields. 2) Immutable messages can share a single instance for unset message fields.

Unfortunately, getters and setters have poor support in many very popular frameworks, and returning new objects on every access will be very painful to handle with change-detection in React and others.

Immutable messages can have other benefits, and it's an intriguing approach, but immutability can also require a large amount of boilerplate for simple mutating use cases. It's also very often the case that 3rd party functions accept an array, not an immutable array, and you end up with more boilerplate.