andersfugmann / ocaml-protoc-plugin

Plugin for Google's protobuf compiler to generate interfaces based on protobuf specifications and runtime for encoding and decoding protobuf messages
https://andersfugmann.github.io/ocaml-protoc-plugin/
Apache License 2.0

Unique OCaml module for all messages #42

Closed lambdaxdotx closed 2 days ago

lambdaxdotx commented 1 month ago

Hi, thanks for your work on this project.

This is a feature request more than an actual issue.

tldr: Would it be possible to modify the OCaml generation to optionally emit a single OCaml module for all messages in a .proto file rather than a separate module for each of the messages?

A more reasoned request follows. Messages can be structured to follow a hierarchy, as in this example:

message A {
  bool is_something = 1;
  repeated B bs = 2;
}

message B {
  bool is_something_else = 1;
  oneof content {
    C c = 2;
    D d = 3;
  }
}

message C { ... }
message D { ... }

Especially in such cases, it would be useful to optionally output a single OCaml module with types mirroring such a hierarchy, approximately of the form:

type a = {
  is_something: bool;
  bs: b list;
}

and b = {
  is_something_else: bool;
  content: [ `not_set | `C of c | `D of d ];
} 

and c = { ... }
and d = { ... }

Such an OCaml type hierarchy allows for easier navigation of the type structure than the current one. In particular, it would be easier to write a ppx that operates on such output than on the current one.

Does this sound reasonable? How hard would it be to make such a change?

andersfugmann commented 1 month ago

I think it's impossible. In terms of effort, it's huge even to try to accomplish this.

There are too many corner cases that would make the type names less predictable and, in some cases, make it almost impossible to construct types without adding type annotations on almost every function. The aim of ocaml-protoc-plugin is to provide a fully compliant implementation. Another design goal is to have as predictable a name mapping as possible and not have the names change when making additions to the proto files (i.e. adding a new message type in addition to existing ones should not incur changes to existing name mappings).

Have you looked at the types generated by ocaml-protoc? It does not use modules for generated types and may be closer in terms of type generation to what you are looking for.

In particular, it would be easier to write a ppx that operates on such output than on the current one.

Could you provide an example of a ppx that would be easier to write?

Namespace pollution

Enclosing each message definition in a module allows helper functions to be nicely scoped as well as solving resolution problems.

Imagine the following definition

message A {
  bool x = 1;
  bool y = 2;
}

message B {
  int32 x = 1;
  int32 y = 2;
}

would then translate to

type a = { x: bool; y: bool; }
and  b = { x: int; y: int; }

But it would be difficult to construct either, as the field names occupy the same namespace, and explicit type annotations would be needed on each use to distinguish between type a and type b.
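A minimal sketch of the label clash described above, followed by a module-per-message layout for comparison. The modules A and B and their make functions are simplified, hypothetical stand-ins, not the code the plugin actually generates:

```ocaml
(* Flat layout: both records declare the labels x and y in one namespace. *)
type a = { x : bool; y : bool }
type b = { x : int; y : int }

(* Without an annotation, the most recently defined record (b) is chosen,
   so this function is inferred as b -> int. *)
let sum r = r.x + r.y

(* Working with a requires an explicit type annotation. *)
let both (r : a) = r.x && r.y

(* Module-per-message layout: each record lives in its own scope.
   Hypothetical, simplified shape for illustration only. *)
module A = struct
  type t = { x : bool; y : bool }
  let make ~x ~y = { x; y }
end

module B = struct
  type t = { x : int; y : int }
  let make ~x ~y = { x; y }
end

(* Construction is unambiguous without annotations. *)
let a_value = A.make ~x:true ~y:false
let b_value = B.make ~x:1 ~y:2
```

In the flat layout, every construction or field access of type a has to carry an annotation such as ({ x = true; y = true } : a), whereas the module layout resolves the labels through the module path.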

Sub-messages

Sub-messages would be difficult to implement. Consider the following:

message A {
  message B {
    bool a = 1;
  }
  B b = 1;
}
message B {
  message B {
    int32 a = 1;
  }
  B b = 1;
}

This would not allow for a trivial name mapping - the type hierarchy would need to be encoded in the name, which I feel is less intuitive, e.g.

type a = ...
and a_b = ...
and b = ...
and b_b = ...

Which would even break when you have name conflicts (i.e. a message called B_b).

There are so many corner cases that would need to be handled, but I think it boils down to how to create unique names in a shared namespace that are predictable and usable - and I don't see any viable path to solve this that does not place restrictions on what constructs / names can be handled.
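For comparison, a rough sketch of how nested modules keep these names apart without encoding the hierarchy in the type name. The module shapes are simplified and hypothetical (the string field of B_b is invented for illustration), not the plugin's actual output:

```ocaml
(* Sub-messages become nested modules, so A.B, B.B and a top-level
   message named B_b never collide. Hypothetical, simplified shapes. *)
module A = struct
  module B = struct
    type t = { a : bool }
  end
  type t = { b : B.t option }
end

module B = struct
  module B = struct
    type t = { a : int }
  end
  (* Inside this module, B refers to the nested sub-message module. *)
  type t = { b : B.t option }
end

(* In a flat scheme this message would clash with the sub-message
   B.B being mapped to the type name b_b. *)
module B_b = struct
  type t = { a : string }
end

(* Labels are qualified by module path, so no annotations are needed. *)
let in_a = { A.b = Some { A.B.a = true } }
let in_b = { B.b = Some { B.B.a = 42 } }
let standalone = { B_b.a = "no clash" }
```

Here the module path carries the hierarchy, so adding a message named B_b leaves the existing names untouched.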

lambdaxdotx commented 1 month ago

Thanks for your thorough reply.

I do not think the task is impossible, but it would certainly be substantial in terms of effort.

I share your opinion: it boils down to having a naming scheme/protocol that yields unique names. Again, quite an effort, but not impossible imho.

Let me reply to your questions:

Have you looked at the types generated by ocaml-protoc? It does not use modules for generated types and may be closer in terms of type generation to what you are looking for.

I had a look at it. It may be fine from the OCaml types point of view, but I do not like the architecture/idea of the project. I prefer this project's idea of a plugin for protoc. Plus, it does not support some things that I need.

Could you provide an example of a ppx that would be easier to write?

I do not have such an example. Still, a ppx that only needs to navigate simple types is surely simpler than one that also has to deal with modules.

Knowing that

The aim of ocaml-protoc-plugin is to provide a fully compliant implementation. Another design goal is to have as predictable a name mapping as possible and not have the names change when making additions to the proto files

let me rephrase my request: Would you be willing to accept a merge request for such a change? Would you be available to discuss and guide during the development of such a change?

Again, it would be an optional behavior, not the default one. Also, I am not saying that I have such an MR ready, nor that I have time to work on it at the moment.

andersfugmann commented 1 month ago

It's of course possible to create a flat type mapping. What I meant was that it's (almost) impossible given the current architecture of the plugin - as well as a very, very large effort - and I think a lot of compromises would need to be made, and I'm unsure if it would be worthwhile.

let me rephrase my request: Would you be willing to accept a merge request for such a change? Would you be available to discuss and guide during the development of such a change?

I'd be happy to discuss how this can be done and offer guidance, but I cannot guarantee that I will merge a PR that may significantly increase maintenance of the code.

Could you provide an example of a ppx that would be easier to write?

I do not have such an example. Still, a ppx that only needs to navigate simple types is surely simpler than one that also has to deal with modules.

My question about PPXes was to understand the motivation better. I don't think PPXes are any more difficult to write just because types are placed in modules. But my question remains: What's the motivation for this? The design decision to create a module for every message type was indeed made to simplify and be more OCaml-idiomatic. Also, I have worked with ocaml-protoc and did not really like the flat types (and also needed a fully compliant implementation).

Maybe you could try exemplifying what the signatures would look like - i.e. what would the "flat" signature look like for

syntax = "proto3";
import "google/protobuf/timestamp.proto";

package echo;
message Request {
  google.protobuf.Timestamp timestamp = 1;
  string who = 2;
}

message Reply {  
  string response = 1;
}

service Echo {
  rpc Call (Request) returns (Reply);
}

Including the various serialization and deserialization functions as well as types for the rpc endpoint.

lambdaxdotx commented 1 month ago

Hi. Thanks again for your reply.

But my question remains: What's the motivation for this?

Essentially, think about a typical Abstract Syntax Tree (AST) for a programming language, where each construct with its metadata is encoded as a protocol buffer message. Hence, in the end, you have a hierarchy of messages. Typically, the latter admits a simple, flat, OCaml type description.

I'd be happy to discuss how this can be done and offer guidance, but I cannot guarantee that I will merge a PR that may significantly increase maintenance of the code.

I understand. Totally fair.

At this point, you may also close this issue, I guess.

andersfugmann commented 1 month ago

But my question remains: What's the motivation for this?

Essentially, think about a typical Abstract Syntax Tree (AST) for a programming language, where each construct with its metadata is encoded as a protocol buffer message. Hence, in the end, you have a hierarchy of messages. Typically, the latter admits a simple, flat, OCaml type description.

I'm still not sure what problem a simple recursive type definition is solving compared to a module hierarchy. Could you provide a motivating example to show what will become easier? I'm genuinely interested in understanding.

andersfugmann commented 2 days ago

Closing this due to inactivity, but feel free to reopen or follow up on the issue.