awakesecurity / proto3-suite

Haskell Protobuf Implementation
https://hackage.haskell.org/package/proto3-suite
Other
80 stars 55 forks source link

large-records support #225

Closed rkaippully closed 1 year ago

rkaippully commented 1 year ago

The large-records library is designed to avoid quadratic core code size for large records. This PR adds support to make use of this library in the code generated by compile-proto-file. The generated code has a different representation for the records corresponding to the messages and services. This should significantly improve the core code size and compilation timings.

See https://well-typed.com/blog/2022/03/large-records-without-the-quotes/ for an introduction to the large-records library.

The feature is behind a cabal flag that is disabled by default, and hence needs to be opted in.

While the large-records representation improves the code size and compilation time of the records themselves, proto3-suite generate a large number of type class instances too. I have seen moderate improvement in memory allocation and compilation timings (around 30% and 10% respectively) compiling the proto3-suite generated code, but it is not as dramatic as one would expect. There could be additional improvements if you have some code that constructs/pattern matches these records a lot. As always, you should measure the performance before deciding to turn on this feature.

Also, note that the Generic representation provided by large-generics is different from GHC Generics. Some advanced transformations possible with GHC generics is not easy or even possible with large-generics. For example, it is possible to walk the Rep a x tree of GHC generics and transform it to a Rep b x if b has a similar representation to a. But this is not possible with large-generics.

j6carey commented 1 year ago

Is there a way to do this without the cabal flag? Maybe a combination of command-line arguments for compile-proto-file and a separate package with the additional dependencies? The reason I ask is that there could be situations in which a package depends upon two other packages, which in turn both depend upon proto3-suite, but with different requirements for which cabal flags are enabled.

rkaippully commented 1 year ago

Is there a way to do this without the cabal flag? Maybe a combination of command-line arguments for compile-proto-file and a separate package with the additional dependencies? The reason I ask is that there could be situations in which a package depends upon two other packages, which in turn both depend upon proto3-suite, but with different requirements for which cabal flags are enabled.

I think it is possible to introduce a command line argument to compile-proto-file instead of using the cabal flag, as there is no large-records dependency.

I have mixed feelings about a separate package such as proto3-large-records. Firstly, that package will only contain orphan instances (which is probably OK for this case?). Secondly, the flag only controls a few instances for types defined in large-records. So if a package does not use large-records, these instances should not cause any issues. Finally, we have similar flags for Dhall, swagger etc. So, to be consistent, we should introduce similar packages for those as well. Shall we discuss this separate from this PR? I don't want to add too many unrelated changes to a single PR.

j6carey commented 1 year ago

So, to be consistent, we should introduce similar packages for those as well. Shall we discuss this separate from this PR? I don't want to add too many unrelated changes to a single PR.

I had considered that, but thus far I hadn't gotten very far with it. But if large-records does not introduce too much dependency upon other packages, then the issue can be postponed. So if I understand correctly, you would be OK with controlling the new behavior using a compile-proto-file command line option, but including the necessary instances in the proto3-suite library?

ixmatus commented 1 year ago

I'm in agreement with the idea to controll new behavior using a CLI option and including the necessary instances in the proto3-suite library.

rkaippully commented 1 year ago

I had considered that, but thus far I hadn't gotten very far with it. But if large-records does not introduce too much dependency upon other packages, then the issue can be postponed. So if I understand correctly, you would be OK with controlling the new behavior using a compile-proto-file command line option, but including the necessary instances in the proto3-suite library?

Correct. I pushed a change that adds a command line flag to compile-proto-file instead of the cabal flag. The proto3-suite library still uses the cabal flag to include the instances.

rkaippully commented 1 year ago

@j6carey & @ixmatus Do you think these changes are good to merge now?

j6carey commented 1 year ago

@rkaippully , My concerns were addressed, but I didn't do an exhaustive review, just looked at high-level issues. One thing to note: at some point we should do a performance test of the generated code when the new feature is enabled; however, that is not required before merging because this feature is optional.