apple / swift-protobuf

Plugin and runtime library for using protobuf with Swift
Apache License 2.0

Excessive size of generated Swift code #1204

Open BalestraPatrick opened 2 years ago

BalestraPatrick commented 2 years ago

Hello!

Many parts of our codebase use SwiftProtobuf. Recently we started tracking app size more accurately, and we noticed a trend that is pretty worrying for us: the generated Swift proto code increases our app size a lot. For example, we recently removed a single proto file of about 400 LoC containing about 70 message definitions (including the various transitive imports), for which the generated code was about 5 KLoC. 304 KB of our app size was attributed to symbols coming from the generated Swift Protobuf code.

We are building with SWIFT_OPTIMIZATION_LEVEL = -Osize in release mode but I wonder if there are other ways to reduce the size of the generated Swift code.

I can't exactly share my full proto, but I was wondering if this is a known issue with Swift or maybe there are ways to reduce the impact of the generated code. Does anyone have experience with this particular issue?

tbkka commented 2 years ago

Code size is a common concern with code-generated approaches such as this. Protobuf implementations for some other languages rely heavily on reflection which makes them smaller but significantly slower.

If you're only using the binary encoding, it should be easy to strip out the field names and other content that's only there to support JSON and TextFormat encoding. Right now, I think this would require a small change to the code generator, but I've long been interested in emitting that content as separate .swift sources that contain only those extensions. It would then be easy to delete those files. (Alternately, we could consider splitting the JSON and TextFormat support into a separate generator.) You could also look critically at whether there are other parts of the generated code that you might omit: For example, the generated == implementations are somewhat bulky and may not be needed in your application.

thomasvl commented 2 years ago

fyi - #18 is open for tracking splitting out the textual support.

dflems commented 2 years ago

Wrote a little wrapper to patch the generated swift source code to remove conformance to SwiftProtobuf._ProtoNameProviding, which seems to have shaved off about 10% of the total binary size of the generated Swift protobuf in our app (according to the linkmap). Would be nice for this to be an option in the generator for sure!

I briefly looked into removing == as well but _MessageImplementationBase is Hashable so it needs an implementation of it or a change to the runtime.

update: Turns out we're using JSON encoding/decoding a little bit in the codebase and can't merge this, sadly
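
For illustration, a source-patching step of the kind described in this comment might look like the Swift sketch below. It is not the actual wrapper: the function name `stripNameProviding` is hypothetical, and the sketch assumes the generated file lists `SwiftProtobuf._ProtoNameProviding` as a trailing conformance and declares the name table as `static let _protobuf_nameMap`. Extensions that exist only for that conformance, and field names containing square brackets, are not handled and would need extra logic.

```swift
import Foundation

/// Hypothetical helper (not part of SwiftProtobuf): removes the name-table
/// conformance and the name map declaration from one generated file's text.
func stripNameProviding(from source: String) -> String {
    // Assumption: the conformance appears after at least one other conformance.
    let conformance = ", SwiftProtobuf._ProtoNameProviding"
    var output: [String] = []
    var depth = 0         // count of unbalanced "[" while skipping a name map
    var skipping = false

    for line in source.components(separatedBy: "\n") {
        if !skipping && line.contains("static let _protobuf_nameMap") {
            skipping = true
            depth = 0
        }
        if skipping {
            // Drop every line of the declaration until its brackets balance.
            for ch in line {
                if ch == "[" { depth += 1 }
                if ch == "]" { depth -= 1 }
            }
            if depth <= 0 { skipping = false }
            continue
        }
        output.append(line.replacingOccurrences(of: conformance, with: ""))
    }
    return output.joined(separator: "\n")
}

// A made-up fragment shaped like generated code, used only as sample input.
let sample = """
extension Foo: SwiftProtobuf.Message, SwiftProtobuf._MessageImplementationBase, SwiftProtobuf._ProtoNameProviding {
  static let protoMessageName: String = "Foo"
  static let _protobuf_nameMap: SwiftProtobuf._NameMap = [
    1: .same(proto: "id"),
  ]
}
"""

let patched = stripNameProviding(from: sample)
print(patched)
```

As the update above notes, a patch like this is only an option when the code never relies on JSON or TextFormat encoding for the affected messages.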

allevato commented 2 years ago

Another improvement I wanted to look at in this area, to reduce the amount of generated code, was to make serialization and related functionality (hashing, equatability) table-driven. Unfortunately, the only way to get static arrays of constant data into a data segment is through a SIL transform that only runs on optimized builds, and even then, whether that transform applies is very unpredictable. If it isn't applied, we'd end up generating code that heap-allocates those arrays and populates them element by element. That code would run the first time a particular message is serialized, parsed, equality-tested, or hashed, which would make client code performance unpredictable in ways that we should probably avoid*.

* To be fair, this is already happening with the name tables we generate for text/JSON serialization, but that's restricted to a much smaller set of serialization operations that are expected to be less efficient than binary format.
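
The first-use cost described in this comment can be made visible with a small Swift sketch. `FieldEntry`, `ExampleMessageTables`, `fieldName(for:)`, and `tableBuilds` are made-up names, not SwiftProtobuf API. The closure initializer is there only to make the initialization observable; the sketch says nothing about when the optimizer manages to place a plain array literal in a data segment.

```swift
// Hypothetical stand-in for one row of a per-message field table.
struct FieldEntry {
    let number: Int
    let name: String
}

// Counts how many times the table initializer below has run.
var tableBuilds = 0

enum ExampleMessageTables {
    // Static stored properties are initialized lazily, on first access,
    // so the array is built by the first caller that touches it.
    static let fields: [FieldEntry] = {
        tableBuilds += 1
        return [
            FieldEntry(number: 1, name: "id"),
            FieldEntry(number: 2, name: "title"),
        ]
    }()
}

// Table-driven lookup: a linear scan over the shared table.
func fieldName(for number: Int) -> String? {
    ExampleMessageTables.fields.first { $0.number == number }?.name
}

let buildsBeforeUse = tableBuilds   // table not touched yet
let firstLookup = fieldName(for: 2) // pays the one-time build cost
let secondLookup = fieldName(for: 1)
let buildsAfterUse = tableBuilds
```

The first lookup pays for building the table and later lookups reuse it, which is the kind of uneven latency the comment warns about.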

cprovatas commented 2 years ago

What if there was an option to opt in to only one serialization mechanism? Say a client only needs binary encoding/decoding. Would that make any difference in the size of the generated code?

tbkka commented 2 years ago

The idea of having an opt-in is a good one, and it's something we've discussed on many occasions. It would certainly make some difference, though someone would have to actually try it and measure to figure out how much savings. But the detailed design is tricky:

At this point, I would say that we have lots of good ideas; we really need some folks to actually try implementing some of these ideas and see how well they work out.

thomasvl commented 2 years ago

#1240 has a draft of some work I did to split the generated code into what is needed for just the binary format, and then extra files needed for the textual formats.

Since a Visitor/Decoder pattern is used by the library, there isn't a lot of code specific to the formats. At the moment, the field numbers and binary encoding information are part of the base generated code, as that's a very small amount of data. The textual support then layers on the needed mapping between field numbers and names. Since the JSON names can mostly be derived from the TextFormat names, in most cases we just need one string and a marker saying we can derive the other one. Splitting that into two completely different things could result in even larger code when folks need both, since we'd potentially be more verbose instead of allowing things to be derived.
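
The "one string and a marker" idea can be sketched in Swift as follows. This is a standalone illustration, not the library's code: `NameEntry` and `derivedJSONName(fromProtoName:)` are hypothetical names loosely modeled on the name-map entries in generated code, and the derivation assumes the usual protobuf rule that the default JSON name is the lowerCamelCase form of the field name.

```swift
/// Derives a lowerCamelCase JSON name from a snake_case proto field name:
/// each underscore is dropped and the character after it is uppercased.
func derivedJSONName(fromProtoName proto: String) -> String {
    var result = ""
    var capitalizeNext = false
    for ch in proto {
        if ch == "_" {
            capitalizeNext = true
        } else if capitalizeNext {
            result += String(ch).uppercased()
            capitalizeNext = false
        } else {
            result.append(ch)
        }
    }
    return result
}

/// Hypothetical name-table entry: the case acts as the marker that says
/// whether the JSON name has to be stored or can be derived.
enum NameEntry {
    case same(proto: String)                  // JSON name equals proto name
    case standard(proto: String)              // JSON name is derivable
    case unique(proto: String, json: String)  // both names stored explicitly

    var jsonName: String {
        switch self {
        case .same(let proto):
            return proto
        case .standard(let proto):
            return derivedJSONName(fromProtoName: proto)
        case .unique(_, let json):
            return json
        }
    }
}

let entries: [NameEntry] = [
    .same(proto: "id"),
    .standard(proto: "display_name"),
    .unique(proto: "legacy_field", json: "LegacyField"),
]
let jsonNames = entries.map { $0.jsonName }
```

Only the `unique` case stores two strings; splitting the formats into fully independent tables would lose that sharing, which is the trade-off described above.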

One thing #1240 doesn't yet take on is splitting up the core runtime library so that, if you don't need the textual formats, you don't have to link that backing code. No effort has been made to see how much that might save. Using that PR as a starting point would likely make sense for getting more clarity on what the potential savings would be.

acecilia commented 7 months ago

👋 Related to the size of the generated code, the size of the SwiftProtobuf SDK itself is also considerable: 1.4 MB for the latest version, 1.25.2 (this is the size of the binary built statically inside a production app, measured using the linkmap).

Adding this comment here with the size information just for context
