bufbuild / protobuf-es

Protocol Buffers for ECMAScript. The only JavaScript Protobuf library that is fully-compliant with Protobuf conformance tests.
Apache License 2.0

Performance is not ideal #333

Open dae opened 1 year ago

dae commented 1 year ago

We currently use protobufjs, and would love to switch over to protobuf-es for the TypeScript ergonomics and ES6/tree shaking it offers. Parts of our app deal with large messages, so I ran a quick performance test, and unfortunately protobuf-es is rather behind at the moment. The test was decoding a 6MB binary message containing repeated fields with about 150k entries. protobufjs parses it in about 100ms; protobuf-es takes about 350ms. For reference, the same message as JSON is about 27MB, and JSON.parse() also parses it in about 100ms. If providing the message/proto would be helpful, please let me know.
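
To make this kind of ad-hoc measurement reproducible, a timing harness along these lines could serve as a starting point for a test case. This is a hypothetical sketch: only the JSON.parse baseline is shown (the protobuf decode calls would be wrapped the same way), and the payload shape and names are illustrative, not the actual message from our app.

```typescript
// Minimal timing helper: runs fn once and logs elapsed wall-clock time.
function timeIt<T>(label: string, fn: () => T): T {
  const start = performance.now();
  const result = fn();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
  return result;
}

// Build a payload with ~150k entries, similar in shape to the repeated
// fields described above (illustrative; not the real message).
const payload = JSON.stringify({
  entries: Array.from({ length: 150_000 }, (_, i) => ({ id: i, value: i * 0.5 })),
});

const decoded = timeIt("JSON.parse", () => JSON.parse(payload));
console.log(decoded.entries.length);
```

In a real benchmark you would run the decode many times and report a median, since a single run is dominated by JIT warmup.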

NfNitLoop commented 1 year ago

Hi @dae -- I'm not a developer on this project, but I'm a user and just saw your issue. Providing a concrete use case is always helpful in bug reports, so I thought I'd chime in and say: yes, that's useful. :) Especially if you can also provide minimal code to reproduce the performance issue; then it becomes a de facto test case, so any fixes can be verified with before/after runs.

felicio commented 1 year ago

Adding Protobuf-ES to protons's benchmarking suite at https://github.com/ipfs/protons/pull/89, where it also performs the slowest.

dae commented 1 year ago

I suspect any large message will reproduce the issue, but can produce a sample one if the devs request it.

On a positive note, protobuf-es's code generation seems to be a lot faster than our current protobufjs setup, where we have to produce a static version, use that to generate the TypeScript, and then produce a separate JSON version because the static version's generated code is huge. protobuf-es is clearly sub-second, while protobufjs was taking around 5 seconds to rebuild each time a proto file changed.

felicio commented 1 year ago

For reference, see how protons dealt with its performance issues at https://github.com/ipfs/protons/issues/51.

jcready commented 1 year ago

Protons appears to use protobuf.js's UTF-8 decoder/encoder, which has correctness bugs and is likely the reason protobuf-es uses the native TextEncoder/TextDecoder. If you're using the Node.js runtime for your benchmarks, you can comment on this issue to ask for a faster TextEncoder/TextDecoder: https://github.com/nodejs/node/issues/39879 (Bun and Deno appear to have much faster implementations). Chrome, Firefox, and Safari are also generally much faster with the native TextEncoder/TextDecoder than with protobuf.js's implementation, especially once you get above 32 characters: https://github.com/timostamm/protobuf-ts/issues/184#issuecomment-1000420219

smaye81 commented 1 year ago

Just to add some context: it was a design choice to focus on usability, developer experience, and conformance when creating Protobuf-ES.

Consequently, Protobuf-ES is nearly 100% compliant with the Protobuf conformance test suite. To illustrate, we've created the Protobuf Conformance repo, which shows how other implementations fare against this test suite. You can see that some implementations are very performant, but their conformance scores are not great.

Nevertheless, we have some ideas for improving performance that are on our roadmap for the future.

timostamm commented 12 months ago

Just a small update: Thanks to @kbongort, binary read performance received a ~2.5x bump in v1.2.1. See https://github.com/bufbuild/protobuf-es/pull/459 for details.

dimo414 commented 10 months ago

Nevertheless, we have some ideas for improving performance that are on our roadmap for the future.

Any chance you could share more context on the improvements you're considering, or a sense of the timeline for those improvements landing? We haven't done a rigorous benchmark, but our initial tests show observable performance regressions if we migrate from protobuf.js. It would be helpful to know what's in progress (or already done, as @timostamm called out).

smaye81 commented 10 months ago

Hi @dimo414. We don't have a timeline to report, but this is something we plan to address relatively soon. One additional thing we want to explore is whether the changes made in #459 could also be applied to binary write performance, as well as to JSON read/write performance.

In addition, we'd like to investigate supporting the optimize_for proto option for optimizing for speed. The downside is that it improves performance at the expense of code/bundle size, and it potentially makes testing a bit more difficult, so we want to think through how best to implement it.
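
For reference, `optimize_for` is a file-level option in the `.proto` source; a minimal sketch (the message name is illustrative):

```proto
syntax = "proto3";

// File-level option; SPEED asks generators that honor it to emit faster
// (but larger) code. Other recognized values: CODE_SIZE, LITE_RUNTIME.
option optimize_for = SPEED;

message Sample {
  repeated double data = 1;
}
```

Note that generators are free to ignore the option; honoring it in protobuf-es is exactly the open question above.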

malcolmstill commented 8 months ago

I was going to open a new issue for this, but since writing/encoding performance has come up in this thread, I'll add to the discussion.

We have a case where we are encoding a large number of f64s (doubles). In a particular example we encode around 4 million floats across roughly 100 protobuf messages (so each contains ~40,000 doubles), and that takes about 1600 milliseconds overall when running .enc (specifically protoDelimited.enc, if it makes a difference).

The proto definition is more or less (there are some other fields but the overwhelming amount of data in each message is the data field):

message MyMessage {
  repeated double data = 1;
}

Profiling protoDelimited.enc for this series of messages shows that nearly all of the time is spent in double, which is invoked separately for each f64 value (and allocates an 8-byte array each time).

In our case I'm able to show a ~30x encoding performance improvement (encoding takes ~50 milliseconds instead of ~1600 milliseconds) by extending the class BinaryWriter (packages/protobuf/src/binary-encoding.ts) with, say, an arrayDouble method:

arrayDouble(values: number[]): IBinaryWriter {
  // Allocate one buffer for all values instead of 8 bytes per value.
  const chunk = new Uint8Array(8 * values.length);
  const view = new DataView(chunk.buffer);
  for (const [i, value] of values.entries()) {
    view.setFloat64(8 * i, value, true); // protobuf doubles are little-endian
  }
  return this.raw(chunk);
}

Then in writePacked (packages/protobuf/src/private/binary-format-common.ts), and since I've only implemented this for double, special-case when the scalarTypeInfo returns method double to invoke a single arrayDouble call instead of n double calls:

export function writePacked(
  writer: IBinaryWriter,
  type: ScalarType,
  fieldNo: number,
  value: any[]
): void {
  if (!value.length) {
    return;
  }
  writer.tag(fieldNo, WireType.LengthDelimited).fork();
  const [, method] = scalarTypeInfo(type);

  if (method === "double") {
    // Special case: write the whole packed double array in one call.
    writer["arrayDouble"](value as number[]);
  } else {
    for (let i = 0; i < value.length; i++) {
      (writer[method] as any)(value[i]);
    }
  }
  writer.join();
}

This seems to work in our case (I have a test that encodes a bunch of Math.random() values, immediately decodes the message, and checks that the original and decoded data match), but this is my first time looking at the protobuf-es code, so I don't know enough to be sure there isn't some gotcha with this approach.

Assuming such a change is sensible, it would make sense at least for the other fixed-size primitive types, if not in general.

timostamm commented 7 months ago

Hey Malcolm, thanks for the comment! This could be applied to all packed protobuf types with a fixed size. There's no free lunch, though: the downsides are increased bundle size and a breaking change to the IBinaryWriter interface.

Editions will stir up these code paths a bit, so it doesn't make sense to pull in this performance improvement right now, but it does seem worthwhile after we've implemented support for editions. I wonder if an argument to fork() with a size to allocate in advance would be an alternative here. It's also likely that the parsing perf improvement added in https://github.com/bufbuild/protobuf-es/pull/459 applies to serializing as well.

timostamm commented 7 months ago

It looks like the perf improvement applied for binary read (see https://github.com/bufbuild/protobuf-es/issues/333#issuecomment-1699176072) can also be applied for binary write. Some details were noted in https://github.com/bufbuild/protobuf-es/pull/674#issuecomment-1908178716.

Ekrekr commented 6 days ago

We currently use protobufjs, and would love to switch over to protobuf-es

Just to add a +1: this is the same for us; we can't switch over because we'd see roughly a 15x slowdown for encode and decode.

This library is awesome though! You write very clean code, that's nice to interface with.

Ekrekr commented 5 days ago

I set up a repo for comparing protobuf encode speeds: https://github.com/Ekrekr/prototiming.

~~Looking into the flame graphs, even the UTF-8 string encoding is faster in ProtobufJS, but they do some crazy stuff.~~ I think the ECMA encoding is actually faster; it's the memory copying that slows things down.

It seems protobufjs is a lot more efficient at writing directly to the output buffer, rather than chunking and copying array buffers around.

I'll dig into this a bit more.

Ekrekr commented 5 days ago

The clearest place to optimize seems to be in finish.

An array buffer is allocated during chunking: https://github.com/bufbuild/protobuf-es/blob/f76c9f528b8c9428893ab89f1fc8864c9d642870/packages/protobuf/src/wire/binary-encoding.ts#L249

And a new array buffer is allocated at finish, with the contents of the chunks copied over to it iteratively: https://github.com/bufbuild/protobuf-es/blob/f76c9f528b8c9428893ab89f1fc8864c9d642870/packages/protobuf/src/wire/binary-encoding.ts#L139

If instead an array buffer were allocated up front and written to directly, rather than creating a new one at finish, the duplicate copy would be avoided.
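
A sketch contrasting the two strategies, assuming the total byte size is known up front (as it is for packed fixed-size fields). The function names are illustrative and not part of the protobuf-es API:

```typescript
// Strategy A: accumulate per-value chunks, then copy them all at finish().
function finishByCopy(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset); // every byte gets written twice overall
    offset += c.length;
  }
  return out;
}

// Strategy B: preallocate once and write each value directly, no second copy.
function writeDoublesDirect(values: number[]): Uint8Array {
  const out = new Uint8Array(8 * values.length);
  const view = new DataView(out.buffer);
  for (let i = 0; i < values.length; i++) {
    view.setFloat64(8 * i, values[i], true); // little-endian, as on the wire
  }
  return out;
}

// Both produce identical bytes for the same doubles:
const values = [1.5, -2.25, 3.125];
const chunks = values.map((v) => {
  const c = new Uint8Array(8);
  new DataView(c.buffer).setFloat64(0, v, true);
  return c;
});
const a = finishByCopy(chunks);
const b = writeDoublesDirect(values);
console.log(a.length, a.every((byte, i) => byte === b[i]));
```

The catch for the general case is that the total size isn't always known in advance, so a direct-write strategy needs either a sizing pre-pass or a growable buffer.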