elixir-protobuf / protobuf

A pure Elixir implementation of Google Protobuf.
https://hexdocs.pm/protobuf/readme.html
MIT License
823 stars · 143 forks

Feature Request: Performance #34

Closed · karlseguin closed this issue 6 years ago

karlseguin commented 6 years ago

Hello.

Was wondering if there's any interest in making changes to improve performance? I just started looking at this and am taking it step by step. I'm looking at encoding first, and a simple benchmark compared to Jason, for one of our objects (nothing too special about it), doesn't look great:

Jason.encode         100000   25.63 µs/op
Protobuf.encode       50000   69.15 µs/op

If we remove the call to Protobuf.Validator.validate!(struct) in Encoder.encode (which is, by far, the lowest hanging fruit), we get that down to 33.31 µs/op.
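
For reference, a comparison like this can be reproduced with Benchee. A minimal sketch, where MyMessage stands in for a real generated message module (new/1 and encode/1 are the generated message API):

# hypothetical message; replace with a real generated module
msg = MyMessage.new(id: 1, name: "example")

Benchee.run(%{
  "Jason.encode"    => fn -> Jason.encode!(Map.from_struct(msg)) end,
  "Protobuf.encode" => fn -> MyMessage.encode(msg) end
})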

I was thinking of two options (possibly adding support for both)

1 - Support a [validate: false] option to encode/2. This is trickier than it seems, since opts isn't currently passed to child structures (although it feels like it would be a good idea to do so anyway)

2 - Support a global config which enables/disables validation

We'd probably use 2, enabling validation in dev/test, but disabling it in prod.
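
A minimal sketch of what option 2 might look like (the :validate key and its placement are assumptions, not the library's actual API):

# config/prod.exs
config :protobuf, validate: false

# inside Encoder.encode, sketch only
if Application.get_env(:protobuf, :validate, true) do
  Protobuf.Validator.validate!(struct)
end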

I'm happy to work on a PR for this, unless a) you don't want it, or b) you'd rather implement it yourself.

Ultimately, I think it'd be great (and reasonable) to aim for better performance than any JSON encoder.

tony612 commented 6 years ago

I'm going to try some optimizations instead of adding support for turning validation off. After all, data validity is very important.

karlseguin commented 6 years ago

I agree. I started looking at generating an encoder. If you look at Encoder.encode/3, it's doing a lot of things at runtime that can be pushed to compile time. The downside is a larger beam file.

This is what I have so far. For now, I only support proto3 since that's all we need: https://github.com/2nd/protobuf-elixir/blob/feature/generate_encoder/lib/protobuf/generator.ex

I just want to add more baseline tests before adding more features (oneof, performance optimizations, ...).

For my Everything test struct, the numbers look good (and the validation is even more precise as I added specific range checks for integers):

Protobuf.generator        100000   11.81 µs/op
Jason.encode              100000   16.43 µs/op
Poison.encode             100000   21.93 µs/op
Protobuf.encode           100000   29.40 µs/op

tony612 commented 6 years ago

The result looks good, but I doubt it has the same performance in many situations. For example, what if most fields in the struct are not literal values, like:

a = get_from_func_a()
b = get_from_func_b()
struct = %Foo{a: a, b: b}

But one thing you've inspired me to consider: maybe we can encode fields that hold their default values at compile time, because we often have some empty fields.

I prefer treating the macro solution as a last resort. Before that, I'll look into other possibilities. (Maybe the real last resort is a NIF XD)

btw, what I'm trying to do is reduce some of the validations, because some functions in Encoder may already include them.

karlseguin commented 6 years ago

I don't understand... it doesn't matter whether the values are literals or not. The macro code expands to something like:

def encode(struct) do
  :erlang.iolist_to_binary([
     Generator.encode_field(<<8>>, :uint32, struct.id),
     Generator.encode_field(<<18>>, :string, struct.name)
  ])
end

8 and 18 are just the precomputed tag+type encodings. This precomputation is one example of work that doesn't need to happen on each call to encode/1. Given a proto of:

message Whatever {
  uint32 id = 1;
  string name = 2;
} 

The result of encode_fnum/2 for these fields is always the same (8 and 18), so why compute it over and over again?
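
For reference, the header byte is field_number <<< 3 ||| wire_type (wire type 0 = varint, 2 = length-delimited), so it can be computed once when the module compiles. A minimal sketch of that idea with assumed names; this is not the actual generator code:

defmodule Whatever.Headers do
  import Bitwise

  # Evaluated once, at compile time.
  @id_key 1 <<< 3 ||| 0      # uint32 id = 1   -> 8
  @name_key 2 <<< 3 ||| 2    # string name = 2 -> 18

  def id_header, do: <<@id_key>>      # <<8>>
  def name_header, do: <<@name_key>>  # <<18>>
end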

tony612 commented 6 years ago

@karlseguin Yes, you're right. It works for encode_fnum/2. When I saw encode_field, I thought the macro would only be handling literal values.

karlseguin commented 6 years ago

The performance gain comes from more than encode_fnum/2. These map lookups in the main encode/3 are eliminated:

      prop = props.field_props[tag]
      val = Map.get(struct, prop.name_atom)

Validation and the call to empty_val?/1 (which currently does 5 comparisons per value) are merged into something both more accurate and faster, along the lines of:

def encode_field(_tag, _type, nil) do
  <<>>
end

def encode_field(_tag, :uint32, 0) do
  <<>>
end

def encode_field(tag, :uint32, value) when value > 0 and value <= 4_294_967_295 do
  # ... encode ...
end

# ... other types

def encode_field(_tag, type, value) do
  # fail validation
end

Some of these gains are possible without macroing the entire encode function (the encoded fnum value could be stored in the props, for example).
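
A rough sketch of that non-macro variant, using assumed names and a plain map in place of the library's real props structure (single-byte headers only cover field numbers below 16):

import Bitwise

props = %{
  1 => %{name_atom: :id, fnum: 1, wire_type: 0},
  2 => %{name_atom: :name, fnum: 2, wire_type: 2}
}

# Cache each field's already-encoded header so encode/3 can reuse it.
props_with_headers =
  Map.new(props, fn {tag, prop} ->
    key = prop.fnum <<< 3 ||| prop.wire_type
    {tag, Map.put(prop, :encoded_fnum, <<key>>)}
  end)

# props_with_headers[1].encoded_fnum => <<8>>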

tony612 commented 6 years ago

I moved the validations into encoding, which cuts the encoding time roughly in half:

# use bench/script/bench.exs, but change time to 1m and disable HTML
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
Number of Available Cores: 2
Available memory: 3.86 GB
Elixir 1.6.5
Erlang 20.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
parallel: 1
inputs: none specified
Estimated total run time: 2.07 min

# before
Name                                    ips        average  deviation         median         99th %
google_message1_proto3 Encode       18.03 K       55.47 μs    ±84.48%          53 μs          72 μs
google_message1_proto2 Encode       14.57 K       68.63 μs    ±72.05%          65 μs          93 μs

# after
Name                                    ips        average  deviation         median         99th %
google_message1_proto3 Encode       35.84 K       27.90 μs   ±212.31%          26 μs          37 μs
google_message1_proto2 Encode       23.75 K       42.11 μs   ±170.63%          39 μs          51 μs

The code is on master already.
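
The general shape of that change, as a hedged sketch rather than the actual commit: let the encoder's own guard clauses do the validating, so invalid values simply fail to match any encoding clause.

defmodule VarintSketch do
  import Bitwise

  # The guard both validates and selects the clause, so no separate
  # validation pass is needed.
  def encode_uint32(n) when is_integer(n) and n >= 0 and n <= 4_294_967_295 do
    encode_varint(n)
  end

  def encode_uint32(other) do
    raise ArgumentError, "#{inspect(other)} is not a valid uint32"
  end

  # Standard LEB128 varint: 7 data bits per byte, high bit = continuation.
  defp encode_varint(n) when n < 128, do: <<n>>
  defp encode_varint(n), do: <<1::1, band(n, 127)::7>> <> encode_varint(n >>> 7)
end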

tony612 commented 6 years ago

The latest benchmark result is:

Name                                  ips      average    deviation     median     99th %
google_message1_proto3 Encode     53.08 K     18.84 μs     ±364.79%      17 μs      28 μs
google_message1_proto2 Encode     34.89 K     28.66 μs     ±251.24%      26 μs      37 μs

@karlseguin Could you verify the performance on your benchmarks?

karlseguin commented 6 years ago

Yes, I see a similar change: from the initially reported 29 µs/op down to 19 µs/op. Nice work!

tony612 commented 6 years ago

btw, decoding is faster too:

# before
Name                                    ips        average  deviation         median         99th %
google_message1_proto2 Decode       28.59 K       34.98 μs   ±110.21%          33 μs          49 μs
google_message1_proto3 Decode       28.55 K       35.03 μs    ±97.93%          33 μs          49 μs

# after
Name                                    ips        average  deviation         median         99th %
google_message1_proto2 Decode       51.85 K       19.29 μs   ±280.78%          18 μs          29 μs
google_message1_proto3 Decode       51.77 K       19.32 μs   ±278.03%          18 μs          30 μs

amatalai commented 6 years ago

@tony612 hi, I noticed that after the optimization, proto3 allows nil to be encoded as basic types (string, int32, etc.), whereas previously the validator raised errors.

iex(1)> Example.new(message: "") |> Example.encode |> Example.decode
%Example{message: ""}
iex(2)> Example.new(message: nil) |> Example.encode |> Example.decode
%Example{message: ""}

Is this intentional, or a bug?