I'm trying to do some optimizing rather than support taking validation away. After all, data validity is very important.
I agree. I started looking at generating an encoder. If you take Encoder.encode/3, it's doing a lot of work at runtime that can be pushed to compile time. The downside is a larger beam file.
This is what I have so far. For now, I only support proto3 since that's all we need: https://github.com/2nd/protobuf-elixir/blob/feature/generate_encoder/lib/protobuf/generator.ex
Just want to add more baseline tests before adding more features (oneof, performance optimizations, ...)
For my Everything test struct, the numbers look good (and the validation is even more precise, as I added specific range checks for integers):
Protobuf.generator 100000 11.81 µs/op
Jason.encode 100000 16.43 µs/op
Poison.encode 100000 21.93 µs/op
Protobuf.encode 100000 29.40 µs/op
The result looks good, but I doubt it has the same performance in many situations. For example, what if most fields in the struct are not literal values, like:
a = get_from_func_a()
b = get_from_func_b()
struct = %Foo{a: a, b: b}
But one thing you've inspired me to consider is that maybe we can handle the fields with default values at compile time, because we often have some empty fields.
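A minimal sketch of that idea (the module and clause shapes are illustrative, not the real generator): in proto3, a field still holding its type's default contributes no bytes, so those "skip" clauses can be generated when the module compiles:

defmodule SkipDefaults do
  # Illustrative only: generate one clause per scalar default at compile
  # time, so "is this field empty?" is a single pattern match at runtime.
  @defaults [uint32: 0, string: "", bool: false]

  for {type, default} <- @defaults do
    def encode_field(_key, unquote(type), unquote(default)), do: <<>>
  end
end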
I prefer treating the macro solution as the last resort. Before that, I'll look into other possibilities. (Maybe the real last resort is a NIF XD)
btw, what I'm trying to do is reduce some of the validations, because some functions in Encoder may already include them.
I don't understand... it doesn't matter if the values are literals or not. The macro code expands to something like:
def encode(struct) do
  :erlang.iolist_to_binary([
    Generator.encode_field(<<11>>, :uint32, struct.id),
    Generator.encode_field(<<18>>, :string, struct.name)
  ])
end
11 and 18 are just the precomputed tag+type encoding. This precomputation is one example of work that doesn't need to happen on each call to encode/1. Given a proto of:
message Whatever {
  uint32 id = 1;
  string name = 2;
}
The encode_fnum/2 result for these is always the same (11 and 18), so why compute it over and over again?
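To illustrate, a self-contained sketch of that precomputation (the helper names are mine, not the library's): the key is the varint encoding of (field_number <<< 3) ||| wire_type, so for the string field above, field_key(2, 2) returns <<18>>.

import Bitwise

# The wire key depends only on the .proto definition, so it can be
# computed once at code-generation time instead of on every encode call.
def field_key(field_number, wire_type) do
  encode_varint((field_number <<< 3) ||| wire_type)
end

# Standard varint: little-endian 7-bit groups with a continuation bit.
defp encode_varint(n) when n < 128, do: <<n>>
defp encode_varint(n), do: <<1::1, n::7, encode_varint(n >>> 7)::binary>>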
@karlseguin Yes, you're right. It works for encode_fnum/2. I thought the macro would try to handle all literal values when I saw encode_field.
The performance gain comes from more than encode_fnum/2. These map lookups, in the main encode/3, are eliminated:
prop = props.field_props[tag]
val = Map.get(struct, prop.name_atom)
Validation and the call to empty_val?/1 (which currently does 5 comparisons per value) are merged, becoming both more accurate and faster, with something like:
def encode_field(_tag, _type, nil) do
  <<>>
end

def encode_field(_tag, :uint32, 0) do
  <<>>
end

def encode_field(tag, :uint32, value) when value > 0 and value < 4_294_967_296 do
  [... encode ...]
end

# ... other types

def encode_field(_tag, type, value) do
  # fail validation
  raise "invalid #{inspect(type)} value: #{inspect(value)}"
end
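With clauses like these, the happy path is a single guard test, and invalid input falls through to the final clause. A hypothetical session (<<8>> standing in for some precomputed key, and assuming the raise shown above):

iex> encode_field(<<8>>, :uint32, 0)
<<>>
iex> encode_field(<<8>>, :uint32, 5_000_000_000)
** (RuntimeError) invalid :uint32 value: 5000000000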
Some of these gains are possible without macroing the entire encode function (the fnum value could be stored in the props, for example).
I've moved the validations into encoding, which cuts the encoding time roughly in half:
# use bench/script/bench.exs, but change time to 1m and disable HTML
Operating System: Linux
CPU Information: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
Number of Available Cores: 2
Available memory: 3.86 GB
Elixir 1.6.5
Erlang 20.3
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
parallel: 1
inputs: none specified
Estimated total run time: 2.07 min
# before
Name ips average deviation median 99th %
google_message1_proto3 Encode 18.03 K 55.47 μs ±84.48% 53 μs 72 μs
google_message1_proto2 Encode 14.57 K 68.63 μs ±72.05% 65 μs 93 μs
# after
Name ips average deviation median 99th %
google_message1_proto3 Encode 35.84 K 27.90 μs ±212.31% 26 μs 37 μs
google_message1_proto2 Encode 23.75 K 42.11 μs ±170.63% 39 μs 51 μs
The code is on master already.
The latest benchmark result is:
Name ips average deviation median 99th %
google_message1_proto3 Encode 53.08 K 18.84 μs ±364.79% 17 μs 28 μs
google_message1_proto2 Encode 34.89 K 28.66 μs ±251.24% 26 μs 37 μs
@karlseguin Could you verify the performance on your benchmarks?
Yes. I see a similar change. From the initially reported 29µs/op to 19. Nice work!
btw, decoding is faster too:
# before
Name ips average deviation median 99th %
google_message1_proto2 Decode 28.59 K 34.98 μs ±110.21% 33 μs 49 μs
google_message1_proto3 Decode 28.55 K 35.03 μs ±97.93% 33 μs 49 μs
# after
Name ips average deviation median 99th %
google_message1_proto2 Decode 51.85 K 19.29 μs ±280.78% 18 μs 29 μs
google_message1_proto3 Decode 51.77 K 19.32 μs ±278.03% 18 μs 30 μs
@tony612 hi, I noticed that after the optimization, proto3 allows nil to be encoded for basic types (string, int32, etc.), where previously the validator raised errors.
iex(1)> Example.new(message: "") |> Example.encode |> Example.decode
%Example{message: ""}
iex(2)> Example.new(message: nil) |> Example.encode |> Example.decode
%Example{message: ""}
Is this intentional, or a bug?
Hello.
Was wondering if there's any interest in making changes to improve performance? I just started looking at this and am taking it step by step. I'm looking at encoding first, and a simple benchmark against Jason, for one of our objects (nothing too special about it), doesn't look great. If we remove the call to Protobuf.Validator.validate!(struct) in Encoder.encode (which is, by far, the lowest-hanging fruit), we get that down to 33.31 µs/op. I was thinking of two options (possibly adding support for both):
1 - Support a [validate: false] option to encode/2. This is trickier than it seems, since opts isn't currently passed to child structs (although it feels like it would be a good idea to do so anyway).
2 - Support a global config which enables/disables validation.
We'd probably use 2, enabling validation in dev/test, but disabling it in prod.
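Option 2 might look something like this (the :validate key is hypothetical, not an existing protobuf-elixir option):

# config/prod.exs -- hypothetical key, for illustration only
config :protobuf, :validate, false

# in Encoder.encode, checked once per call
if Application.get_env(:protobuf, :validate, true) do
  Protobuf.Validator.validate!(struct)
end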
I'm happy to work on a PR for this, unless a) you don't want it, or b) you'd rather implement it yourself.
Ultimately, I think it'd be great (and reasonable) to get better performance than any JSON encoder.