Removed buffer reusing from some serializers

alecthomas / go_serialization_benchmarks

Benchmarks of Go serialization methods

https://alecthomas.github.io/go_serialization_benchmarks/

1.56k stars 158 forks source link

Removed buffer reusing from some serializers #148

Closed deneonet closed 3 months ago

deneonet commented 3 months ago

Some serializer structs had buffer, which were reused, so I removed them + I updated benc to the latest version.

alecthomas commented 3 months ago

I think it's fine to remove reused buffers, but where applicable the serialisers should still be using preallocated buffers rather than letting the slices grow. The rationale is that if the encoder supports using pre-allocated buffers, the benchmarks should take advantage of them.

deneonet commented 3 months ago

I mean it's just easier to not allow buffer reusing, than writing multiple benchmarks just for buffer-reusing. There were also complains about it. But letting the slice grow on the baseline serializer, just kills the whole purpose of it, true. Let me fix it.

matheusd commented 3 months ago

Reusing already allocated and correctly sized buffers is one of the strategies to achieve high marshal performance on most serializers, and if a serializer does not support that, it necessarily can't achieve the best possible performance on several workloads (for example: a tight loop that serializes sequential values).

If a serializer doesn't support that, it should be obvious, either through the benchmark results or through an annotation of some sort.

There were also complains about it

What complaints?

deneonet commented 3 months ago

Here a pull request, which changed, that all serializer allocate memory (so no buffer-reuse) and here an issue, that was opened to remove the buffer-reuse from my own serializer. It should be made clear, whether buffer reusing is allowed or not.

deneonet commented 3 months ago

Also, for example, bebop, doesn't support buffer reusing natively, it was added using a 'buf' field in the serializer struct. Now my question: Why didn't you add the buffer field on every serializer that supports it, like MUS or benc?

200sc commented 3 months ago

Bebop does support buffer reuse natively, that's why it has a MarshalBebopTo method and it's docs suggest it's use in this way.

deneonet commented 3 months ago

Doesn't MarshalBebopTo require a buffer as argument?

matheusd commented 3 months ago

Quoting from the original issue:

Basically, I don't have an issue with any particular optimisations as long as it's very clear what they are and what the tradeoffs are.

Measuring with buffer reuse for the serializers that support is important to me, but I can see how that may different for users.

I see 2 possible solutions here:

Add a /bufferReuse alternate benchmark for the serializers that do support it (same as the unsafe variants).
Add a field in my Dynamic Report PR which tags the serializers that support buffer reuse (same as the tagging done for unsafe string unmarshal).

deneonet commented 3 months ago

I would be for the first solution, to also be able to compare benchmarks without buffer reuse.

alecthomas commented 3 months ago

I would be for the first solution, to also be able to compare benchmarks without buffer reuse.

Agreed, and this echoes my original statement. I do think that currently it is not obvious which serialisers are either using preallocated buffers, or reusing buffers. We should be consistent about that too.

ymz-ncnk commented 3 months ago

I would be for the first solution, to also be able to compare benchmarks without buffer reuse.

I also think this is a best solution. But now, in the Totals table for each serializer, we have two records: marshal and unmarshal. Is this correct? If so, for a serializer that supports unsafe and reuse, we will have 8 records:

marshal/serializer_name
marshal/serializer_name/reuse
marshal/serializer_name/unsafe
marshal/serializer_name/unsafe_reuse
unmarshal/serializer_name
unmarshal/serializer_name/reuse
unmarshal/serializer_name/unsafe
unmarshal/serializer_name/unsafe_reuse

Isn't it too much? We could have only 4:

serializer_name
serializer_name/reuse
serializer_name/unsafe
serializer_name/unsafe_reuse

matheusd commented 3 months ago

I also think this is a best solution. But now, in the Totals table for each serializer, we have two records: marshal and unmarshal.

See the new HTML report card: https://htmlpreview.github.io/?https://github.com/alecthomas/go_serialization_benchmarks/blob/master/report/index.html

It aggregates marshal/unmarshal into a single line, so there is (e.g.) BENC and BENC/unsafe. With reuse, there would be either one or two additional lines (BENC, BENC/reuse, BENC/unsafe, BENC/unsafe/reuse).

deneonet commented 3 months ago

The table doesn't show up for me, just the metrics text.

matheusd commented 3 months ago

Which browser are you using? Is JS enabled?

deneonet commented 3 months ago

Arc, I think that Js is enabled, otherwise any other site wouldn't work.

deneonet commented 3 months ago

On chrome is also doesn't work, on Floorp though. On Vivaldi it doesn't work too, I think it doesn't work on all chromium-based browsers.

matheusd commented 3 months ago

Some new form of blocking on chrome... :/

Will see if I can address it later.