perf(python): Improve tuple and list serializer performance

penguin-wwy commented 2 weeks ago

What does this PR do?

Pre-allocate sequence containers (tuple and list) to the element count read from the serialized data, so they are not resized repeatedly while elements are filled in. This improves deserialization performance.
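A minimal pure-Python sketch of the idea follows. It is not the pyfury implementation, which this PR description does not show. It assumes a hypothetical wire layout of a 4-byte little-endian element count followed by 4-byte signed ints, and the helpers `write_int_list`, `read_list_append` and `read_list_prealloc` are illustrative names only. Because the length is known before any element is read, the list can be sized once and filled by index instead of growing through `append`.

```python
import struct
from typing import List

# Hypothetical layout for illustration only (not Fury's wire format):
# a 4-byte little-endian element count, then 4-byte little-endian ints.
def write_int_list(values: List[int]) -> bytes:
    return struct.pack(f"<I{len(values)}i", len(values), *values)


def read_list_append(buf: bytes) -> list:
    """Baseline: grow the list one element at a time."""
    (n,) = struct.unpack_from("<I", buf, 0)
    result = []
    offset = 4
    for _ in range(n):
        (value,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        result.append(value)  # the list may be resized several times as it grows
    return result


def read_list_prealloc(buf: bytes) -> list:
    """Pre-allocated: size the list once from the stored element count."""
    (n,) = struct.unpack_from("<I", buf, 0)
    result = [None] * n  # all n slots allocated up front, no later resizing
    offset = 4
    for i in range(n):
        (result[i],) = struct.unpack_from("<i", buf, offset)
        offset += 4
    return result
```

In Cython or C, the analogous step would be allocating with `PyList_New(n)` or `PyTuple_New(n)` and filling slots with `PyList_SET_ITEM` / `PyTuple_SET_ITEM`, which also lets a tuple be built directly without an intermediate list; whether pyfury does exactly this is an assumption.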

Related issues

Does this PR introduce any user-facing change?

[ ] Does this PR introduce any public API change?
[ ] Does this PR introduce any binary protocol compatibility change?

Benchmark

python format

python -m pyperf compare_to base.json opt.json
fury_large_tuple: Mean +- std dev: [base] 104 ms +- 2 ms -> [opt] 92.7 ms +- 5.5 ms: 1.13x faster
fury_large_list: Mean +- std dev: [base] 98.5 ms +- 3.7 ms -> [opt] 92.8 ms +- 5.3 ms: 1.06x faster

Benchmark hidden because not significant (2): fury_tuple, fury_list

xlang format

python -m pyperf compare_to base_xlang.json opt_xlang.json
fury_tuple: Mean +- std dev: [base_xlang] 262 us +- 6 us -> [opt_xlang] 259 us +- 5 us: 1.01x faster
fury_large_tuple: Mean +- std dev: [base_xlang] 104 ms +- 4 ms -> [opt_xlang] 90.0 ms +- 4.6 ms: 1.16x faster
fury_large_list: Mean +- std dev: [base_xlang] 97.6 ms +- 3.7 ms -> [opt_xlang] 90.0 ms +- 4.3 ms: 1.08x faster

Benchmark hidden because not significant (1): fury_list

chaokunyang commented 2 weeks ago

This is great! We have a new format which will improve performance a lot: https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#list. Would you like to implement this format for pyfury?

penguin-wwy commented 2 weeks ago

Okay, I will implement it.

apache / fury

perf(python): Improve tuple and list serializer performance #1933