Change Log
Added
Fixed
Changed
Removed
Deprecated
Security
Description
I recently noticed a drastic performance degradation in the sorting operation. Since sort switches to file-based sorting only after reaching a specific memory consumption threshold, the problem was not easy to notice at first.
The root cause was a missed regression: after I changed how extractors/loaders work (by default, one row at a time), I missed that this change would also affect the caching pipeline, which sorting relies on.
To resolve that issue, a new config entry was added: cache batch size, which defaults to 2000. This means the caching pipeline now processes 2000 rows at once, reducing the number of I/O operations.
On top of that, I also changed the default CompressingSerializer to NativeSerializer, which skips compression entirely and gives us a noticeable additional performance boost.
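To illustrate why the cache batch size matters, here is a minimal Python sketch (not the library's actual API; `CountingCache` and `write_batch` are hypothetical names) showing how grouping rows into batches of 2000 reduces the number of write operations compared to writing one row at a time:

```python
from typing import Iterable, Iterator, List

def batched(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group an iterable of rows into lists of at most `batch_size` rows."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

class CountingCache:
    """Toy cache that counts write operations, to show the I/O difference."""
    def __init__(self) -> None:
        self.writes = 0
        self.rows: List[dict] = []

    def write_batch(self, batch: List[dict]) -> None:
        self.writes += 1  # one I/O operation per batch, regardless of size
        self.rows.extend(batch)

rows = [{"id": i} for i in range(10_000)]

per_row = CountingCache()
for batch in batched(rows, batch_size=1):      # regressed behaviour: row at a time
    per_row.write_batch(batch)

per_batch = CountingCache()
for batch in batched(rows, batch_size=2000):   # new default: 2000 rows per write
    per_batch.write_batch(batch)

print(per_row.writes, per_batch.writes)  # 10000 vs 5 write operations
```

With 10,000 rows, the row-at-a-time pipeline performs 10,000 cache writes while the batched one performs only 5, which is the kind of I/O reduction the new config entry targets.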
Additionally, during the investigation I noticed that the PSRSimpleCache implementation is not the most efficient way of using PSR16Cache; I will create a dedicated issue for that.
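The serializer swap described above trades smaller payloads for less CPU work. A minimal Python sketch of that trade-off (the class names mirror the idea only; this is not Flow's actual serializer API):

```python
import pickle
import zlib

class CompressingSerializer:
    """Smaller payload, extra CPU spent compressing and decompressing."""
    def serialize(self, rows) -> bytes:
        return zlib.compress(pickle.dumps(rows))

    def unserialize(self, data: bytes):
        return pickle.loads(zlib.decompress(data))

class NativeSerializer:
    """Larger payload, but no compression overhead on either side."""
    def serialize(self, rows) -> bytes:
        return pickle.dumps(rows)

    def unserialize(self, data: bytes):
        return pickle.loads(data)

rows = [{"id": i, "name": f"row-{i}"} for i in range(1000)]

compressing = CompressingSerializer()
native = NativeSerializer()

# Both serializers round-trip the rows losslessly; only cost differs.
assert compressing.unserialize(compressing.serialize(rows)) == rows
assert native.unserialize(native.serialize(rows)) == rows
```

When the cache lives on fast local disk, the CPU saved by skipping compression on every batch outweighs the larger files, which is why dropping compression here yields a net speedup.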