flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

PSRSimpleCache - Performance issue #1035

Open norberttech opened 2 months ago

norberttech commented 2 months ago

Because of the internal index in PSRSimpleCache, every time we want to add something to the cache we need to perform the following operations:

Only after that can we put the actual rows into the cache. The biggest bottleneck is here:

https://github.com/flow-php/flow/blob/1.x/src/core/etl/src/Flow/ETL/ExternalSort/CacheExternalSort.php#L42-L44

All of the above cache operations are executed at least once per row, so 10k rows generate around 40k hits to the cache storage. This problem does not exist with LocalFilesystemCache because, instead of checking whether the index exists, it simply creates or opens the index and appends the new id to the end of it.
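A minimal sketch of how the per-row cost adds up, written in Python only because the thread contains no code. It assumes that adding a row means checking whether the index exists, reading it, writing it back, and then writing the row itself; `CountingCache`, `add_with_index`, `AppendOnlyStore`, and the `.index` key naming are hypothetical and are not Flow's API.

```python
class CountingCache:
    """Hypothetical stand-in for a PSR-16-style cache (has/get/set).
    Every call counts as one hit against the cache storage."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self.hits = 0

    def has(self, key: str) -> bool:
        self.hits += 1
        return key in self._data

    def get(self, key: str, default: object = None) -> object:
        self.hits += 1
        return self._data.get(key, default)

    def set(self, key: str, value: object) -> None:
        self.hits += 1
        self._data[key] = value


def add_with_index(cache: CountingCache, cache_id: str, row_id: str, row: object) -> None:
    """Assumed read-modify-write flow: check, read, rewrite the index, then store the row."""
    index_key = f"{cache_id}.index"  # hypothetical key naming
    # Copy the stored list, as deserializing it from a real backend would.
    ids = list(cache.get(index_key)) if cache.has(index_key) else []
    ids.append(row_id)
    cache.set(index_key, ids)  # the whole, growing index is written back on every add
    cache.set(row_id, row)


class AppendOnlyStore:
    """Hypothetical store in the spirit of an index opened (or created) in append mode:
    no existence check and no read-back, just one append plus the row write."""

    def __init__(self) -> None:
        self.index: list[str] = []
        self.rows: dict[str, object] = {}
        self.hits = 0

    def add(self, row_id: str, row: object) -> None:
        self.index.append(row_id)  # single append to the index
        self.hits += 1
        self.rows[row_id] = row  # single write of the row
        self.hits += 1


def hits_with_index(rows: int) -> int:
    cache = CountingCache()
    for i in range(rows):
        add_with_index(cache, "sort", f"sort.{i}", {"id": i})
    return cache.hits


def hits_append_only(rows: int) -> int:
    store = AppendOnlyStore()
    for i in range(rows):
        store.add(f"sort.{i}", {"id": i})
    return store.hits


print(hits_with_index(10_000))   # 39999
print(hits_append_only(10_000))  # 20000
```

The first row costs 3 hits (the index does not exist yet, so there is no `get`) and every later row costs 4, which gives the roughly 40k figure for 10k rows. Beyond the hit count, the read-modify-write variant also transfers the entire, growing index on every add, while the append-only variant does constant work per row.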

norberttech commented 2 months ago

This issue surfaced while I was working on https://github.com/flow-php/flow/pull/1034

norberttech commented 2 months ago

Not fully resolved, but improved significantly by https://github.com/flow-php/flow/pull/1036, mostly by reducing the number of reads from and writes to the cache.
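The comment does not say how the reads and writes were reduced. One common technique, shown here purely as a hypothetical Python sketch and not as what the linked pull request does, is to keep writing rows immediately but buffer the new ids in memory and merge them into the index once per batch. `CountingCache`, `BatchedIndexWriter`, and `batch_size` are illustrative names, not Flow's API.

```python
class CountingCache:
    """Hypothetical stand-in for a PSR-16-style cache; each call is one storage hit."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self.hits = 0

    def has(self, key: str) -> bool:
        self.hits += 1
        return key in self._data

    def get(self, key: str, default: object = None) -> object:
        self.hits += 1
        return self._data.get(key, default)

    def set(self, key: str, value: object) -> None:
        self.hits += 1
        self._data[key] = value


class BatchedIndexWriter:
    """Hypothetical writer: stores each row right away, but updates the index
    (check, read, write) only once per `batch_size` new ids."""

    def __init__(self, cache: CountingCache, cache_id: str, batch_size: int) -> None:
        self.cache = cache
        self.index_key = f"{cache_id}.index"  # hypothetical key naming
        self.batch_size = batch_size
        self.pending: list[str] = []

    def add(self, row_id: str, row: object) -> None:
        self.cache.set(row_id, row)  # one hit per row
        self.pending.append(row_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Merge buffered ids into the stored index with a single read-modify-write."""
        if not self.pending:
            return
        ids = list(self.cache.get(self.index_key)) if self.cache.has(self.index_key) else []
        ids.extend(self.pending)
        self.cache.set(self.index_key, ids)
        self.pending.clear()


cache = CountingCache()
writer = BatchedIndexWriter(cache, "sort", batch_size=1000)
for i in range(10_000):
    writer.add(f"sort.{i}", {"id": i})
writer.flush()  # persist any ids still buffered (none here, since 10_000 % 1000 == 0)
print(cache.hits)  # 10029
```

With 10k rows and batches of 1000, the index is touched 10 times (2 hits for the first flush, 3 for each of the other nine), so the total drops from 39999 to 10029 hits. The trade-off is that the stored index lags behind the written rows until `flush()` runs, so it must be flushed before the cache is read back.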