Changed
CachingPipeline now relies only on the previously set batch size without modifying it
Removed
Default global CachingPipeline batch size
Deprecated
Security
Description
Previously, using cache changed the pipeline batch size, which for long and complicated pipelines could result in a significant performance drop.
Before
df()
    ->read(...)
    ->batchSize(10_000)
    ->withEntry(...) // batch size here is 10k
    ->cache("test") // no cache batch size provided, so the default (100) is used and becomes the new batch size
    ->withEntry(...) // batch size here is 100
    ->run();
df()
    ->read(...)
    ->batchSize(10_000)
    ->withEntry(...) // batch size here is 10k
    ->cache("test", cacheBatchSize: 10) // cache batch size is set to 10
    ->withEntry(...) // batch size here is 10
    ->run();
After
df()
    ->read(...)
    ->batchSize(10_000)
    ->withEntry(...) // batch size here is 10k
    ->cache("test") // batch size is left untouched
    ->withEntry(...) // batch size here is still 10k
    ->run();
df()
    ->read(...)
    ->batchSize(10_000)
    ->withEntry(...) // batch size here is 10k
    ->cache("test", cacheBatchSize: 5000) // cache batch size is explicitly set to 5k
    ->withEntry(...) // batch size here is now 5k
    ->run();
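The behavior change can be illustrated with a small simulation. The sketch below is not the flow-php API; it is a hypothetical Python model in which a pipeline is a stream of row batches and a cache stage either passes batches through unchanged (the new default) or re-chunks them when an explicit cache batch size is given.

```python
from itertools import islice

def read(rows, batch_size):
    """Yield rows in batches of batch_size (models ->batchSize(10_000))."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch

def cache(batches, cache_batch_size=None):
    """Toy cache stage. Old behavior: a missing cache batch size forced a
    default (100) and re-chunked everything downstream. New behavior,
    modeled here: no explicit size means batches pass through unchanged."""
    if cache_batch_size is None:
        yield from batches  # keep the upstream batch size
    # explicit cacheBatchSize: re-chunk the flattened rows into the new size
    else:
        flat = (row for batch in batches for row in batch)
        yield from read(flat, cache_batch_size)

rows = range(10_000)
# New default: one 10k batch is still one 10k batch after cache()
assert sum(1 for _ in cache(read(rows, 10_000))) == 1
# Explicit cache batch size: 10k rows are re-chunked into 2 batches of 5k
assert sum(1 for _ in cache(read(rows, 10_000), 5_000)) == 2
```

The performance point follows directly: re-chunking 10k-row batches into the old default of 100 multiplies the number of batches every downstream operation must process by 100.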