After measuring the performance of lodash.differenceWith, it became evident that it was the main cause of high CPU usage at the start of a sync run when handling very large arrays.
In this PR we use the Map.has() and Set.has() methods in combination with Array.filter to get faster performance and avoid the node process hanging for multiple seconds at a time.
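As a rough illustration of the approach (not the code from this PR), the sketch below computes a difference by building a `Set` of keys once and then filtering with `Set.has()`. The `DataObject` shape, its `id` key, and the `differenceById` name are assumptions made for the example.

```typescript
// Hypothetical record shape; the real objects in the storage node may differ.
interface DataObject {
  id: string
}

// Returns the items of `required` whose id does not appear in `stored`.
// Building the Set takes one pass over `stored`, and each has() lookup is
// constant time on average, so the work grows roughly with n + m
// rather than n * m as with a pairwise comparator.
function differenceById(required: DataObject[], stored: DataObject[]): DataObject[] {
  const storedIds = new Set(stored.map((item) => item.id))
  return required.filter((item) => !storedIds.has(item.id))
}

const requiredObjects: DataObject[] = [{ id: '1' }, { id: '2' }, { id: '3' }]
const storedObjects: DataObject[] = [{ id: '2' }]
const missingObjects = differenceById(requiredObjects, storedObjects)
console.log(missingObjects.map((item) => item.id)) // ids of objects not yet stored
```

A `Map` keyed by id works the same way when the matching object itself is needed, not just a membership check.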
In addition, the task processor is updated to catch task failures, so the entire sync run no longer ends prematurely.
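The failure handling can be pictured along these lines. This is a minimal sketch under assumed names (`Task`, `runTasks`), not the actual task processor in the storage node: each task runs inside its own `try`/`catch`, so a rejected task is logged and counted while the loop moves on to the next one.

```typescript
// Hypothetical task type: any async unit of work with a description.
interface Task {
  description: string
  execute(): Promise<void>
}

// Runs tasks one after another. A failing task is logged and counted
// instead of rejecting the whole run.
async function runTasks(tasks: Task[]): Promise<{ succeeded: number; failed: number }> {
  let succeeded = 0
  let failed = 0
  for (const task of tasks) {
    try {
      await task.execute()
      succeeded++
    } catch (err) {
      failed++
      console.warn(`Task failed: ${task.description}`, err)
    }
  }
  return { succeeded, failed }
}
```

Returning the counts leaves it to the caller to decide whether a run with failures should be retried.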
edit:
I was curious why the v3.8.x version had no issues, and it seems we took a huge performance hit when we switched from _.difference() to _.differenceWith() in v3.9.x and up.
I have chosen not to include the additional improvements from https://github.com/Joystream/joystream/pull/5026 for now, although they will be very beneficial.
Example for running the test on my branch https://github.com/mnaamani/joystream/tree/colossus-tweaks:
yarn storage-node util:test-1 --obligationCount 300000 --storedCount 100000