Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0

[Argus] Fix distributor node syncing query #4921

Status: Closed (zeeshanakram3 closed this 11 months ago)

zeeshanakram3 commented 11 months ago

Problem

The distributor node executes the getDistributionBucketsWithObjectsByWorkerId query to fetch all the data objects that a given node is supposed to distribute. The problem is that, as the storage directory keeps growing, the response to this query keeps getting larger. For some time the query had been timing out, and on investigation it turned out that the graphql-server was consistently crashing while executing it, with: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory


Fix

As a workaround, this PR splits the query into multiple smaller queries that the graphql-server can process successfully. This should hold until we find the cause of the memory leak in the graphql-server and implement a proper fix.
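A minimal sketch of the splitting idea, assuming the work can be divided by bucket: first fetch only the IDs of the buckets assigned to the worker, then request the objects of a few buckets per query so each response stays small. The QueryClient interface, its two methods and the bucketsPerQuery parameter are hypothetical stand-ins, not the actual Argus query-node API, and the PR itself may split the query differently.

```typescript
// Hypothetical shape of a data object returned by the query node.
interface DataObject {
  id: string
  bucketId: string
}

// Hypothetical client interface; the real Argus query-node API may differ.
interface QueryClient {
  getBucketIdsByWorkerId(workerId: number): Promise<string[]>
  getObjectsByBucketIds(bucketIds: string[]): Promise<DataObject[]>
}

// Split an array into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('chunk size must be positive')
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Rather than one query returning every bucket with all of its objects,
// fetch the small list of bucket ids first, then request objects for a
// few buckets per query so that each server response stays small.
async function getObjectsForWorker(
  client: QueryClient,
  workerId: number,
  bucketsPerQuery = 5
): Promise<DataObject[]> {
  const bucketIds = await client.getBucketIdsByWorkerId(workerId)
  const objects: DataObject[] = []
  for (const ids of chunk(bucketIds, bucketsPerQuery)) {
    // Sequential on purpose: only one smaller query is in flight at a time.
    const batch = await client.getObjectsByBucketIds(ids)
    for (const obj of batch) objects.push(obj)
  }
  return objects
}
```

Running the smaller queries sequentially trades some latency for a bounded per-request load on the graphql-server, which is the point of the workaround.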

bwhm commented 11 months ago

Great work. I wouldn't be surprised if the storage-node needs a similar fix. @kdembler suggested adding a small delay between fetching batches, which is probably a good idea. What might work even better is to use generators that yield one batch at a time, allowing the consumer/caller to process each batch of objects before coming back for more.
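The generator approach could look roughly like this. It is a sketch assuming an offset/limit paginated fetch function (the FetchPage type and fetchInBatches are hypothetical names), combined with the optional delay between batches suggested above.

```typescript
// Hypothetical paginated fetch: returns up to `limit` items starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Yields one page at a time. The generator is paused at `yield` until the
// caller asks for the next batch, so the caller finishes processing the
// current batch before the next request is sent.
async function* fetchInBatches<T>(
  fetchPage: FetchPage<T>,
  batchSize: number,
  delayMs = 0
): AsyncGenerator<T[], void, undefined> {
  let offset = 0
  while (true) {
    const page = await fetchPage(offset, batchSize)
    if (page.length === 0) return
    yield page
    // A short page means there is nothing left to fetch.
    if (page.length < batchSize) return
    offset += page.length
    // Optional pause between requests to ease load on the query node.
    if (delayMs > 0) await sleep(delayMs)
  }
}
```

A caller would consume it with a for await...of loop, processing each batch inside the loop body. Because the generator only requests the next page when the loop asks for it, the caller never holds more than one page from this generator at a time.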

I will merge, then bump the version of Argus and prepare a Docker release (using the #4886 branch).


mnaamani commented 11 months ago

The storage node at least already uses pagination, and in my testing it wasn't as bad. Increasing the syncInterval from the current 1 minute would help reduce the overall system load.
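A sketch of a sync loop with a configurable interval, assuming one sync pass can be expressed as a single async function. startSyncLoop and syncIntervalMs are illustrative names, not the storage node's actual code. The design choice shown is to schedule the next run only after the current one has finished, so a larger interval directly lowers the query rate and a slow sync never overlaps the next one.

```typescript
// Hypothetical periodic sync loop; names are illustrative only.
// Returns a function that stops the loop.
function startSyncLoop(
  runSync: () => Promise<void>,
  syncIntervalMs: number
): () => void {
  let stopped = false
  let timer: ReturnType<typeof setTimeout> | undefined

  const tick = async () => {
    try {
      await runSync()
    } catch (err) {
      // Log and keep going; one failed sync should not stop the loop.
      console.error('sync failed', err)
    }
    // Schedule the next run only after this one has completed, so runs
    // never overlap regardless of how long a single sync takes.
    if (!stopped) timer = setTimeout(tick, syncIntervalMs)
  }

  void tick()
  return () => {
    stopped = true
    if (timer !== undefined) clearTimeout(timer)
  }
}
```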

https://github.com/Joystream/joystream/pull/4924