Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0

Minimise use and processing of large arrays #5051

Open mnaamani opened 8 months ago

mnaamani commented 8 months ago

We previously identified CPU-intensive tasks that process large arrays, in particular the lodash differenceWith method when it compares two very large arrays. The intersection call used in the getLocalDataObjectsByBagId() state API endpoint could also be problematic.

We should generally avoid holding and comparing large arrays in memory like this.

Prefer a Map or Set when we are storing the data in memory long term.
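As a rough illustration of the Map/Set suggestion, the sketch below replaces a pairwise array diff with a Set lookup. The DataObject shape and the missingObjects helper are hypothetical names made up for this example; they are not taken from the storage node code.

```typescript
// Hypothetical shape; the real data object model may differ.
interface DataObject {
  id: string
}

// Returns the objects in `required` whose id is not present in `localIds`.
// Building the Set is O(m) and each lookup is O(1) on average, so the pass
// is roughly O(n + m) instead of the O(n * m) comparisons of a pairwise diff.
function missingObjects(required: DataObject[], localIds: Iterable<string>): DataObject[] {
  const local = new Set(localIds)
  return required.filter((obj) => !local.has(obj.id))
}

const required: DataObject[] = [{ id: '1' }, { id: '2' }, { id: '3' }]
const missing = missingObjects(required, ['2'])
console.log(missing.map((obj) => obj.id)) // [ '1', '3' ]
```

If the ids are already kept in a long-lived Set, the construction cost disappears and only the lookups remain.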

For processing, try to fetch data in chunks (gql queries with paging and result set limits). Async generators can make programming around this approach more convenient.
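One possible way to combine paging with an async generator is sketched below. FetchPage, paginate and fakeFetch are hypothetical names; fakeFetch stands in for a paged gql query and does not reflect the actual query node API.

```typescript
// Hypothetical page fetcher: returns up to `limit` items starting at `offset`.
type FetchPage<T> = (offset: number, limit: number) => Promise<T[]>

// Yields one page at a time, so at most `limit` items are held in memory.
async function* paginate<T>(fetchPage: FetchPage<T>, limit: number): AsyncGenerator<T[]> {
  let offset = 0
  while (true) {
    const page = await fetchPage(offset, limit)
    if (page.length === 0) return
    yield page
    // A short page means the result set is exhausted.
    if (page.length < limit) return
    offset += page.length
  }
}

// Stand-in for a paged query; not the real data source.
const allIds = Array.from({ length: 7 }, (_, i) => `object-${i}`)
const fakeFetch: FetchPage<string> = async (offset, limit) => allIds.slice(offset, offset + limit)

// Consumes the pages without ever building the full list.
async function countObjects(): Promise<number> {
  let count = 0
  for await (const page of paginate(fakeFetch, 3)) {
    count += page.length
  }
  return count
}
```

The caller works with a plain for await loop, while the paging details stay inside the generator.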

Another place where we "produce" large arrays is fs.promises.readdir(), when reading the list of objects in the uploads folder. It's okay to do this on startup, but look for places where we might do it more frequently, such as the state API endpoint getLocalDataStats().
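Where a directory scan cannot be avoided, one option (an assumption on my part, not something proposed in this thread) is to iterate entries with fs.promises.opendir() instead of materializing the whole listing with readdir(). The countFiles helper below is a hypothetical name used only for this sketch.

```typescript
import { opendir } from 'node:fs/promises'

// Counts regular files in a directory by streaming directory entries,
// so no array containing every file name is ever allocated.
async function countFiles(dirPath: string): Promise<number> {
  let count = 0
  const dir = await opendir(dirPath)
  // Async iteration over a Dir closes the handle when the loop finishes.
  for await (const entry of dir) {
    if (entry.isFile()) count++
  }
  return count
}
```

This still touches every entry on disk, so caching the result in memory remains preferable for a frequently called endpoint.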

Some examples in https://github.com/Joystream/joystream/pull/5026

kdembler commented 7 months ago

We can fetch the updates incrementally, even for the cleanup. We don't need the full list at all times; for example, we can get a list of events for deleted objects since the last run.
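A minimal sketch of that incremental idea, assuming deletion events carry an object id and a block number. DeletionEvent and applyDeletions are hypothetical names, and the real event schema may look different.

```typescript
// Hypothetical event shape; the actual deletion event fields may differ.
interface DeletionEvent {
  objectId: string
  blockNumber: number
}

// Removes deleted objects from the in-memory set and returns the new cursor,
// i.e. the highest block number processed so far.
function applyDeletions(localIds: Set<string>, events: DeletionEvent[], lastBlock: number): number {
  let cursor = lastBlock
  for (const event of events) {
    // Skip events that were already handled in a previous run.
    if (event.blockNumber <= lastBlock) continue
    localIds.delete(event.objectId)
    cursor = Math.max(cursor, event.blockNumber)
  }
  return cursor
}
```

Persisting the returned cursor between runs lets the next cleanup ask only for events after it.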