TryGhost / knex-migrator

DB migration tool for knex.js
MIT License
100 stars 27 forks source link

Integrity Check performance #252

Open ErisDS opened 2 years ago

ErisDS commented 2 years ago

When first called, knex-migrator runs an "integrity check" to make sure migrations are in a state it can understand.

Problem

This has been implemented in a straightforward way, first reading the versions folder to get all the versions, and then looping over those twice, first to lookup what is in the DB and second to lookup what is in the folder - i.e. a DB read and a file read per version.

https://github.com/TryGhost/knex-migrator/blob/cb9792852ae4446a94c17fe7c932faf64a042cae/lib/index.js#L1151-L1238

This code suffers from a performance issue, where the time it takes to run is in the order 0(2n) , where n is the number of versions. 2n rather than just n, as there are 2 loops both doing expensive operations and solving either would be an improvement.

Additionally, the file reads have been implemented using sync methods.

Solution

It should be possible to reduce the number of expensive and synchronous operations, and also get it down to a single loop, by:

  1. Using something like glob to handle reading all the files (it would be ideal to get a nested JSON structure back) asynchronously
  2. doing one request to the db for all migrations grouped by version
  3. Processing the results in a single loop

Fixing this should reduce the time this function takes with large numbers of migrations (i.e. Ghost) significantly

tpatel commented 2 years ago

Daniel shared some work he did a while ago: https://github.com/TryGhost/knex-migrator/commit/ed0c5169514854f624ec50c2885b316b7bd0f8f3

daniellockyer commented 2 years ago

https://github.com/TryGhost/knex-migrator/commit/838fe5462837902bfdda0525bd8a292f57aa0d9a solves half of the problem here - the DB queries. I had the code ready from before and felt they were more important because they go across network.

The file/folder lookup still needs some work but there's some refactoring needed in this area of code first because we're heavily using promise chaining and it's making it painful to work with 🙂