ManageIQ / manageiq-schema

SQL schema and database migrations for ManageIQ
Apache License 2.0

[RFE] Speed up `rake test:vmdb:setup` #502

Closed NickLaMuro closed 2 years ago

NickLaMuro commented 4 years ago

Migrations take a while to run (see benchmarks), and probably could be collapsed to save time at this point.

We have seven-ish years of migrations built up, and all of them run every time a test:vmdb:setup or evm:db:reset task is run. That costs developers a lot of time, and CI a lot of build time, with no real value beyond having the migrations exercised semi-often in this repo.

Benchmarks

Note: Times below don't include db:seed, which could also be improved, but that is not the focus of this RFE.

rake db:migrate (before)

$ time rake test:vmdb:setup

real    1m36.398s
user    1m6.130s
sys     0m4.525s

rake db:schema:load (after)

$ time rake evm:db:destroy db:schema:load

...

real    0m14.210s
user    0m6.666s
sys     0m3.044s

(Note: metrics_rollups_* style tables are commented out so that these run properly, but the performance impact of those extra tables would be minimal, as most of the 14 seconds above is spent booting the Rails env)
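To put the two "real" timings above side by side, here is a small sketch that parses the `time` output and computes the difference. The helper name `parse_duration` is made up for this example, and the comparison carries the same caveats as the benchmarks (no db:seed, metrics tables commented out).

```ruby
# Hypothetical helper: converts a `time`-style duration such as "1m36.398s"
# into seconds.
def parse_duration(text)
  match = text.match(/\A(\d+)m(\d+(?:\.\d+)?)s\z/)
  raise ArgumentError, "unrecognized duration: #{text}" unless match

  match[1].to_i * 60 + match[2].to_f
end

# "real" values copied from the two benchmark runs above.
migrate_seconds = parse_duration("1m36.398s")
schema_load_seconds = parse_duration("0m14.210s")

puts format("saved %.1fs per run, %.1fx faster",
            migrate_seconds - schema_load_seconds,
            migrate_seconds / schema_load_seconds)
```

With the numbers above, that works out to roughly 82 seconds saved per run, or a bit under a 7x speedup.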

Proposals

One of the following probably could be done to solve this:

The advantage to the first is that we can execute the "faster" code pretty much any time after a bin/update has been done. So we can get to a state of re-seeding much quicker, not just after a "collapse" has happened. That said, updating how the schema is generated to support this would probably take a significant bit of monkey patching, and would not be a quick turnaround.

That said, the "collapse" migration is basically a schema, so figuring out how to collapse would probably lead to how we could generate the schema. It also might be worth "archiving" migrations after every release so that migrations are "versioned" to some degree.

Testing is also something we probably would want to consider with either approach. The advantage again with the first is that we don't have to be concerned about using it in production, since it is only a dev speedup. Making sure we get the collapse correct would probably require some testing interface that runs the full collection of migrations from the beginning, and then tests them from a snapshot onward (if something like this doesn't already exist).

Links

NickLaMuro commented 4 years ago

So I spent the better part of today looking into how we could fix the schema (mostly trying to understand the ActiveRecord internals and how we have implemented the migrations around metrics). In short, I have a POC branch for how we could start to approach this:

https://github.com/ManageIQ/manageiq-schema/compare/master...NickLaMuro:fix-db-schema

As mentioned in the commit, this doesn't really create a valid schema, it just allows the generated schema to run with rake db:schema:load. That said, this is a pattern lifted from projects that we probably could leverage (or at least use as inspiration):

(Note: The latter isn't heavily maintained at the moment)

But this leaves needing support:

I don't think it is worth me investing too much more time into this idea without some input and buy-in from others, but scenic seems to have a solid pattern for this, where they prepend the overrides to the ActiveRecord::SchemaDumper:

https://github.com/rails/rails/blob/65c6f7030067b42e3c82c81d2424590ed61de29c/activerecord/lib/active_record/schema_dumper.rb#L102-L115

to also include their create_view definitions:

https://github.com/scenic-views/scenic/blob/main/lib/scenic/schema_dumper.rb#L6-L20 https://github.com/scenic-views/scenic/blob/main/lib/scenic/view.rb#L43-L52

Which is pretty much what I followed for the dumper in my branch. For the others, some additions will probably be needed in the adapter, as is done in scenic for create_view:

https://github.com/scenic-views/scenic/blob/048e08057e0bd76a700e7759344000cdc4e78235/lib/scenic/statements.rb#L25-L48

(and friends: drop_view, etc.)
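For anyone not familiar with the prepend pattern, here is a minimal plain-Ruby sketch of the idea. `ToySchemaDumper` and `ViewDumper` are stand-ins invented for this example, not the real ActiveRecord::SchemaDumper or scenic code, and the view name and SQL are made up. It only shows how a prepended module can call `super` and then append extra definitions to the dump.

```ruby
require "stringio"

# Stand-in for a schema dumper; NOT the real ActiveRecord::SchemaDumper.
class ToySchemaDumper
  def initialize(table_names)
    @table_names = table_names
  end

  # Writes one create_table line per table, in alphabetical order.
  def tables(stream)
    @table_names.sort.each { |name| stream.puts "  create_table #{name.inspect}" }
  end

  # Writes the whole toy schema to the stream and returns the stream.
  def dump(stream)
    stream.puts "ActiveRecord::Schema.define do"
    tables(stream)
    stream.puts "end"
    stream
  end
end

# Hypothetical extension. Because it is prepended, its #tables runs first and
# `super` still invokes the original implementation.
module ViewDumper
  def tables(stream)
    super
    views.sort.each do |name, sql|
      stream.puts "  create_view #{name.inspect}, sql_definition: #{sql.inspect}"
    end
  end

  # Hard-coded for illustration; a real dumper would ask the database.
  def views
    { "metrics_view" => "SELECT * FROM metrics" }
  end
end

ToySchemaDumper.prepend(ViewDumper)

puts ToySchemaDumper.new(%w[vms hosts]).dump(StringIO.new).string
```

The same shape would presumably apply to triggers and the other custom pieces: one override that emits the definition in the dump, plus a matching statement the schema file can call when it is loaded.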

kbrock commented 4 years ago

When we collapse migrations:

This works because in the older version of the code, the migrations are not collapsed yet.

Do we have a version of the code from long enough ago that we feel it is reasonable to ask users to upgrade to an interim version first?

As for the invalid schema, I think we just use the code that did the migration in the first place. This is non-trivial in a code base like ours.

Agree: sure wish schema:load worked. If I remember correctly, we also have issues because the schema is alphabetical, and the child tables are before the parent table

NickLaMuro commented 4 years ago

If I remember correctly, we also have issues because the schema is alphabetical, and the child tables are before the parent table

No, I think it is more that we do a lot of custom stuff in these two migrations (one being the "collapse" one):

https://github.com/ManageIQ/manageiq-schema/blob/62d91b62/db/migrate/20130923182042_collapsed_initial_migration.rb#L963-L970 https://github.com/ManageIQ/manageiq-schema/blob/62d91b62/db/migrate/20190122213042_use_views_for_metrics.rb

And both of those lean heavily on the custom additions and helper here:

https://github.com/ManageIQ/manageiq-schema/blob/62d91b62/lib/migration_helper.rb

So when the db/schema.rb is written, it doesn't know how to support those, so those triggers/views/alterations are left on the floor.

Update/Additional info:

When we collapse migrations:

Just to clarify: I realized this after digging heavily into the ActiveRecord codebase, but db/schema.rb is also a "collapsed migration", and the ActiveRecord::Schema actually inherits from ActiveRecord::Migration (it is a little more complicated than that, but it is mostly the case).

So we already have a baked-in way in Rails for collapsing migrations, we just don't/can't use it because of lib/migration_helper.rb (triggers, views, etc.).

kbrock commented 4 years ago

User Experience

There are technical hurdles to consolidating the various migrations. And our result will be similar to our current schema.rb

I think we also need to take into account the user, developer, and support person's experience when upgrading to more recent versions of the product.

The developers and users will need to run this code many times more (which again is why consolidation is important to do, but also why it is important to not make it too hard for us, users, and support to upgrade versions).

product version f before the great consolidation

migration version
1 a
2 a
3 b
4 c
5 c
6 d
7 e
8 f

Versions can have zero or many migrations in them. The migration number is actually a date stamp.

product version a

migration version
1 a
2 a

product version c

migration version
1 a
2 a
3 b
4 c
5 c

product version f after

It collapsed migrations from version a, b, and c

migration version
1'{1-5} f
5'{guard} f
6 d
7 e
8 f

1' and 5' have the exact same version numbers as before but are rewritten. 1' contains the original 1-5; 2, 3, and 4 no longer exist. 5' contains code that checks that the original 4 has already been run. If it has been run then nothing happens, but if it has not then the user is told to first upgrade to c, d, or e before upgrading to this version.
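The scheme above can be modeled in a few lines of plain Ruby. This is only a sketch: `plan_upgrade`, `SHIPPED`, and `InterimUpgradeRequired` are names made up for the example, and the thread does not say how a fresh install gets past the guard. The sketch assumes the guard only fires when migration 1 was already recorded by older, un-collapsed code.

```ruby
# Raised when the database is too old to jump straight to version f.
class InterimUpgradeRequired < StandardError; end

# Migrations shipped by product version f after the collapse.
# 1 stands for 1' (original 1-5) and 5 stands for 5' (the guard).
SHIPPED = [1, 5, 6, 7, 8].freeze

# Given the migration versions already recorded in a database, returns the
# versions that would run on upgrade, or raises if an interim upgrade is needed.
def plan_upgrade(applied)
  pending = SHIPPED - applied

  # Assumption: the guard trips only for databases created by the old,
  # un-collapsed migration 1 that never reached the original migration 4.
  guard_will_run = pending.include?(5)
  if guard_will_run && applied.include?(1) && !applied.include?(4)
    raise InterimUpgradeRequired, "upgrade to product version c, d, or e first"
  end

  pending
end

p plan_upgrade([])              # fresh install: runs 1', 5', 6, 7, 8
p plan_upgrade([1, 2, 3, 4, 5]) # product version c: runs 6, 7, 8

begin
  plan_upgrade([1, 2])          # product version a
rescue InterimUpgradeRequired => e
  puts "blocked: #{e.message}"
end
```

A database from version c, d, or e already has 1 and 5 recorded, so neither rewritten migration runs and only the newer ones are applied.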

Users

Scenarios

Why is user a so complicated?

It is set up this way because users previously running a and b won't know how to run half of the consolidated migration.

The good news is that versions c, d, and e have the non-consolidated migrations (1-5), and can get the user past the consolidated migration and to a point from which future versions know how to migrate.

Take away

We can support all users, but we do inconvenience users running very old versions of the code, namely versions that have been consolidated.

While we do want to minimize the situation, the request to go through an interim version is reasonable. We have done this in the past, as have other products in the wild.

Fryguy commented 4 years ago

I had started a collapse initial migration but it had to wait until jansa was released, so I could get a hard commit on which it could be done. It's actually relatively easy this time around.

Fryguy commented 4 years ago

@NickLaMuro I would say hold off on any specific fixes, because a collapse saves a huge amount of time.

NickLaMuro commented 4 years ago

@Fryguy because I apparently like to ignore you, I did throw together a POC PR for handling fixing the db/schema.rb ☝️ 👇 :

https://github.com/ManageIQ/manageiq-schema/pull/504

If nothing else, it can help generate a collapse migration and prevent errors.

kbrock commented 4 years ago

Do we have a version where we want the cutoff?

I would really like the old metrics code out of here. Not sure if that meshes with our business goals

chessbyte commented 4 years ago

@kbrock Business goals are being hashed out in this PR

Fryguy commented 4 years ago

Merged #504 - I also think we should do #506

NickLaMuro commented 4 years ago

I also think we should do #506

Throwing my hat into the ring as well to say I agree. My proposal above makes it seem like the two approaches above are mutually exclusive, which they are not, so I am +1 for doing both.

Fryguy commented 3 years ago

#584 was merged (instead of #506)

Fryguy commented 3 years ago

@NickLaMuro Is this basically done? We also got https://github.com/ManageIQ/manageiq/pull/21259 and https://github.com/ManageIQ/pg-logical_replication/pull/6 since this was opened which improved performance a lot.

NickLaMuro commented 3 years ago

@Fryguy well, we could also update ManageIQ/manageiq to make use of rake db:schema:load, which would be even faster. That would really only fix dev setups, where the generated db/schema.rb already exists.

However, we could also generate the schema.rb in this repo, then update/enhance test:vmdb:setup to use that, and apply a db:migrate after to catch anything else. We would probably need a process in place for handling the schema.rb, and probably one of the two would work:

Each has its pros and cons, though I would argue the first is simpler, and the devs on the team can just deal with a minor conflict here and there that is easily resolved by doing rm -rf db/schema.rb and re-running db:migrate.
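As a rough illustration of the "load the schema, then db:migrate to catch anything else" idea, here is a sketch of which migrations would be left to run. `pending_after_schema_load` is a hypothetical helper, not part of Rails or this repo, and it assumes that loading the schema marks every migration up to the schema's version as already applied. The first two versions come from the migrations linked earlier in this thread; the third is made up.

```ruby
# Hypothetical helper: given every migration version in db/migrate and the
# version recorded in a checked-in schema.rb, returns the versions that a
# follow-up db:migrate would still need to run.
def pending_after_schema_load(migration_versions, schema_version)
  migration_versions.sort.select { |version| version > schema_version }
end

versions = [
  20130923182042, # collapsed initial migration
  20190122213042, # use views for metrics
  20200101000000  # made-up newer migration, for illustration only
]

# Schema generated at the metrics-views migration: only the newer one is left.
p pending_after_schema_load(versions, 20190122213042)
```

The staler the checked-in schema.rb gets, the more migrations fall into that pending list, which is why some process for refreshing it would be needed.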


That said, do #504 and #584 (merged in place of #506) now speed up rake test:vmdb:setup? Yes.

The benchmarks above show it could be quicker, but it is up to the group whether we want to invest a bit more time to get a bit more of a speedup in CI/dev.