feat(resetchain): add ability to specify tables

Pitasi commented 2 years ago

I have seen that sometimes (on auth table) cockroachdb doesn't use the correct index, but it uses a more inefficient one. With this PR I add the ability to override that by passing a custom flag like ./resetchain -force-indexes auth@correct_index.

DeshErBojhaa commented 2 years ago

How the code behaves when one table has multi column index?

Pitasi commented 2 years ago

How the code behaves when one table has multi column index?

The problem is exactly this one, I should have explained it better!

auth table for example, has two secondary indexes:

root@localhost:26258/tracelistener> show indexes from auth;
  table_name |                 index_name                 | non_unique | seq_in_index |  column_name   | direction | storing | implicit
-------------+--------------------------------------------+------------+--------------+----------------+-----------+---------+-----------
  auth       | auth_chain_name_address_account_number_key |   false    |            1 | chain_name     | ASC       |  false  |  false
  auth       | auth_chain_name_address_account_number_key |   false    |            2 | address        | ASC       |  false  |  false
  auth       | auth_chain_name_address_account_number_key |   false    |            3 | account_number | ASC       |  false  |  false
  auth       | auth_chain_name_address_account_number_key |   false    |            4 | id             | ASC       |  false  |   true
  auth       | auth_chain_name_id_idx                     |    true    |            1 | chain_name     | ASC       |  false  |  false
  auth       | auth_chain_name_id_idx                     |    true    |            2 | id             | ASC       |  false  |  false
  auth       | primary                                    |   false    |            1 | id             | ASC       |  false  |  false
(7 rows)

When running resetchain, the query it runs are something like this:

DELETE FROM auth WHERE (id <= 746973659877834754) AND (chain_name = 'cosmos-hub') ORDER BY id DESC LIMIT 5000 RETURNING id;

I would expect CockroachDB to use the auth_chain_name_id_idx since it perfectly match my "where" clause.

Now let me analyze that query:

root@localhost:26258/tracelistener> explain DELETE FROM auth WHERE (id <= 746973659877834754) AND (chain_name = 'cosmos-hub') ORDER BY id DESC LIMIT 5000 RETURNING id;
                                               info
---------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • delete
  │ estimated row count: 5,000
  │ from: auth
  │ auto commit
  │
  └── • limit
      │ estimated row count: 5,000
      │ count: 5000
      │
      └── • filter
          │ estimated row count: 959,775
          │ filter: chain_name = 'cosmos-hub'
          │
          └── • revscan
                estimated row count: 3,685,569 (59% of the table; stats collected 18 minutes ago)
                table: auth@primary
                spans: [ - /746973659877834754]

as you can read from above, it initially does a revscan using auth@primary - and that results in 3mln rows 🤯 After that it filters those 3mln by chain_name and apply the 5000 limit.

That means really bad performances while deleting (I've seen it myself).

Now let me show what happens when I use the "force index selection" feature (= I only appended @auth_chain_name_id_idx to table name):

root@localhost:26258/tracelistener> explain DELETE FROM auth@auth_chain_name_id_idx WHERE (id <= 746973659877834754) AND (chain_name = 'cosmos-hub') ORDER BY id DESC LIMIT 5000 RETURNING id;
                                            info
---------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • delete
  │ estimated row count: 5,000
  │ from: auth
  │ auto commit
  │
  └── • index join
      │ estimated row count: 5,000
      │ table: auth@primary
      │
      └── • revscan
            estimated row count: 5,000 (0.08% of the table; stats collected 23 minutes ago)
            table: auth@auth_chain_name_id_idx
            spans: [/'cosmos-hub' - /'cosmos-hub'/746973659877834754]
            limit: 5000

This time the initial revscan can precisely match that 5.000 rows I need, without scanning ALL of the rows in the auth table 😅 performances = happy

Does this make sense?

Pitasi commented 2 years ago

BTW I have seen CockroachDB picking up the wrong index only for really large chains (osmosis and cosmos-hub) and only for the bigger tables (delegations, auth). Using the correct index make the resetchain command exec time go from 30+ minutes to 2-3 minutes.

DeshErBojhaa commented 2 years ago

How the code behaves when one table has multi column index? ...

Thanks for the clarification 🔆

Pitasi commented 2 years ago

I also added a feature for selecting the tables to be reset (instead of all of them, which remains the default behaviour)

Pitasi commented 2 years ago

Say someone provided 9 correct index name and 1 incorrect index name. We don't want the program to panic and leave an intermediate state, which may lead to subtle bugs.

the program is intended to keep going and "delete all it can" so it will indeed delete 9 tables and at the end it will print a small report saying that it wasn't to delete X because index Y doesn't exist

Pitasi commented 2 years ago

Since we're using the table flag, I wonder, do we really need the forceIndexes flag.

It makes sense. If we proceed this way, I think of removing the default tables list altogether and let the caller decide which tables needs to be deleted.

Pitasi commented 2 years ago

@DeshErBojhaa I removed -force-index, I think less code is better (less logic -> easier to understand).

DeshErBojhaa commented 2 years ago

Also less code == easy review :p

Pitasi commented 2 years ago

@DeshErBojhaa can review this please? thanks!

EmerisHQ / tracelistener

feat(resetchain): add ability to specify tables #77