cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.92k stars 3.78k forks source link

pg_table_is_visible slower in CockroachDB 23.1 #108334

Open timgraham opened 1 year ago

timgraham commented 1 year ago

Describe the problem

The Django test suite takes 60-65 minutes with CockraochDB 23.1.x compared to 40-45 minutes with CockroachDB 22.2.x. Some attempts at improving performance were https://github.com/cockroachdb/cockroach/issues/93955 and https://github.com/cockroachdb/cockroach/issues/100871 but the issue remains.

To Reproduce

Running the inspectdb Django test app as an example (./runtests.py inspectdb --settings=test_roach), the run time went from 31 seconds in 22.2.x to 40 seconds with 23.1.x. I turned on slow query logging to 1ms to log all queries. Then I wrote a script to parse the logs and group similar queries. 8 seconds of the 9-second performance difference are attributed to five groups of queries that use pg_table_is_visible.

Queries that include pg_table_is_visible when running inspectdb test app:

      Statement      | 23.1.x seconds | 22.1.x seconds | Query Count |
SELECT ‹c›.‹conname› |          20.55 |          14.24 |         108 | 
SELECT ‹indexname›,  |          12.66 |          11.80 |         108 | 
SELECT ‹a›.‹attname› |           3.65 |           1.47 |          47 |
SELECT ‹c›.‹relname› |           2.15 |           0.94 |          35 | 
SELECT ‹a1›.‹attname |           1.94 |           0.30 |          47 |

Full queries:

SELECT ‹c›.‹relname›, CASE WHEN ‹c›.‹relispartition› THEN ‹'p'› WHEN ‹c›.‹relkind› IN (‹'m'›, ‹'v'›) THEN ‹'v'› ELSE ‹'t'› END, ‹''› FROM pg_catalog.pg_class AS ‹c› LEFT JOIN pg_catalog.pg_namespace AS ‹n› ON ‹n›.‹oid› = ‹c›.‹relnamespace› WHERE ((‹c›.‹relkind› IN (‹'f'›, ‹'m'›, ‹'p'›, ‹'r'›, ‹'v'›)) AND (‹n›.‹nspname› NOT IN (‹‹'pg_catalog'››, ‹‹'pg_toast'››))) AND pg_table_is_visible(‹c›.‹oid›)
SELECT ‹a1›.‹attname›, ‹c2›.‹relname›, ‹a2›.‹attname› FROM ‹""›.‹""›.‹pg_constraint› AS ‹con› LEFT JOIN ‹""›.‹""›.‹pg_class› AS ‹c1› ON ‹con›.‹conrelid› = ‹c1›.‹oid› LEFT JOIN ‹""›.‹""›.‹pg_class› AS ‹c2› ON ‹con›.‹confrelid› = ‹c2›.‹oid› LEFT JOIN ‹""›.‹""›.‹pg_attribute› AS ‹a1› ON (‹c1›.‹oid› = ‹a1›.‹attrelid›) AND (‹a1›.‹attnum› = ‹con›.‹conkey›[‹1›]) LEFT JOIN ‹""›.‹""›.‹pg_attribute› AS ‹a2› ON (‹c2›.‹oid› = ‹a2›.‹attrelid›) AND (‹a2›.‹attnum› = ‹con›.‹confkey›[‹1›]) WHERE (((‹c1›.‹relname› = ‹'inspectdb_charfielddbcollation'›) AND (‹con›.‹contype› = ‹'f'›)) AND (‹c1›.‹relnamespace› = ‹c2›.‹relnamespace›)) AND pg_table_is_visible(‹c1›.‹oid›)
SELECT ‹c›.‹conname›, ARRAY (SELECT ‹attname› FROM ROWS FROM (unnest(‹c›.‹conkey›)) WITH ORDINALITY AS ‹cols› (‹colid›, ‹arridx›) JOIN ‹""›.‹""›.‹pg_attribute› AS ‹ca› ON ‹cols›.‹colid› = ‹ca›.‹attnum› WHERE ‹ca›.‹attrelid› = ‹c›.‹conrelid› ORDER BY ‹cols›.‹arridx›), ‹c›.‹contype›, (SELECT (‹fkc›.‹relname› || ‹'.'›) || ‹fka›.‹attname› FROM ‹""›.‹""›.‹pg_attribute› AS ‹fka› JOIN ‹""›.‹""›.‹pg_class› AS ‹fkc› ON ‹fka›.‹attrelid› = ‹fkc›.‹oid› WHERE (‹fka›.‹attrelid› = ‹c›.‹confrelid›) AND (‹fka›.‹attnum› = ‹c›.‹confkey›[‹1›])), ‹cl›.‹reloptions› FROM ‹""›.‹""›.‹pg_constraint› AS ‹c› JOIN ‹""›.‹""›.‹pg_class› AS ‹cl› ON ‹c›.‹conrelid› = ‹cl›.‹oid› WHERE (‹cl›.‹relname› = ‹'inspectdb_charfielddbcollation'›) AND pg_table_is_visible(‹cl›.‹oid›)
SELECT ‹indexname›, array_agg(‹attname› ORDER BY ‹arridx›), ‹indisunique›, ‹indisprimary›, array_agg(‹ordering› ORDER BY ‹arridx›), ‹amname›, ‹exprdef›, ‹s2›.‹attoptions› FROM (SELECT ‹c2›.‹relname› AS ‹indexname›, ‹idx›.*, ‹attr›.‹attname›, ‹am›.‹amname›, CASE WHEN ‹idx›.‹indexprs› IS NOT NULL THEN pg_get_indexdef(‹idx›.‹indexrelid›) END AS ‹exprdef›, CASE ‹am›.‹amname› WHEN ‹'prefix'› THEN CASE (‹option› & ‹1›) WHEN ‹1› THEN ‹'DESC'› ELSE ‹'ASC'› END END AS ‹ordering›, ‹c2›.‹reloptions› AS ‹attoptions› FROM (SELECT * FROM ‹""›.‹""›.‹pg_index› AS ‹i›, ROWS FROM (unnest(‹i›.‹indkey›, ‹i›.‹indoption›)) WITH ORDINALITY AS ‹koi› (‹key›, ‹option›, ‹arridx›)) AS ‹idx› LEFT JOIN ‹""›.‹""›.‹pg_class› AS ‹c› ON ‹idx›.‹indrelid› = ‹c›.‹oid› LEFT JOIN ‹""›.‹""›.‹pg_class› AS ‹c2› ON ‹idx›.‹indexrelid› = ‹c2›.‹oid› LEFT JOIN ‹""›.‹""›.‹pg_am› AS ‹am› ON ‹c2›.‹relam› = ‹am›.‹oid› LEFT JOIN ‹""›.‹""›.‹pg_attribute› AS ‹attr› ON (‹attr›.‹attrelid› = ‹c›.‹oid›) AND (‹attr›.‹attnum› = ‹idx›.‹key›) WHERE (‹c›.‹relname› = ‹'inspectdb_charfielddbcollation'›) AND pg_table_is_visible(‹c›.‹oid›)) AS ‹s2› GROUP BY ‹indexname›, ‹indisunique›, ‹indisprimary›, ‹amname›, ‹exprdef›, ‹attoptions›
SELECT ‹a›.‹attname› AS ‹column_name›, NOT (‹a›.‹attnotnull› OR ((‹t›.‹typtype› = ‹'d'›) AND ‹t›.‹typnotnull›)) AS ‹is_nullable›, pg_get_expr(‹ad›.‹adbin›, ‹ad›.‹adrelid›) AS ‹column_default›, CASE WHEN ‹collname› = ‹'default'› THEN ‹NULL› ELSE ‹collname› END AS ‹collation›, ‹a›.‹attidentity› != ‹''› AS ‹is_autofield›, col_description(‹a›.‹attrelid›, ‹a›.‹attnum›) AS ‹column_comment› FROM ‹""›.‹""›.‹pg_attribute› AS ‹a› LEFT JOIN ‹""›.‹""›.‹pg_attrdef› AS ‹ad› ON (‹a›.‹attrelid› = ‹ad›.‹adrelid›) AND (‹a›.‹attnum› = ‹ad›.‹adnum›) LEFT JOIN ‹""›.‹""›.‹pg_collation› AS ‹co› ON ‹a›.‹attcollation› = ‹co›.‹oid› JOIN ‹""›.‹""›.‹pg_type› AS ‹t› ON ‹a›.‹atttypid› = ‹t›.‹oid› JOIN ‹""›.‹""›.‹pg_class› AS ‹c› ON ‹a›.‹attrelid› = ‹c›.‹oid› JOIN ‹""›.‹""›.‹pg_namespace› AS ‹n› ON ‹c›.‹relnamespace› = ‹n›.‹oid› WHERE (((‹c›.‹relkind› IN (‹'f'›, ‹'m'›, ‹'p'›, ‹'r'›, ‹'v'›)) AND (‹c›.‹relname› = ‹'inspectdb_charfielddbcollation'›)) AND (‹n›.‹nspname› NOT IN (‹‹'pg_catalog'››, ‹‹'pg_toast'››))) AND pg_table_is_visible(‹c›.‹oid›)

Environment:

Jira issue: CRDB-30428

blathers-crl[bot] commented 1 year ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

DrewKimball commented 1 year ago

Would it be possible to see the entire statements that are slow?

timgraham commented 1 year ago

I've added them in the issue description.

DrewKimball commented 1 year ago

Disabling the join hint in the pg_table_is_visible SQL body caused execution time on my machine to go from ~50ms for each query to 20-30ms. We can remove the join hints for the builtin functions that are expected to be inlined, but I think we should also consider ignoring join hints for functions that are inlined.

rafiss commented 7 months ago

Disabling the join hint in the pg_table_is_visible SQL body caused execution time on my machine to go from ~50ms for each query to 20-30ms.

@DrewKimball could you say more about how you tested that? I tried the change in this draft PR https://github.com/cockroachdb/cockroach/pull/109377/, and did not see much difference.

From runnning N=10 BENCHTIMEOUT=24h PKG=./pkg/bench/rttanalysis/ BENCHES=BenchmarkORMQueries/django ./scripts/bench 'HEAD^' 'HEAD'

goos: darwin
goarch: arm64
                                                         │ ./previous  │               ./new                │
                                                         │   sec/op    │   sec/op     vs base               │
ORMQueries/django_column_introspection_1_table-10          14.95m ± 2%   14.69m ± 1%  -1.78% (p=0.005 n=10)
ORMQueries/django_column_introspection_4_tables-10         14.65m ± 5%   14.71m ± 1%       ~ (p=0.631 n=10)
ORMQueries/django_column_introspection_8_tables-10         15.15m ± 2%   15.07m ± 3%       ~ (p=0.684 n=10)
ORMQueries/django_table_introspection_1_table-10           20.20m ± 3%   19.67m ± 2%  -2.60% (p=0.023 n=10)
ORMQueries/django_table_introspection_8_tables-10          20.49m ± 3%   20.26m ± 2%       ~ (p=0.165 n=10)
ORMQueries/django_comment_introspection_with_comments-10   24.19m ± 1%   24.30m ± 2%       ~ (p=0.529 n=10)
geomean                                                    17.93m        17.78m       -0.85%

                                                         │ ./previous │                ./new                │
                                                         │ roundtrips │ roundtrips  vs base                 │
ORMQueries/django_column_introspection_1_table-10          4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
ORMQueries/django_column_introspection_4_tables-10         4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
ORMQueries/django_column_introspection_8_tables-10         4.000 ± 0%   4.000 ± 0%       ~ (p=1.000 n=10) ¹
ORMQueries/django_table_introspection_1_table-10           5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10) ¹
ORMQueries/django_table_introspection_8_tables-10          5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10) ¹
ORMQueries/django_comment_introspection_with_comments-10   5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                    4.472        4.472       +0.00%
¹ all samples are equal

                                                         │  ./previous  │                ./new                │
                                                         │     B/op     │     B/op      vs base               │
ORMQueries/django_column_introspection_1_table-10          10.05Mi ± 1%   10.11Mi ± 1%       ~ (p=0.353 n=10)
ORMQueries/django_column_introspection_4_tables-10         9.019Mi ± 1%   9.010Mi ± 2%       ~ (p=0.971 n=10)
ORMQueries/django_column_introspection_8_tables-10         7.760Mi ± 3%   7.724Mi ± 1%       ~ (p=0.315 n=10)
ORMQueries/django_table_introspection_1_table-10           12.55Mi ± 1%   12.48Mi ± 1%       ~ (p=0.089 n=10)
ORMQueries/django_table_introspection_8_tables-10          10.21Mi ± 1%   10.28Mi ± 2%       ~ (p=0.280 n=10)
ORMQueries/django_comment_introspection_with_comments-10   14.56Mi ± 1%   14.39Mi ± 1%  -1.20% (p=0.023 n=10)
geomean                                                    10.46Mi        10.45Mi       -0.17%

                                                         │ ./previous  │               ./new                │
                                                         │  allocs/op  │  allocs/op   vs base               │
ORMQueries/django_column_introspection_1_table-10          102.3k ± 1%   102.2k ± 1%       ~ (p=0.912 n=10)
ORMQueries/django_column_introspection_4_tables-10         88.80k ± 1%   88.58k ± 2%       ~ (p=0.684 n=10)
ORMQueries/django_column_introspection_8_tables-10         73.19k ± 2%   72.48k ± 2%       ~ (p=0.247 n=10)
ORMQueries/django_table_introspection_1_table-10           120.6k ± 1%   120.6k ± 1%       ~ (p=0.218 n=10)
ORMQueries/django_table_introspection_8_tables-10          92.54k ± 1%   93.32k ± 3%       ~ (p=0.393 n=10)
ORMQueries/django_comment_introspection_with_comments-10   128.8k ± 2%   126.5k ± 1%  -1.77% (p=0.005 n=10)
geomean                                                    99.24k        98.87k       -0.37%
DrewKimball commented 7 months ago

@DrewKimball could you say more about how you tested that? I tried the change in this draft PR https://github.com/cockroachdb/cockroach/pull/109377, and did not see much difference.

I just ran them manually on a local cluster on my machine - I didn't know we already had them in a benchmark. Though, I'm not sure that the benchmark is reproducing the issue, 14.95m means ms right? I'm guessing the slowness of these queries depends on the actual content of the tables involved.

rafiss commented 7 months ago

I'm guessing the slowness of these queries depends on the actual content of the tables involved.

Yeah maybe. Do you remember what you did manually in your local cluster to manifest this slowness?

DrewKimball commented 7 months ago

Yeah maybe. Do you remember what you did manually in your local cluster to manifest this slowness?

I didn't do anything in particular, but at the time I would have likely had the tpcc and tpch databases loaded. So, it might come down to the number and complexity of tables?