FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.24k stars 213 forks source link

EXISTS clause with multiple indices in PLAN causes 100% CPU [CORE552] #907

Closed firebird-automations closed 8 years ago

firebird-automations commented 18 years ago

Submitted by: kbluck (kbluck)

SFID: 1409384#⁠ Submitted By: kbluck

Query plans with multiple indices in an EXISTS clause cause the query to hang with max CPU utilization.

Running Superserver on Win2000 SP4. This was seen in versions 1.52 and 1.53.RC3

Given a query using an EXISTS clause where the table in the exists subquery will find two useable indices, similar to this:

SELECT CE.CourseID FROM Certificate CE WHERE EXISTS( SELECT * FROM License LI WHERE LI.CustomerID = CE.CustomerID AND LI.JurisdictionCode = 'FL' )

where the License table has two indices defined as:

CREATE ASC INDEX LICENSEBYCUSTOMERID ON LICENSE (CUSTOMERID); CREATE ASC INDEX LICENSEBYJURISANDNUMBER ON LICENSE (JURISDICTIONCODE, LICENSENUMBER, QUALIFICATIONID);

the optimizer produces a query plan of:

PLAN (LI INDEX (LICENSEBYCUSTOMERID,LICENSEBYJURISANDNUMBER)) PLAN (CE NATURAL)

When this query is run, the server becomes unresponsive and CPU goes to 100%.

Preventing the optimizer from selecting index LICENSEBYJURISANDNUMBER by writing

... LI.JurisdictionCode || '' = 'FL' ...

resulting in a query plan of

PLAN (LI INDEX (LICENSEBYCUSTOMERID)) PLAN (CE NATURAL)

allows the query to complete normally without CPU hogging.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-04-17 17:46 Sender: kbluck Logged In: YES user_id=11142

Field Position 1: 0.0153846153989434 Field Position 2: 7.88558338626899E-7 Field Position 3: 7.86274767961004E-7

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-27 09:05 Sender: dimitr Logged In: YES user_id=61270

What's statistics of the first segment (JURISDICTIONCODE) of index LICENSEBYJURISANDNUMBER? You can find this value in RDB$INDEX_SEGMENTS.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-20 21:35 Sender: kbluck Logged In: YES user_id=11142

It was actually a new database with imported data, and so it was ODS11.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-20 11:37 Sender: dimitr Logged In: YES user_id=61270

Did you test FB2 on the same database or did you perform a backup/restore cycle? Most of the optimizer benefits (including per-segment index statistics) is available only in ODS11, so you won't notice any difference with an ODS10 database. I'd like you to report the behaviour for the restored database too, if possible.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-19 22:19 Sender: kbluck Logged In: YES user_id=11142

> IIRC, both these issues should be fixed in FB2

I have tested with 2.00.11675.beta1. Although the optimizer selects the same plan, running the query does not block up the server like 1.52. Other connections still run nicely. So, as far as I can tell the truly objectionable behavior in this bug has been fixed in 2.0.

> What's statistics of the both indices, BTW?

LICENSEBYJURISANDNUMBER is a multi-column index, and is actually quite selective in its entirety. However, just the first column Jurisdiction is not very selective by itself.

Index LICENSEBYCUSTOMERID (0) Depth: 3, leaf buckets: 1797, nodes: 1148504 Average data length: 0.00, total dup: 103406, max dup: 31 Fill distribution: 0 - 19% = 2 20 - 39% = 1 40 - 59% = 1649 60 - 79% = 1 80 - 99% = 144

Index LICENSEBYJURISANDNUMBER (3) Depth: 3, leaf buckets: 3166, nodes: 1148613 Average data length: 8.00, total dup: 20840, max dup: 8 Fill distribution: 0 - 19% = 2 20 - 39% = 0 40 - 59% = 2054 60 - 79% = 388 80 - 99% = 722

Below is the minimal DDL to create a test case for 1.52. For reference, I presently have about 3 million records in Certificate and 1 million in License. I presume you don't want to download 4m records; a random data generator should be able to fill the tables with suitable data.

CREATE TABLE CERTIFICATE ( CERTIFICATEID INTEGER NOT NULL, CUSTOMERID INTEGER NOT NULL, COURSEID SMALLINT NOT NULL, CONSTRAINT CERTIFICATEKEY PRIMARY KEY (CERTIFICATEID) ); CREATE TABLE LICENSE ( LICENSEID INTEGER NOT NULL, CUSTOMERID INTEGER NOT NULL, JURISDICTIONCODE CHAR( 2) COLLATE EN_US, LICENSENUMBER VARCHAR( 20) COLLATE EN_US, QUALIFICATIONID SMALLINT DEFAULT 0 NOT NULL, CONSTRAINT LICENSEKEY PRIMARY KEY (LICENSEID) );

CREATE ASC INDEX LICENSEBYCUSTOMERID ON LICENSE (CUSTOMERID); CREATE ASC INDEX LICENSEBYJURISANDNUMBER ON LICENSE (JURISDICTIONCODE, LICENSENUMBER, QUALIFICATIONID);

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-19 09:18 Sender: dimitr Logged In: YES user_id=61270

First, even the current superserver architecture should not block other requests while being under load, so the described hang is definitely a bug.

Second, optimizer should not be choosing the index LICENSEBYJURISANDNUMBER as I'm sure it's not selective. What's statistics of the both indices, BTW?

IIRC, both these issues should be fixed in FB2, so I'd expect the ticket submitter to test his issue on a new version and report back. If the hang still appears, we would need a test case.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-19 01:46 Sender: kbluck Logged In: YES user_id=11142

I can write an absolutely horrid query, a full cartesian join on huge tables using a natural plan that returns an insane number of records. I can run 5 of them simultaneously. They take forever to execute, and naturally the server slows down considerably, but the server does not stop responding altogether. The CPU is high, but still fluctuates. It still accepts new connections, slowly, it still runs other queries, slowly, but the point is that it still is visibly servicing other clients.

Something about this particular plan completely blocks out everything else. Not merely degraded --- stopped cold. The CPU is pegged and doesn't move. Connections hang. New queries hang. That's something more than should be expected from a mere badly written SQL.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-19 01:25 Sender: seanleyne Logged In: YES user_id=71163

The 'hanging' you are seeing is the result of many things, the SQL you are executing, the implementation of v1.5, the use of super-server instead of classic server.

Poorly written SQL can make any engine grind to a halt to expect otherwise is a little unrealistic.

At the very least, the use of classic server would go a long way to isolating the impact of 'run away' processes.
Since each database connection would span a separate OS process which would then be managed by the OS scheduler.
Whereas in the case of super-server Firebird uses it's own internal scheduler which continues to be enhanced (v2.0) but as you can see is not perfect.

I'll shut up now and see what others think on this subject.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-19 00:34 Sender: kbluck Logged In: YES user_id=11142

Its not the optimizer's choice of plan that I think is the bug. It is the fact that executing this query ***hangs the server.*** Every client on the server hangs when this query runs. Out website doesn't work. Our customer service tool doesn't work. *Nothing works* when this query runs.

I can write plenty of highly inefficient queries that nevertheless do not render the server completely unresponsive; quite the contrary, even very labor intensive queries are normally scheduled quite nicely and do not affect other connections overmuch. If this were merely an issue with a long-running query, I wouldn't be squawking here. Its more than that.

Performance issues aside, this has to be regarded as a denial of service vulnerability if some random user with a report writer or interactive query tool can innocently hang the server while writing perfectly valid SQL. This is exactly what happens at my site. My users are not going to regard being told to run their ad-hoc SQL by the Firebird support forum before executing it as an acceptable resolution.

Excellent concurrency is supposed to be one of Firebird's crown jewels. I don't view dismissing this issue with "rewrite your SQL" as being well aligned with that reputation.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-18 23:58 Sender: seanleyne Logged In: YES user_id=71163

Kevin,

In the absence of an multi-part index which combines CustomerID and JuristictionCode, the engine is making the best choice it can to resolve the query. As such what you are seeing is "as designed" functionality.

v2.0 contains improvements to the index structures and optimizer which (1) stores selectivity information for each part of a multi-part index and (2) uses this selectivity information to determine the best execution plan for queries.

Had you posted this issue to the support group, you would have been advised of the above.

Further, by posting to the support group, it might have been suggested that you redesign your query, since (on the face of it) it is more than likely that the Customers table with Juristiction = 'FL' is much, much smaller than Certificates and you could have 'prompted' the engine to avoid the natural/full table scan (PLAN (CE NATURAL)) of the Certificates table.

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-18 22:35 Sender: kbluck Logged In: YES user_id=11142

I don't think I need support. I already know how to work around the issue, as described in the ticket.

Perhaps I did not adequately describe the undesirable effects of this issue. It effectively hangs the server, preventing anything at all from working for considerable lengths of time. I can't believe that perfectly valid SQL which causes the Firebird server to "go stupid" and stop responding to other connections should not be considered a bug.

In hopes that my description was merely inadequate, I respectfully reopen and request reconsideration of whether this behavior is truly "as designed".

firebird-automations commented 18 years ago

Commented by: Alice F. Bird (firebirds)

Date: 2006-01-18 22:16 Sender: seanleyne Logged In: YES user_id=71163

This entry has been reviewed and has been evaluated as a support question/issue.

Accordingly, this issue is being closed and you are invited to post a message in the Firebird Support mailing list (to subscribe follow this link: http://groups.yahoo.com/group/firebird-support/)

firebird-automations commented 16 years ago
Modified by: @pcisar Workflow: jira \[ 10576 \] =\> Firebird \[ 14906 \]
firebird-automations commented 8 years ago
Modified by: @pavel-zotov QA Status: No test