FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.21k stars 209 forks source link

EXPLAIN statement and RDB$SQL package #7675

Closed sim1984 closed 9 months ago

sim1984 commented 12 months ago

Many DBMSs use a separate EXPLAIN statement to display plans. There are a number of advantages over using isc_info_sql_get_plan to get the plan:

  1. We can easily set the query plan output format.
  2. You can set other plan output options (for example, the depth of output of recursive plans)
  3. Plan output size is unlimited 64K

The EXPLAIN statement can return the plan as a BLOB cursor column or as isc_exec_proc (single row).

Syntax:

EXPLAIN [ ( option [, ...] ) ] statement

option :=
   ANALYZE
   FORMAT={PLAIN | TREE}
   DEPTH=n
   CARDINALITY
   COST

ANALYZE - executes a query without returning data and substitutes the actual measured values into the plan

FORMAT - output format. By default TREE . PLAIN - legacy plan. TREE - tree-like (explain). In the future, it can be extended to other formats JSON, XML...

DEPTH - depth of recursive output (output of plans of internal procedures). The default is 1.

CARDINALITY - output cardinality score for plan nodes.

COST - output cost estimates for plan nodes.

aafemt commented 12 months ago

IMHO isc_info_sql_get_plan should be declared deprecated in favor if IStatement::getPlan() that should change parameter from bool to bitset with options you mentioned. EXPLAIN can be ISQL-only operator.

mrotteveel commented 12 months ago

IMHO isc_info_sql_get_plan should be declared deprecated in favor if IStatement::getPlan() that should change parameter from bool to bitset with options you mentioned. EXPLAIN can be ISQL-only operator.

Doesn't that still use isc_info_sql_get_plan below the surface, at least for remote connections?

mrotteveel commented 12 months ago

Also making it ISQL-only doesn't sound very useful to me. A lot of people don't use ISQL, so making something that is ISQL-only limits the usefulness.

aafemt commented 12 months ago

The only alternative for ISQL is API and we have getPlan() there. Yes it uses isc_info_sql_get_plan under the hood but only because nobody has time and will to clean it out.

sim1984 commented 12 months ago

Other DBMSs do not restrict this statement to their tools. For example MySQL, Postgres

mrotteveel commented 12 months ago

The only alternative for ISQL is API and we have getPlan() there. Yes it uses isc_info_sql_get_plan under the hood but only because nobody has time and will to clean it out.

Just cleaning it out wouldn't be sufficient: there would need to be a replacement in the wire protocol... and given the opinfo is the* way to transfer that information across remote connections, you can't get rid of it without introducing something new.

mrotteveel commented 12 months ago

Also, you're ignoring an alternative, and that is it to make it a statement, which returns a result set or single row with the plan information.

aafemt commented 12 months ago

Technically one cannot bypass API using any tool so there is no difference if a programmer writes db->execute("EXPLAIN... SELECT...") or db->prepare("SELECT...")->getPlan().

Access to a BLOB or result set requires more code than access to plan.

dyemanov commented 12 months ago

Smells like some completely new concept to us -- non-DML statement that returns something.

sim1984 commented 12 months ago

The main thing in this ticket is not so much the task of bypassing the plan size restrictions, but to give convenience to administrators and testers in setting the plan output form. For example, recursive output of plans is planned in 5.0, but switching the output depth through the config is not very convenient. Or, for example, the output of a plan with an estimate of cardinality, which will break all tests.

aafemt commented 12 months ago

Package DBMS_SQL with method GET_PLAN() can do the trick.

sim1984 commented 12 months ago

Technically one cannot bypass API using any tool so there is no difference if a programmer writes db->execute("EXPLAIN... SELECT...") or db->prepare("SELECT...")->getPlan().

Access to a BLOB or result set requires more code than access to plan.

Any changes to the API or network protocol are very slow to take root among users. Adding features to the language, on the contrary, is fast, because nothing needs to be rewritten.

AlexPeshkoff commented 12 months ago

Restrict this statement to ISQL is bad idea. But @sim1984 : You should be more specific with format of returned data. Our SQL statements currently do not return just "BLOB cursor column". They always return cursor (or result set - here for us this 2 are identical). Talking about single row with single BLOB column is more correct, just let's call it "result set with single row with single BLOB column" - and use of generic SQL statement becomes quite logical, cause plan can be accessed using it with any tool, able to select BLOBs.

Next, why not use (optional certainly) format where each plan row is a row of returned result set, containing required fields like ID, PAR_ID (to support hierarchy), COST, CARDINALITY, etc.). That can be really nice for analyzing plans at client side (XML/JASON are good, but clents need additional support for them, on contrary fetch from cursor is able for probably any client out-of-the-box).

dyemanov commented 12 months ago

The main thing in this ticket is not so much the task of bypassing the plan size restrictions, but to give convenience to administrators and testers in setting the plan output form. For example, recursive output of plans is planned in 5.0, but switching the output depth through the config is not very convenient. Or, for example, the output of a plan with an estimate of cardinality, which will break all tests.

It can be handled with session management statements. Although I admit handling it per-statement could be handier.

asfernandes commented 12 months ago

Package DBMS_SQL with method GET_PLAN() can do the trick.

I like this idea.

mrotteveel commented 12 months ago

Technically one cannot bypass API using any tool so there is no difference if a programmer writes db->execute("EXPLAIN... SELECT...") or db->prepare("SELECT...")->getPlan().

Access to a BLOB or result set requires more code than access to plan.

Only if whatever you're using uses the Firebird API directly, or is even aware of the existence of Firebird. This isn't true for tools that use a generic API (like JDBC, for example DBeaver), or for older tools which simply haven't implemented such a thing. Being able to execute a generic statement will solve this.

mrotteveel commented 12 months ago

Package DBMS_SQL with method GET_PLAN() can do the trick.

How would it know which plan to return?

sim1984 commented 12 months ago

In principle, I'm not opposed to a built-in package or a function for inferring a plan. It is not much more difficult for the end user to use. The only downside is the escaping of string literals inside the query.

aafemt commented 12 months ago

How would it know which plan to return?

By calling DBMS_SQL.PREPARE() beforehand for example.

asfernandes commented 12 months ago

How would it know which plan to return?

By calling DBMS_SQL.PREPARE() beforehand for example.

The only downside is the escaping of string literals inside the query.

Simple as

select * from rdb$sql.get_plan(q'{select 'a' from rdb$database}')
mrotteveel commented 12 months ago

Ok, that could work. It is less succinct than the proposed EXPLAIN though.

sim1984 commented 12 months ago

OK. In principle, the built-in package would satisfy me. Then it remains to determine the input and output parameters of the rdb$sql.get_plan procedure. And another package from one procedure looks suspicious. Probably some more procedures/functions are needed to work with SQL. Well, for example, returning the names and types of result columns or types of input parameters. Well, perhaps a simple function for checking the correctness of the SQL query returning BOOLEAN.

asfernandes commented 12 months ago

We can have functions like Dimitry said, that returns a handle on prepare and use this handle with others functions to inspect, execute, fetch, etc. There is room for a RDB$SQL package, certainly.

The version I said that get plan with a direct sql text would be a syntatic sugar.

sim1984 commented 12 months ago

Fine. Will RDB$SQL package be included in the plan for Firebird 6.0?

livius2 commented 12 months ago

Why EXPLAIN cannot be implemented as single command EXPLAIN 'SELECT X FROM Y'. It can call behind the scene

SELECT EXPLAIN('SELECT X FROM Y') FROM RDB$DATABASE\

and simply return blob field with name "PLAN"

asfernandes commented 12 months ago

Why EXPLAIN cannot be implemented as single command EXPLAIN 'SELECT X FROM Y'. It can call behind the scene

SELECT EXPLAIN('SELECT X FROM Y') FROM RDB$DATABASE\

and simply return blob field with name "PLAN"

It makes much more sense to return things in different rows and columns.

You like to call everything simple, but seems you're still unable to build the sources. Maybe the things you think are simple is not.

AlexPeshkoff commented 12 months ago

On 7/13/23 03:09, Adriano dos Santos Fernandes wrote:

It makes much more sense to return things in different rows and columns.

Definitely agreed.

livius2 commented 12 months ago

It makes much more sense to return things in different rows and columns.

You like to call everything simple, but seems you're still unable to build the sources. Maybe the things you think are simple is not.

This was only sample, you can call behind the scene whatever you want. You can call even EXECUTE BLOCK which can change structure (returned columns) in different situations. If you call e.g.

EXPLAIN(PLAIN TEXT) 'SELECT X FROM Y'

you will got EXECUTE BLOCK returning one single BLOB field

if you call

EXPLAIN(DETAILED ROWS) 'SELECT X FROM Y'

it will return multiple rows and columns.

You like to call everything simple

no at all :) I am also developer and our world is only sometimes simply.

dyemanov commented 12 months ago

The function/package + result set solution fits our architecture better, but simple EXPLAIN/ANALYZE commands would still be handly. So I'd vote to have it as ISQL extension (hiding the underlying SQL and pretty-formatting the result set).

sim1984 commented 12 months ago

The function/package + result set solution fits our architecture better, but simple EXPLAIN/ANALYZE commands would still be handly. So I'd vote to have it as ISQL extension (hiding the underlying SQL and pretty-formatting the result set).

+1. ISQL could easily convert the EXPLAIN command into a call to the appropriate package procedure.

asfernandes commented 12 months ago

Fine. Will RDB$SQL package be included in the plan for Firebird 6.0?

Considering the EXPLAIN feature, certainly possible. I have even started it already.

asfernandes commented 12 months ago

It makes much more sense to return things in different rows and columns. Definitely agreed.

The question is what we want to separate.

A definitively one is each plan line in own record.

But how much more we want? Just it or separate operation, object names, aliases, etc?

livius2 commented 12 months ago

But how much more we want? Just it or separate operation, object names, aliases, etc?

Yes, this is natural choice. Something like: Operation: Nested Loop Join JoinType: Inner OjectName: RDB$RELATIONS Alias: R

But decision IndexName: RDB$INDEX_0 or maybe again ObjectName: RDB$INDEX_0?

Scan: Range Match: Full

But what if you need to add more infos? like: PhysicalIO PageReads: 54 LogicalIO: RecordSeqReads: 7022 https://github.com/FirebirdSQL/firebird/pull/223#issuecomment-533527403

dyemanov commented 12 months ago

If we speak about separating records, it makes a lot of sense to separate also columns. Node ID, parent ID, access type, maybe access subtype, object name, object alias, record length, cardinality (ifdef'ed currently), cost (to be developed).

sim1984 commented 12 months ago

Outputting a record for each plan node is good, as it can be used by third party tools for their own plan output format or even graphical representation. However, this does not eliminate the need for functions that display the plan in the usual formats (Legacy, explain (tree)).

sim1984 commented 12 months ago

Read, Fetch, writes, mark pages, memory usage, and query execution time can be output for explain analyze.

asfernandes commented 12 months ago

Read, Fetch, writes, mark pages, memory usage, and query execution time can be output for explain analyze.

This is out of scope of what was being discussed.

I do also think these are things for the profiler.

asfernandes commented 11 months ago

Please see initial implementation of this (not exactly as described in issue, but as discussed in comments) in https://github.com/FirebirdSQL/firebird/pull/7697