Optimizer - Githubissues

GoogleCodeExporter commented 9 years ago

Have an optimizer that is able to handle a limited set of standard SQL queries.

Original issue reported on code.google.com by john.david.duncan on 11 Sep 2007 at 10:52

GoogleCodeExporter commented 9 years ago

At configuration time, any key column that doesn't belong to a known index is
added to the anonymous index (type 'A', name "*Anonymous*").

The optimizer runs in child_init().  Its job is to look over all directory
configurations and to move key columns from the anonymous index into real
indexes.

Problem 1) How to get access to all the config::dir pointers in child_init()?
In config::init_dir(), after allocating the structure, put a pointer to it in
a static global array.
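
A minimal sketch of that registry, for illustration only. The config::dir shown here is a simplified stand-in (the real structure has more fields), and MAX_DIRS, all_dirs, n_dirs and register_dir() are hypothetical names, not existing mod_ndb code. No Apache calls are used.

```cpp
#include <cassert>
#include <cstddef>

namespace config {
  // Simplified stand-in for mod_ndb's per-directory configuration.
  struct dir {
    const char *path;   // hypothetical field, for illustration only
  };
}

// Hypothetical fixed-size registry, filled in at configuration time.
static const std::size_t MAX_DIRS = 64;
static config::dir *all_dirs[MAX_DIRS];
static std::size_t n_dirs = 0;

// Call from init_dir() right after the structure is allocated.
// Returns false if the registry is full.
bool register_dir(config::dir *d) {
  assert(d != nullptr);
  if (n_dirs == MAX_DIRS) return false;
  all_dirs[n_dirs++] = d;
  return true;
}
```

child_init() could then loop over all_dirs[0 .. n_dirs-1].  Note that this sketch registers a structure every time init_dir() runs, so it does nothing about the double parse.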

Problem 2) Apache parses the configuration twice.  How do you know you've
really got one copy of each config::dir in the global list?

Here's how the optimizer could work:

For each directory: 
  If there is an anonymous index: 
      Get the data dictionary from NDB.
      Pass 1:  get a list of indexes
         We're going to create an IndexList - a list of real NDB indexes.
         Loop over the key columns in the anonymous index.
         For each key column, which real NDB indexes does it belong to? 
         Add each of these indexes to the list.
      Pass 2: narrow the list down to usable indexes
         For each item in the IndexList:
         If it's the Primary Key, and key columns exist for all parts, it's usable.
         If it's a Unique Index, and key columns exist for all parts, it's usable.
         If it's an Ordered Index, and key columns exist for a left prefix, it's usable.  
      Pass 3: assign key columns to usable indexes.
         Start with the Primary Key.  Assign key columns. 
         If there are no more unassigned key columns, you're done.
         Next, assign columns to the unique indexes.
         If there are no more unassigned key columns, you're done.
         Next, assign columns to ordered indexes.
         If there are no more unassigned key columns, you're done.
         All remaining unassigned columns become filters.
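
The three passes above can be sketched as follows.  This is only an illustration under stated assumptions: Index, Plan and optimize() are hypothetical names, the data dictionary is modelled as a plain vector instead of being fetched from NDB, and an index is skipped in Pass 3 if an earlier index has already taken columns it needs (the outline above does not say what should happen in that case).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

enum IndexType { PrimaryKey = 0, UniqueIndex = 1, OrderedIndex = 2 };

struct Index {                       // hypothetical model of one NDB index
  std::string name;
  IndexType type;
  std::vector<std::string> parts;    // index columns, in order
};

struct Plan {
  std::map<std::string, std::string> assigned;  // key column -> index name
  std::set<std::string> filters;                // leftover key columns
};

// Number of leading index parts that appear in `cols`.
static std::size_t prefix_len(const Index &idx,
                              const std::set<std::string> &cols) {
  std::size_t n = 0;
  while (n < idx.parts.size() && cols.count(idx.parts[n])) n++;
  return n;
}

// Pass 2 test: all parts for PK/unique, a left prefix for ordered indexes.
static bool usable(const Index &idx, const std::set<std::string> &cols) {
  std::size_t n = prefix_len(idx, cols);
  if (idx.type == OrderedIndex) return n > 0;
  return !idx.parts.empty() && n == idx.parts.size();
}

Plan optimize(const std::vector<Index> &dictionary,
              const std::set<std::string> &anonymous_cols) {
  // Pass 1: indexes that contain at least one anonymous key column.
  std::vector<Index> index_list;
  for (const Index &idx : dictionary)
    for (const std::string &p : idx.parts)
      if (anonymous_cols.count(p)) { index_list.push_back(idx); break; }

  // Pass 2: narrow the list down to usable indexes.
  std::vector<Index> usable_list;
  for (const Index &idx : index_list)
    if (usable(idx, anonymous_cols)) usable_list.push_back(idx);

  // Pass 3: assign columns; primary key first, then unique, then ordered.
  Plan plan;
  std::set<std::string> unassigned = anonymous_cols;
  for (int t = PrimaryKey; t <= OrderedIndex && !unassigned.empty(); t++) {
    for (const Index &idx : usable_list) {
      // Assumption: skip an index whose columns were already taken.
      if (idx.type != t || !usable(idx, unassigned)) continue;
      std::size_t n = prefix_len(idx, unassigned);
      for (std::size_t i = 0; i < n; i++) {
        plan.assigned[idx.parts[i]] = idx.name;
        unassigned.erase(idx.parts[i]);
      }
    }
  }
  plan.filters = unassigned;   // whatever is left becomes a filter

  // Every key column ends up either assigned to an index or as a filter.
  assert(plan.assigned.size() + plan.filters.size() == anonymous_cols.size());
  return plan;
}
```

For example, with a primary key on (id), an ordered index idx1 on (a, b), and anonymous key columns id, a and c, this sketch assigns id to the primary key, a to idx1, and leaves c as a filter.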

Original comment by john.david.duncan on 11 Sep 2007 at 11:08

GoogleCodeExporter commented 9 years ago

The optimizer can do something else important, too:

For each key column, it can store the NDB column number, the Column pointer,
and any other needed information from the data dictionary in the key_columns
array, so that they don't have to be looked up at runtime.
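
A rough sketch of that caching step.  Column and Table here are simplified stand-ins for the NDB data dictionary types, and KeyColumn, resolve_key_columns() and column_no() are hypothetical names chosen for the example; the real key_columns entries may look different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for the data dictionary; the real types come from the NDB API.
struct Column { std::string name; int column_no; };
struct Table {
  std::vector<Column> columns;
  const Column *find(const std::string &name) const {
    for (const Column &c : columns)
      if (c.name == name) return &c;
    return nullptr;
  }
};

// One entry of a hypothetical key_columns array.
struct KeyColumn {
  std::string name;
  int column_no = -1;             // resolved once, in child_init()
  const Column *column = nullptr; // cached dictionary pointer
};

// Resolve every key column against the dictionary, once at startup.
// Returns false if any column name is unknown.
bool resolve_key_columns(const Table &table, std::vector<KeyColumn> &keys) {
  bool ok = true;
  for (KeyColumn &k : keys) {
    k.column = table.find(k.name);
    if (k.column) k.column_no = k.column->column_no;
    else ok = false;
  }
  return ok;
}

// At request time the lookup reduces to reading the cached field.
inline int column_no(const KeyColumn &k) {
  assert(k.column != nullptr);   // must have been resolved at startup
  return k.column_no;
}
```

The cached pointers are only valid as long as the table definition does not change.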

Original comment by john.david.duncan on 11 Sep 2007 at 11:13

GoogleCodeExporter commented 9 years ago

Following this design, it will ABSOLUTELY be necessary to restart Apache after
any ALTER TABLE.

Original comment by john.david.duncan on 11 Sep 2007 at 11:15

GoogleCodeExporter commented 9 years ago

      Pass 2: narrow the list down to usable indexes
         For each item in the IndexList:
         If it's the Primary Key, and key columns exist for all parts, using the "equals" relop, it's usable.
         If it's a Unique Index, and key columns exist for all parts, using the "equals" relop, it's usable.
         If it's an Ordered Index, and key columns exist for a left prefix, using any relop, it's usable.
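
A small sketch of the revised Pass 2 test, assuming each key column carries a single relational operator.  RelOp, KeyColumns and is_usable() are hypothetical names for illustration, not existing mod_ndb code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum IndexType { PrimaryKey, UniqueIndex, OrderedIndex };
enum RelOp { Eq, Lt, Le, Gt, Ge };           // hypothetical operator set

struct Index {
  IndexType type;
  std::vector<std::string> parts;            // index columns, in order
};

typedef std::map<std::string, RelOp> KeyColumns;   // key column -> relop

bool is_usable(const Index &idx, const KeyColumns &keys) {
  assert(!idx.parts.empty());                // an index has at least one part

  // Ordered index: a non-empty left prefix is enough, with any relop.
  if (idx.type == OrderedIndex)
    return keys.count(idx.parts[0]) > 0;

  // Primary key or unique index: every part must be present with "equals".
  for (const std::string &p : idx.parts) {
    KeyColumns::const_iterator it = keys.find(p);
    if (it == keys.end() || it->second != Eq) return false;
  }
  return true;
}
```

With this test, a primary key on (id) is usable for id = 5 but not for id < 5, while an ordered index on (a, b) is usable for a < 5 alone.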

Original comment by john.david.duncan on 11 Sep 2007 at 11:28

GoogleCodeExporter commented 9 years ago

Flaw in the plan:

init_dir() is run while the configuration is parsed, but the directory is still
unmerged.  Its inheritable attributes may all be null.  In order to get
complete information, you need to get access to a directory config structure
that has been merged.

However, merging of the config tree does not happen until runtime.  

In fact, setting a breakpoint at merge_dir() reveals that directory merges can
happen 3 to 5 times per request.

Original comment by john.david.duncan on 12 Sep 2007 at 4:31

GoogleCodeExporter commented 9 years ago

It's possible to capture the path argument to init_dir() and store it into the
dir structure.  This means I could do my own merging in child_init().

You can't do this thoroughly or correctly at init-time (which is why Apache
does it at runtime), but within some restrictions, it might work.  The
restrictions include:
* ONLY use <Location> containers to configure mod_ndb.  Do not use <Directory>
* Do not use vhosts.
* Do not attempt multiple cluster connections.
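
A sketch of what such init-time merging might look like under those restrictions.  Dir is a simplified stand-in for config::dir, the database and table fields are hypothetical examples of inheritable attributes, and the rule "inherit from the nearest enclosing <Location> path" is an assumption of this example, not a description of Apache's own merge logic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for config::dir; field names are hypothetical.
struct Dir {
  std::string path;       // path argument captured in init_dir()
  const char *database;   // inheritable, may be null before merging
  const char *table;      // inheritable, may be null before merging
};

// True if `parent` is a proper path prefix of `child`,
// e.g. "/db" encloses "/db/t1" but not "/dbx".
static bool encloses(const std::string &parent, const std::string &child) {
  if (parent.size() >= child.size()) return false;
  if (child.compare(0, parent.size(), parent) != 0) return false;
  return parent.empty() || parent[parent.size() - 1] == '/'
                        || child[parent.size()] == '/';
}

// Fill each null attribute from the nearest (longest-path) enclosing
// location that has a value for it.  Intended to run once in child_init().
void merge_all(std::vector<Dir> &dirs) {
  for (Dir &d : dirs) {
    const Dir *db_src = nullptr;
    const Dir *tab_src = nullptr;
    for (const Dir &p : dirs) {
      if (!encloses(p.path, d.path)) continue;
      if (p.database &&
          (!db_src || p.path.size() > db_src->path.size())) db_src = &p;
      if (p.table &&
          (!tab_src || p.path.size() > tab_src->path.size())) tab_src = &p;
    }
    if (!d.database && db_src) d.database = db_src->database;
    if (!d.table && tab_src) d.table = tab_src->table;
  }
}
```

Values set explicitly on a location are never overwritten; only null attributes are filled in.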

Original comment by john.david.duncan on 12 Sep 2007 at 9:32

GoogleCodeExporter commented 9 years ago

The optimizer should never be wrong.

Suppose you have three anonymous key columns:  a, b, and c.

You also have ordered index idx1 on <a,b>, and ordered index idx2 on <a,c>.  

You could do an index scan on idx1 and use c as a filter, or you could do an
index scan on idx2 and use b as a filter.  It's a tie.  The optimizer cannot
make a decision about this.  The NdbDictionary does not provide the sort of
cardinality statistics that other optimizers would use here.

I believe mod_ndb should require you to rewrite the query using a hint.
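
A sketch of how such a tie could be detected so that the optimizer refuses to guess.  OrderedIndex and choose() are hypothetical names, and ranking indexes by the length of the matched left prefix is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct OrderedIndex {
  std::string name;
  std::vector<std::string> parts;   // index columns, in order
};

// Number of leading index parts that have a matching key column.
static std::size_t prefix_len(const OrderedIndex &idx,
                              const std::set<std::string> &keys) {
  std::size_t n = 0;
  while (n < idx.parts.size() && keys.count(idx.parts[n])) n++;
  return n;
}

// Returns the single best index, or nullptr when no index is usable or
// when two or more indexes tie.  *ambiguous is set to true only for a tie,
// in which case the query needs a hint.
const OrderedIndex *choose(const std::vector<OrderedIndex> &indexes,
                           const std::set<std::string> &keys,
                           bool *ambiguous) {
  assert(ambiguous != nullptr);
  const OrderedIndex *best = nullptr;
  std::size_t best_len = 0;
  *ambiguous = false;
  for (const OrderedIndex &idx : indexes) {
    std::size_t n = prefix_len(idx, keys);
    if (n == 0) continue;                       // not usable at all
    if (n > best_len) { best = &idx; best_len = n; *ambiguous = false; }
    else if (n == best_len) *ambiguous = true;  // same rank: a tie
  }
  return *ambiguous ? nullptr : best;
}
```

With idx1 on <a,b>, idx2 on <a,c> and key columns a, b and c, both indexes match a prefix of length two, so choose() returns nullptr and reports the tie.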

Original comment by john.david.duncan on 12 Sep 2007 at 9:51

GoogleCodeExporter commented 9 years ago

Original comment by john.david.duncan on 4 Nov 2007 at 12:51

Added labels: Milestone-1.2
Removed labels: Milestone-Major

GoogleCodeExporter commented 9 years ago

Original comment by john.david.duncan on 28 Dec 2007 at 5:21

Added labels: Milestone-1.3
Removed labels: Milestone-1.2, Priority-Low

GoogleCodeExporter commented 9 years ago

Original comment by john.david.duncan on 28 Dec 2007 at 5:25

Added labels: Milestone-1.x
Removed labels: Milestone-1.3

dannote / mod-ndb

Optimizer #45