AmpersandTarski / Ampersand

Build database applications faster than anyone else, and keep your data pollution free as a bonus.
http://ampersandtarski.github.io/
GNU General Public License v3.0
40 stars 8 forks source link

The (sub-)expression x;V can be compiled more efficiently #1298

Closed stefjoosten closed 2 years ago

stefjoosten commented 2 years ago

What happened

I see that the Ampersand compiler v4.6.3 translates the expression buddy;V to:

SELECT fence0.src AS src,
                fence1.tgt AS tgt
FROM
           (SELECT "AanmeldingGG" AS src,
                   "buddy" AS tgt
            FROM "AanmeldingGG"
            WHERE ("AanmeldingGG" IS NOT NULL)
              AND ("buddy" IS NOT NULL)) AS fence0,

           (SELECT fst."Vrijwilliger" AS src,
                   snd."Vrijwilliger" AS tgt
            FROM "Person" AS fst,
                 "Person" AS snd
            WHERE (fst."Vrijwilliger" IS NOT NULL)
              AND (snd."Vrijwilliger" IS NOT NULL)) AS fence1
WHERE (fence0.tgt = fence1.src)

What I expected

I would expect it to produce

SELECT fence0.src AS src,
                fence1.tgt AS tgt
FROM
           (SELECT "AanmeldingGG" AS src,
                   "buddy" AS tgt
            FROM "AanmeldingGG"
            WHERE ("AanmeldingGG" IS NOT NULL)
              AND ("buddy" IS NOT NULL)) AS fence0,

           (SELECT snd."Vrijwilliger" AS tgt
            FROM  "Person" AS snd
            WHERE (snd."Vrijwilliger" IS NOT NULL)) AS fence1

The latter saves an (implicit) join operator.

hanjoosten commented 2 years ago

I expect the database to take care of an optimization of such a query. No need for us to clutter our code, by which we compromise maintainability. Please reopen iff you have a measurable use case for this optimization.

stefjoosten commented 2 years ago

There is another argument: the readability of the generated code. That has always been important to me.

I have three objections to your arguments:

  1. The claim that the database takes care of this optimization is speculative. Please provide evidence.
  2. The efficiency goes from $O(n^2)$ to $O(n)$. That is an order of magnitude, which does not require a measurable use case.
  3. The argument of cluttering our code does not hold, because we have designed the code structure such that optimizations like this remain very maintainable.
Michiel-s commented 2 years ago

I agree with @hanjoosten, let's not spent time on rewriting the query generator here. Let me give evidence on the database taking care of this optimization. We can use the EXPLAIN <query> for that.

Putting it to the test

First I added 3M Vrijwilligers in the Person table

DROP TABLE Person;
CREATE TABLE Person AS
SELECT CONCAT('Vrijwilliger_', seq) AS Person, CONCAT('Vrijwilliger_', seq) as Vrijwilliger
FROM seq_1_to_3000000;

INSERT INTO `ProjectAdministration`.`AanmeldingGG` (`AanmeldingGG`, `buddy`) VALUES ('123', 'Vrijwilliger_1');

The explain output for the original query is image

Time to execute the actual query: 0.8s + fetch 1.7 s

The explain output of the adapted query is image

Time to execute the actual query: 0.016 + fetch 1.6 s

Ok.. that's a difference. But not really that much, because the fetching of data and presenting takes significantly longer than the actual query.

Indexing

When adding indexes (which we do by default), but now also to my manually created table:

CREATE INDEX "Person_Person" ON "Person" ("Person");
CREATE INDEX "Person_Vrijwilliger" ON "Person" ("Vrijwilliger");

Query times are: The original query: 0.15 s + fetch 1.6 s Adapted query: 0.016 s + fetch 1.6 s

No difference anymore.

As you can see, the query explain for the original query is now slightly different, taking into account the index image

Michiel-s commented 2 years ago

I say, let's close this issue.