PennyDreadfulMTG / Penny-Dreadful-Tools

A suite of tools for the Penny Dreadful MTGO community
https://pennydreadfulmagic.com
MIT License
40 stars 28 forks source link

Switch to ONLY_FULL_GROUP_BY for everything and add instructions to SETUP to turn on ONLY_FULL_GROUP_BY in mysql/mariadb #6995

Open vorpal-buildbot opened 4 years ago

vorpal-buildbot commented 4 years ago

Reported on Discord by bakert#2193

bakert commented 4 years ago

I guess before I do this I should make all SQL ONLY_FULL_GROUP_BY compliant.

bakert commented 4 years ago

SET GLOBAL sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');

bakert commented 4 years ago

Next up:

       SELECT c.id, c.layout, f.mana_cost, f.cmc, f.power, f.toughness, f.loyalty, f.type_line, f.oracle_text, f.hand, f.life, f.position, f.card_id, f.name AS face_name,
           pd_legal,
           legalities
       FROM
           card AS c
       INNER JOIN
           face AS f ON c.id = f.card_id
       LEFT JOIN (
           SELECT
               cl.card_id,
               SUM(CASE WHEN cl.format_id = 13 THEN 1 ELSE 0 END) > 0 AS pd_legal,
               GROUP_CONCAT(CONCAT(fo.name, ':', cl.legality)) AS legalities
           FROM
               card_legality AS cl
           LEFT JOIN
               format AS fo ON cl.format_id = fo.id
           GROUP BY
               cl.card_id
       ) AS cl ON cl.card_id = c.id
       GROUP BY
           f.id
       ORDER BY
           f.card_id, f.position
bakert commented 4 years ago

This is going to mean making the GROUP BYs of many statements a lot longer for dubious benefit. If we group by a.id does it really matter to group by a.foo and LOWER(a.bar) as well? It's obvious from a.id that we're getting a single row there. Maybe it is useful.

bakert commented 4 years ago

Turning on ONLY_FULL_GROUP_BY on local breaks a zillion queries including the nwdl join. So whatever happened to prevent #6919 from landing was not ONLY_FULL_GROUP_BY being turned on for travis, I don't think.

bakert commented 4 years ago

Because this is on for Travis CI this work will need to be done before /seasons/ can be added to smoke_test as a page to check for a 200.

bakert commented 2 years ago

First error, loading news for the homepage.

loading decks allows the passing in of a GROUP BY clause so it could be the site of a lot of these.

shared.pd_exception.DatabaseException: Failed to execute `
        SELECT

        d.id,
        d.finish,
        d.decklist_hash,
        cache.active_date,
        cache.wins,
        cache.losses,
        cache.draws,
        cache.color_sort,
        ct.name AS competition_type_name

        FROM
            deck AS d

        LEFT JOIN
            competition AS c ON d.competition_id = c.id
        LEFT JOIN
            competition_series AS cs ON cs.id = c.competition_series_id
        LEFT JOIN
            competition_type AS ct ON ct.id = cs.competition_type_id

        LEFT JOIN
            deck_cache AS cache ON d.id = cache.deck_id
        LEFT JOIN deck_cache AS season ON d.id = season.deck_id
        WHERE
            (d.finish = 1 AND d.created_date > 0 AND d.created_date <= 1637014603) AND (TRUE)
        GROUP BY 
            d.id,
            d.competition_id, -- Every deck has only one competition_id but if we want to use competition_id in the HAVING clause we need this.
            season.season_id -- In theory this is not necessary as all decks are in a single season and we join on the date but MySQL cannot work that out so give it the hint it needs.

        HAVING
            TRUE
        ORDER BY
            active_date DESC, d.finish IS NULL, d.finish
        LIMIT 10
    ` with `[]` because of `(1055, "'decksite.d.finish' isn't in GROUP BY")
bakert commented 2 years ago

ONLY_FULL_GROUP_BY is pretty dumb. If you GROUP BY a primary key it still wants you to add every column from the table with that primary key that is in your SELECT to the GROUP BY clause. The fix for the above query is to add:

            d.decklist_hash,
            cache.active_date,
            cache.wins,
            cache.losses,
            cache.draws,
            cache.color_sort,
            ct.name,

to the GROUP BY clause. All of which is derivably-single-valued from the JOINs and PRIMARY KEYs and FOREIGN KEYs the database supposedly knows about.

This is going to be a lot of hassle to add and while it definitely does find bugs it also adds a lot of verbosity. Not sure if we should proceed.

bakert commented 1 year ago

I just looked at this again. It still feels mighty verbose. In a query like the one below it's a ridiculous number of fields that need to be duplicated (or wrapped in any_value … even uglier!)

SELECT
    d.id,
    d.name AS original_name,
    d.created_date,
    d.updated_date,
    SUM(CASE WHEN dm.games > IFNULL(odm.games, 0) THEN 1 ELSE 0 END) AS wins,
    SUM(CASE WHEN dm.games < odm.games THEN 1 ELSE 0 END) AS losses,
    SUM(CASE WHEN dm.games = odm.games THEN 1 ELSE 0 END) AS draws,
    d.finish,
    d.archetype_id,
    d.url AS source_url,
    d.competition_id,
    c.name AS competition_name,
    c.end_date AS competition_end_date,
    c.top_n AS competition_top_n,
    ct.name AS competition_type_name,
    d.identifier,
    LOWER(IFNULL(IFNULL(IFNULL(p.name, p.mtgo_username), p.mtggoldfish_username), p.tappedout_username)) AS person,
    p.id AS person_id,
    p.banned,
    p.discord_id,
    d.decklist_hash,
    d.retired,
    d.reviewed,
    s.name AS source_name,
    IFNULL(a.name, '') AS archetype_name,
    cache.normalized_name AS name,
    cache.colors,
    cache.colored_symbols,
    cache.color_sort,
    cache.legal_formats,
    ROUND(cache.omw * 100, 2) AS omw,
    season.season_id,
    IFNULL(MAX(m.date), d.created_date) AS active_date,
    MAX(q.changed_date) AS last_archetype_change
FROM
    deck AS d
LEFT JOIN
    deck_archetype_change AS q ON q.deck_id = d.id
LEFT JOIN
    person AS p ON d.person_id = p.id
LEFT JOIN
    source AS s ON d.source_id = s.id
LEFT JOIN
    archetype AS a ON d.archetype_id = a.id
LEFT JOIN
    competition AS c ON d.competition_id = c.id
LEFT JOIN
    competition_series AS cs ON cs.id = c.competition_series_id
LEFT JOIN
    competition_type AS ct ON ct.id = cs.competition_type_id
LEFT JOIN
    deck_cache AS cache ON d.id = cache.deck_id
LEFT JOIN
    deck_match AS dm ON d.id = dm.deck_id
LEFT JOIN
    `match` AS m ON dm.match_id = m.id
LEFT JOIN
    deck_match AS odm ON odm.deck_id <> d.id AND dm.match_id = odm.match_id
LEFT JOIN 
    deck_cache AS season ON d.id = season.deck_id
WHERE
    (d.finish = 1 AND d.created_date > 0 AND d.created_date <= 1683086952) AND (TRUE)
GROUP BY
    d.id,
    d.competition_id, -- Every deck has only one competition_id but if we want to use competition_id in the HAVING clause we need this.
    season.season_id -- In theory this is not necessary as all decks are in a single season and we join on the date but MySQL cannot work that out so give it the hint it needs.
HAVING
    TRUE
ORDER BY
    active_date DESC, d.finish IS NULL, d.finish
LIMIT 10

All the advice I can find online says you should have it on, though. Might catch some bugs.

bakert commented 5 months ago

Apparently MySQL now has detection that a field is functionally dependent on a PK in the GROUP BY. The SQL99 (?) spec said you had to list all the columns. But the SQL 2003 (?) spec says you can leave out those that are functionally dependent.

But looks like mariadb does not have it yet – https://jira.mariadb.org/browse/MDEV-11588 – at least as of 11.3.2-MariaDB. We wait, I guess …

What's strange is that ONLY_FULL_GROUP_BY is on by default, now, so it must be driving people nuts?

bakert commented 5 months ago

When this no longer errors we are good to go:

MariaDB [decksite]> SET GLOBAL sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');
Query OK, 0 rows affected (0.001 sec)

MariaDB [decksite]> SELECT id, finish FROM deck GROUP BY id;
ERROR 1055 (42000): 'decksite.deck.finish' isn't in GROUP BY
bakert commented 4 months ago

MySQL has had this for more than ten years fwiw. https://dev.mysql.com/doc/refman/8.3/en/group-by-handling.html