JordanMarr / SqlHydra

SqlHydra is a suite of NuGet packages for working with databases in F# including code generation tools and query expressions.
MIT License
212 stars 21 forks source link

Joining more than once on the same table produces invalid SQL #32

Closed jwosty closed 1 year ago

jwosty commented 1 year ago

First of all - super awesome library!

Just running into a problem relating to joining on the same table more than once in a query. For example, given the following postgresql schema:

CREATE TABLE public.user
(
    user_pk uuid NOT NULL DEFAULT gen_random_uuid(),
    username character varying NOT NULL,
    PRIMARY KEY (user_pk)
);

CREATE TABLE public.message
(
    message_pk uuid NOT NULL DEFAULT gen_random_uuid(),
    from_fk uuid NOT NULL,
    to_fk uuid NOT NULL,
    body character varying NOT NULL,
    CONSTRAINT message__from_fk__fk FOREIGN KEY (from_fk)
        REFERENCES public."user" (user_pk) MATCH SIMPLE
        ON UPDATE CASCADE,
    CONSTRAINT message__to_fk__fk FOREIGN KEY (to_fk)
        REFERENCES public."user" (user_pk) MATCH SIMPLE
        ON UPDATE CASCADE
);

And the following code:

let userTable = table<``public``.message>
let messageTable = table<``public``.message>

let! messages =
    selectTask Dto.HydraReader.Read (Shared queryCtx) {
        for m in messageTable do
            join fromUser in userTable on (m.from_fk = fromUser.user_pk)
            join toUser in userTable on (m.to_fk = toUser.user_pk)
            select (m.body, fromUser.username, toUser.username)
    }

You end up with the following SQL error: table name "user" specified more than once

This is because it gets compiled to the following SQL:

"message"."body", "user"."username", "user"."username" FROM "message" 
INNER JOIN "user" ON ("message"."from_fk" = "user"."user_pk")
INNER JOIN "user" ON ("message"."to_fk" = "user"."user_pk")

SQL hydra can solve this by adding some AS clauses to disambiguate the two different usages of this table. It could either be an arbitrary alias, or it could even use the same name as the variable in the F# query (in this case fromUser and toUser).

It should be running something more like this:

"message"."body", "fromUser"."username", "toUser"."username" FROM "message" 
INNER JOIN "user" AS fromUser ON ("message"."from_fk" = "user"."user_pk")
INNER JOIN "user" AS toUser ON ("message"."to_fk" = "user"."user_pk")

Unfortunately I can't figure out a workaround, so I am having to drop to a lower level and generating a query directly.

JordanMarr commented 1 year ago

Thanks! Glad you are enjoying it.

This does seem like a pretty standard use case that I think SqlHydra should be able to handle gracefully within the SelectBuilder.

You may also be able to intercept the generated SqlKata.Query and manipulate it to add aliases to the tables before it is executed.

https://sqlkata.com/docs/from

jwosty commented 1 year ago

I've been playing around with this in a fork - I've gotten For, Select, and Join to work by just always using a table alias based on the variable names in the CE: https://github.com/jwosty/SqlHydra/tree/fix-multi-join-same-table

Some of my approach feels a little brittle, but it appears to be doable as a general concept.

JordanMarr commented 1 year ago

What would the syntax look like?

jwosty commented 1 year ago

What would the syntax look like?

No syntax nor SqlHydra API surface changes - my fork is just automatically aliasing everything based off the variable names already used. So for example, this:

selectTask Dto.HydraReader.Read (Shared queryCtx) {
    for foo in someTable do
        join bar in someOtherTable on (bar.col1 = foo.col1)
        select (foo.col1, foo.col2)
}

now automatically becomes:

SELECT foo.col1, foo.col2 FROM someTable AS foo
JOIN someOtherTable AS bar ON (foo.col1 = bar.col1)
JordanMarr commented 1 year ago

Using the foo and bar instance names for the alias is really cool. My only concern is that it adds a fair amount of complexity to the code. Given that there is already a lot of meta programming in this codebase, I'd really prefer to keep it simple whenever possible.

I think that your alternative idea of using an arbitrary alias would be fairly simple to implement in the SelectQuery by adding an indexer to the end of each table name.

jwosty commented 1 year ago

Using the foo and bar instance names for the alias is really cool. My only concern is that it adds a fair amount of complexity to the code. Given that there is already a lot of meta programming in this codebase, I'd really prefer to keep it simple whenever possible.

I think that your alternative idea of using an arbitrary alias would be fairly simple to implement in the SelectQuery by adding an indexer to the end of each table name.

Actually I think that might be the only way and the complexity is unavoidable for this feature, because either way, the builder has to be able to tell which alias table it is referring to.

As you know, the current version of the code determines the table name entirely from the variables type (i.e. it's a Map<Type,string>). So some mechanism has to be added to tell apart different usages of the same table, and I think the only thing that could tell you this is the variable name. Unfortunate, but I can't think of any other way!

JordanMarr commented 1 year ago

Hm... I see. In that case, I'm on board with necessary complexity.

JordanMarr commented 1 year ago

Go ahead and submit this as a PR and I will test it and get it merged.

jwosty commented 1 year ago

Go ahead and submit this as a PR and I will test it and get it merged.

Sweet, doing a draft PR. Still working on it; there's a lot to cover and I'm driving it out through a little web app I'm building

JordanMarr commented 1 year ago

I've been putting this one off for a while because it was so involved to finish. But I doubled down over the weekend, and I think it's pretty much done now. šŸ˜… I cherrypicked some key pieces from your PR (very well done, btw), but I still had to go through a lot of tedious implementation details. Specifically, the:

I don't know how you figured out the [<ReflectedDefinition>] thing; I would have never found that in a million years šŸ˜„, but that made the whole thing possible.

Integration tests are all passing, so only some query unit tests still need to be changed to reflect the new table alias qualified columns.

EDIT: All tests passing now! Only exception is that Oracle aggregate functions used in having and orderBy clauses no longer fail because Oracle. But I'm not willing to halt progress for one fringe feature on a provider that is 1/20th the downloads of the others.

I think I will add support for .NET 7 in the generators and then release (probably as v1.1.0).

JordanMarr commented 1 year ago

Released in v1.1.0! šŸŽ‰

jwosty commented 1 year ago

Released in v1.1.0! šŸŽ‰

Fantastic news. Thanks so much!