fsprojects / SQLProvider

A general F# SQL database erasing type provider, supporting LINQ queries, schema exploration, individuals, CRUD operations and much more besides.
https://fsprojects.github.io/SQLProvider
Other
570 stars 144 forks source link

Simple inner join produces empty results for left side - expecting wrong names in generated SQL? #652

Closed rmunn closed 4 years ago

rmunn commented 4 years ago

Description

A simple query joining two tables produces empty results for the left side of the join, even though running the generated SQL works correctly at the MySQL command prompt.

Repro steps

I've trimmed this down as much as I can, but it's still going to be a bit long since there's quite a bit of information to include. Sorry about the length; it's as short as I can make it.

The data structure is a table of users, each of which can have multiple email addresses; one email address is set as the "default" address. Relevant subset of my test data (produced by mysqldump and trimmed down to as small as I can make it):

DROP TABLE IF EXISTS `users`;
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `login` varchar(255) NOT NULL DEFAULT '',
  `firstname` varchar(30) NOT NULL DEFAULT '',
  `lastname` varchar(255) NOT NULL DEFAULT '',
  `language` varchar(5) DEFAULT '',
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=129 DEFAULT CHARSET=utf8;
INSERT INTO `users` (`id`, `login`, `firstname`, `lastname`, `language`) VALUES
(2,'','Anonymous','User',NULL),
(94,'rhood','Robin','Hood','en');

DROP TABLE IF EXISTS `email_addresses`;
CREATE TABLE `email_addresses` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `address` varchar(255) NOT NULL,
  `is_default` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `index_email_addresses_on_user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=72 DEFAULT CHARSET=utf8;

INSERT INTO `email_addresses` (`id`, `user_id`, `address`, `is_default`) VALUES
(14, 94, 'robin_hood@example.org', 1),
(71, 94, 'rhood@example.com', 0);

I use SqlProvider as follows:

[<Literal>]
let sampleConnString = "Server=localhost;Database=testquery;User=rmunn"

[<Literal>]
let resolutionPath = __SOURCE_DIRECTORY__

type sql = SqlDataProvider<Common.DatabaseProviderTypes.MYSQL,
                           sampleConnString,
                           ResolutionPath = resolutionPath,
                           CaseSensitivityChange = Common.CaseSensitivityChange.ORIGINAL,
                           UseOptionTypes = true>

Here's the query that's failing:

let getUser (connString : string) username =
    async {
        let ctx = sql.GetDataContext connString
        let! userAndEmails =
            query {
                for user in ctx.Testquery.Users do
                where (user.Login = username)
                join mail in ctx.Testquery.EmailAddresses on (user.Id = mail.UserId)
                select ((user.Login, user.Firstname, user.Lastname, user.Language), ((if mail.IsDefault <> 0y then 1y else 2y), mail.Address))  // Sort default email(s) first, all others second
            } |> List.executeQueryAsync
        for pair in userAndEmails |> List.sortBy snd do
            let (login, firstname, lastname, language) = fst pair
            printfn "In getUser: found %s (%s %s) with language %A and email(s) %A" login firstname lastname language (snd pair)
        return userAndEmails
    }

While this one succeeds:

let getUser (connString : string) username =
    async {
        let ctx = sql.GetDataContext connString
        let! userAndEmails =
            query {
                for user in ctx.Testquery.Users do
                where (user.Login = username)
                // join mail in ctx.Testquery.EmailAddresses on (user.Id = mail.UserId)  // Causing blank user details for some reason??
                select ((user.Login, user.Firstname, user.Lastname, user.Language), (1y, "sample@example.com"))  // Sort default email(s) first, all others second
            } |> List.executeQueryAsync
        for pair in userAndEmails |> List.sortBy snd do
            let (login, firstname, lastname, language) = fst pair
            printfn "In getUser: found %s (%s %s) with language %A and email(s) %A" login firstname lastname language (snd pair)
        if userAndEmails |> List.isEmpty
        then return None
        else return userAndEmails |> Dto.UserDetails.FromSqlWithEmails |> Some
    }

I added FSharp.Data.Sql.Common.QueryEvents.SqlQueryEvent |> Event.add (printfn "Executing SQL: %O") to my initialization code so I could inspect the SQL being generated by SqlProvider. I noticed a difference between the two queries generated:

First, failing, query (including the join):

Executing SQL: SELECT `users`.`login` as '`users`.`login`',`users`.`firstname` as '`users`.`firstname`',`users`.`lastname` as '`users`.`lastname`',`users`.`language` as '`users`.`language`' FROM testquery.users as `users` INNER JOIN  `testquery`.`email_addresses` as `mail` on `users`.`id` = `mail`.`user_id`WHERE ((`users`.`login` = @param1)) -- params @param1 - "rhood"; 

This one ends up printing the following out of my for loop:

In getUser: found  ( ) with language <null> and email(s) (1y, "robin_hood@example.org")
In getUser: found  ( ) with language <null> and email(s) (2y, "rhood@example.com")

Second, successful, query (without any join):

Executing SQL: SELECT `user`.`login` as `login`,`user`.`firstname` as `firstname`,`user`.`lastname` as `lastname`,`user`.`language` as `language` FROM testquery.users as `user` WHERE ((`user`.`login` = @param1)) -- params @param1 - "rhood"; 

This one prints the following out of my for loop (of course, only one email address is found since I didn't do a join):

In getUser: found rhood (Robin Hood) with language Some "en" and email(s) (1y, "sample@example.com")

Note the difference between the names used in the SELECT statement. With the join, the statement is:

SELECT `users`.`login` as '`users`.`login`', (...), `mail`.`address` as '`mail`.`address`' FROM testquery.users as `users` INNER JOIN  `testquery`.`email_addresses` as `mail` WHERE ((`users`.`login` = @param1)) -- params @param1 - "rhood";

and the name is users.login (with some extra backtick characters). But when I do a query without the join, the statement is:

SELECT `user`.`login` as `login` (...) FROM testquery.users as `user` WHERE ((`user`.`login` = @param1)) -- params @param1 - "rhood"; 

and here the name of the column is just login, not users.login. I suspect that the different column names being selected by the generated SQL are responsible for whether or not SqlProvider succeeds in finding and extracting the data once the query returns.

Note that when I run the generated SQL query in MySQL, it works:

mysql> SELECT `users`.`login` as '`users`.`login`',`users`.`firstname` as '`users`.`firstname`',`users`.`lastname` as '`users`.`lastname`',`users`.`language` as '`users`.`language`',`mail`.`is_default` as '`mail`.`is_default`',`mail`.`address` as '`mail`.`address`' FROM testquery.users as `users` INNER JOIN  `testquery`.`email_addresses` as `mail` on `users`.`id` = `mail`.`user_id`WHERE ((`users`.`login` = 'rhood'));
+-----------------+---------------------+--------------------+--------------------+---------------------+------------------------+
| `users`.`login` | `users`.`firstname` | `users`.`lastname` | `users`.`language` | `mail`.`is_default` | `mail`.`address`       |
+-----------------+---------------------+--------------------+--------------------+---------------------+------------------------+
| rhood           | Robin               | Hood               | en                 |                   1 | robin_hood@example.org |
| rhood           | Robin               | Hood               | en                 |                   0 | rhood@example.com      |
+-----------------+---------------------+--------------------+--------------------+---------------------+------------------------+
2 rows in set (0.00 sec)

So the data is getting out of MySQL, it's just not making it all the way back to my F# code. Somewhere inside SqlProvider, the user data gets lost and only the emails make it out intact.

Expected behavior

Joining two tables returns data from both tables.

Actual behavior

Joining the users table to the email_addresses table ends up returning blank data from the users table, though the emails get returned correctly.

Known workarounds

I expect (though I haven't yet tried this) that I can work around this bug by doing two queries and avoiding join, which is rather inefficient. Thankfully my database is pretty small and my user base is limited, so the inefficiency won't kill me.

Related information

rmunn commented 4 years ago

I just found the !! operator in the documentation, which does what I actually want here (a left outer join). And adding the !! operator produces a SELECT statement with names like SELECT user.login instead of SELECT users.login, and that ends up returning the results correctly (the username, first & last name, etc, are not blank). Which goes some way to confirm that the root cause of the issue is the SELECT statement being produced with field names users.login, users.firstname, etc., but the results being parsed expecting field names user.login, user.firstname, etc. (singular vs plural).

So although I'm no longer blocked by this issue, I'll leave it open as I do think I've discovered an actual bug.

rmunn commented 4 years ago

Another data point: the order of the where and join clauses in my query expression makes a difference to the names chosen in the resulting SQL. My original query looked like:

query {
    for user in ctx.Testquery.Users do
    where (user.Login = username)
    join mail in !! ctx.Testquery.EmailAddresses on (user.Id = mail.UserId)
    select (user, mail)
}

That produces SELECT users.login AS 'users.login', ... (plural users), and I get blank data for the fields in the users table. But if I swap the order of the join and where clauses:

query {
    for user in ctx.Testquery.Users do
    join mail in !! ctx.Testquery.EmailAddresses on (user.Id = mail.UserId)
    where (user.Login = username)
    select (user, mail)
}

Then this produces SELECT user.login AS 'user.login', ... (singular user), and I get the correct (non-blank) data out of the users table.

So it wasn't actually the presence of the !! operator that was fixing it for me, it was the ordering of the join clauses vs the where clauses! Weird.

Thorium commented 4 years ago

Thanks for reporting this, it seems like the recent alias changes broke the where-before-join case and our unit-test assertions were not good enough to catch this. I'll fix...

Thorium commented 4 years ago

Try the NuGet version 1.1.74 please.

rmunn commented 4 years ago

Looks like 1.1.74 fixed it. Thanks!