WiseLibs / better-sqlite3

The fastest and simplest library for SQLite3 in Node.js.
MIT License
5.22k stars 390 forks source link

treatment of default column aliases when none are specified #1176

Closed punkish closed 2 months ago

punkish commented 2 months ago

note: this is cross-posted here from the SQLite3 forums.

Consider what the SQLite CLI does when returning results with same column names from different tables but no aliases.

$ sqlite3
SQLite version 3.44.2 2023-11-24 11:41:44
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE one (id INTEGER PRIMARY KEY, a, b);
sqlite> CREATE TABLE two (id INTEGER PRIMARY KEY, c, d, one_id, a);
sqlite> INSERT INTO one (a, b) VALUES ('foo', 'bar'), ('baz', 'qux');
sqlite> INSERT INTO two (c, d, one_id, a) VALUES ('frob', 'blah', 1, 'foo'), ('rand', 'comt', 2, 'baz');
sqlite> SELECT id, a FROM one;
id  a  
--  ---
1   foo
2   baz
sqlite> SELECT c, d, id, a FROM two;
c     d     id  a  
----  ----  --  ---
frob  blah  1   foo
rand  comt  2   baz
sqlite> SELECT * FROM one JOIN two ON one.id = two.one_id;
id  a    b    id  c     d     one_id  a  
--  ---  ---  --  ----  ----  ------  ---
1   foo  bar  1   frob  blah  1       foo
2   baz  qux  2   rand  comt  2       baz
sqlite> SELECT id, a, b, id, c, d, one_id, a FROM one JOIN two ON one.id = two.one_id;
Parse error: ambiguous column name: id
  SELECT id, a, b, id, c, d, one_id, a FROM one JOIN two ON one.id = two.one_id;
         ^--- error here
sqlite> SELECT one.id, one.a, one.b, two.id, two.c, two.d, two.one_id, two.a FROM one JOIN two ON one.id = two.one_id;
id  a    b    id  c     d     one_id  a  
--  ---  ---  --  ----  ----  ------  ---
1   foo  bar  1   frob  blah  1       foo
2   baz  qux  2   rand  comt  2       baz

So, SQLite (the lib) is happy to output columns with the same default aliases, when none are provided, as shown in SELECT * FROM one JOIN two ON one.id = two.one_id but at least the SQLite CLI doesn't like it when ambiguous column names are provided in a query.

better-sqlite3, on the other hand, silently removes the column two.a from the result because the returning object is a JavaScript object, and an object can't have two keys with the same name.

function selects() {
    const res1 = db.prepare('SELECT id, a FROM one').all();
    console.log(' res1\n', '-'.repeat(50), '\n', res1, '\n');

    const res2 = db.prepare('SELECT c, d, id, a FROM two').all();
    console.log(' res2\n', '-'.repeat(50), '\n', res2, '\n');

    const res3 = db.prepare(`SELECT * FROM one JOIN two ON one.id = two.one_id`).all();
    console.log(' res3\n', '-'.repeat(50), '\n', res3, '\n');

    const res4 = db.prepare(`SELECT one.id, one.a, one.b, two.id, two.c, two.d, two.one_id, two.a FROM one JOIN two ON one.id = two.one_id`).all();
    console.log(' res4\n', '-'.repeat(50), '\n', res4, '\n');

    const res5 = db.prepare(`SELECT one.id AS one_id, one.a AS one_a, one.b AS one_b, two.id AS two_id, two.c AS two_c, two.d AS two_d, two.one_id AS two_one_id, two.a AS two_a FROM one JOIN two ON one.id = two.one_id`).all();
    console.log(' res5\n', '-'.repeat(50), '\n', res5, '\n');
}

 res1
 -------------------------------------------------- 
 [ { id: 1, a: 'foo' }, { id: 2, a: 'baz' } ] 

 res2
 -------------------------------------------------- 
 [
  { c: 'frob', d: 'blah', id: 1, a: 'foo' },
  { c: 'rand', d: 'comt', id: 2, a: 'baz' }
] 

 res3
 -------------------------------------------------- 
 [
  { id: 1, a: 'foo', b: 'bar', c: 'frob', d: 'blah', one_id: 1 },
  { id: 2, a: 'baz', b: 'qux', c: 'rand', d: 'comt', one_id: 2 }
] 

 res4
 -------------------------------------------------- 
 [
  { id: 1, a: 'foo', b: 'bar', c: 'frob', d: 'blah', one_id: 1 },
  { id: 2, a: 'baz', b: 'qux', c: 'rand', d: 'comt', one_id: 2 }
] 

 res5
 -------------------------------------------------- 
 [
  {
    one_id: 1,
    one_a: 'foo',
    one_b: 'bar',
    two_id: 1,
    two_c: 'frob',
    two_d: 'blah',
    two_one_id: 1,
    two_a: 'foo'
  },
  {
    one_id: 2,
    one_a: 'baz',
    one_b: 'qux',
    two_id: 2,
    two_c: 'rand',
    two_d: 'comt',
    two_one_id: 2,
    two_a: 'baz'
  }
] 

Now that I know this behavior, I can code around it, at least until either better-sqlite3 or SQLite change their behavior. Frankly, I don't know what the right solution should be. I am assuming SQLite behaves the way it does because, perhaps all SQL engines do the same thing. better-sqlite3, on the other hand, does the best with what it is provided -- a resulting data set that is modeled as an array (table) of objects (column-value pairs). But the objects can't have duplicate keys. Hence, we have a problem unless SQLite sends some other info in the result.

Of course, I am reporting this to the better-sqlite3 folks as well.

mceachen commented 2 months ago

Thanks for taking the time to describe this issue.

I’d strongly suggest altering your query to use unique column names, though. I suspect duplicate column or alias names in your SELECT may cause other non deterministic behaviors (as with any rdbms), including references in ORDER BY and GROUP BY clauses.

Prinzhorn commented 2 months ago

.expand(true)

Causes the prepared statement to return data namespaced by table. Each key in a row object will be a table name, and each corresponding value will be a nested object that contains the associated column data. This is useful when performing a JOIN between two tables that have overlapping column names. If a result column is an expression or subquery, it will be available within the special $ namespace.

Alternatively .raw(true) will also give you all data. Combine it with .columns()

at least until either better-sqlite3 or SQLite change their behavior.

What does that mean? You are asking the database for results with ambiguous names, so it's up to you to handle that properly. And better-sqlite3 offers all tools needed.

punkish commented 2 months ago

yea, you are correct. Now I know that better-sqlite3 does offer the tools I need to handle this curious situation. The only possible improvement would be if better-sqlite3 figured out that the result had ambiguous names and automagically used .expand(true). In any case, many thanks for the explanation.