dyedgreen / deno-sqlite

Deno SQLite module
https://deno.land/x/sqlite
MIT License
408 stars 36 forks source link

return rows as [column: value] objects rather than value arrays #140

Closed andykais closed 3 years ago

andykais commented 3 years ago

Unsure if this should be a part of #138 (apologies for suggesting so many changes 😅) but it would probably be very useful to return rows as key value pairs of column names and values, rather than what they are now, which is an array of values. E.g.

const row = person_select.one()
console.log(row.name)
console.log(row.age)
dyedgreen commented 3 years ago

Yeah, this is a feature I’d like to support, but I’m unsure what’s the best api for it. Returning rows as a tuple of values will definitely be the default (it’s more aligned with SQL, and it’s slightly more efficient to do).

The two options I see for supporting this would be either:

I think the second option would be better, although there is then a question for what to do with the generic type (ie do we just have a second generic type for the returned objects, and how would that work best …)

On a separate note, both of these options are possible as incremental, non-breaking changes, so they will probably happen in a separate PR from #138 😅

dyedgreen commented 3 years ago

What do you think of this approach for generics?

type Row = Array<unknown>;
type RowObject = Record<string, unknown>;

class DB {
  query<R = Row>(sql: string, params: QueryParameterSet): Array<R>;
  queryObj<O = RowObject>(sql: string, params: QueryParameterSet): Array<O>;
  prepareQuery<R = Row, O = RowObject>(sql: string): PreparedQuery<R, O>;
}
andykais commented 3 years ago

I guess I dont know the scenario that arrays would be preferred over objects. Maybe for performance considerations? If you wanted to keep both I would suggest making objects the default, and put arrays behind a more descriptive method. E.g.

type RowValues = Array<unknown>;
type RowKeyValues = Record<string, unknown>;

class DB {
  query<R = RowKeyValues>(sql: string, params: QueryParameterSet): Array<R>;
  queryValues<V = RowValues>(sql: string, params: QueryParameterSet): Array<V>;
  prepareQuery<V = RowValues, R = RowKeyValues>(sql: string): PreparedQuery<V, R>;
}
dyedgreen commented 3 years ago

I feel quite strongly that addressing the columns by position should be the default for the API. Not only is it closer to how SQL itself works, where position is the key way to distinguish elements, it's also (slightly) faster and avoids a bunch of edge-cases.

Consider e.g. the following:

import { DB } from "./mod.ts";

// prepare database

const db = new DB();
db.query(`
  CREATE TABLE cities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
  )
`);
db.query(`
  CREATE TABLE people (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    city INTEGER NOT NULL REFERENCES cities(id)
  )
`);

const addCityQuery = db.prepareQuery(
  "INSERT INTO cities (id, name) VALUES (:id, :name)",
);
const cities = ["New York", "London", "Hong Kong"];
for (const [id, name] of cities.entries()) {
  addCityQuery.execute({ id, name });
}
addCityQuery.finalize();

db.query("INSERT INTO people (name, city) VALUES (:name, :city)", {
  name: "Tilman",
  city: cities.indexOf("London"),
});

// run a query

const peopleQuery = db.prepareQuery<[number, string, string]>(`
  SELECT people.id, people.name, cities.name FROM people JOIN cities ON city = cities.id
`);

console.log(peopleQuery.all());
console.log(peopleQuery.columns());
[ [ 1, "Tilman", "London" ] ]
[
  { name: "id", originName: "id", tableName: "people" },
  { name: "name", originName: "name", tableName: "people" },
  { name: "name", originName: "name", tableName: "cities" }
]

What should the object returned be in this case? Should the second name overwrite the first one? Or should the names be name-spaced by table in this case, making them people.name? Should the queried values be name-spaced by origin table in general, unless there is only one table present? What about aggregated or computed fields like COUNT(*), their row is named 'COUNT(*)'? What about renamed fields, do we use the renamed name or the original name?

I agree that there are a lot of cases where returning objects is nice and does not have those problems. But there are a lot of subtleties around returning the rows as objects, and a lot of decisions which are not immediately obvious/ don't have a definitive answer.

Representing rows as arrays avoids all of these subtle issues. Having the option to return rows is nice to have, but arrays are the better default since they don't introduce any special cases.

They are also very ergonomic to use, e.g.

const query = db.prepareQuery<[id, name]>("SELECT id, name FROM users");
for (const [id, name] of query.iter()) {
  // ...
}
andykais commented 3 years ago

What should the object returned be in this case? Should the second name overwrite the first one? Or should the names be name-spaced by table in this case, making them people.name?

I will say interestingly, better-sqlite3 takes the prior route, and overwrites both columns under name, which feels a bit backwards to me.

What about aggregated or computed fields like COUNT(*), their row is named 'COUNT(*)'

this is also their behavior

I think the answer to both though is to use renamed fields.

Imo, array values are not any more clear, unless you are looking at the data. The "key" being an index 1 or 2 doesnt actually hold as much context as a column name. Even in your insert examples, you chose to use sql's object named parameters rather than simple positional args. Imo its because the prior is much easier to read.

Personally, my big fear I imagine is this common-ish pitfall. You have some sql query that looks like:

const select_user = db.prepareQuery<[number, string, string, number]>("SELECT id, first_name, last_name, phone_number FROM users");

for (const [id, first_name, last_name, phone_number] of select_users.iter()) {
   console.log({ first_name, last_name })
}

this code works fine, but if you go back in later and refactor your sql to rearrange the selected order...(last name is now before first name)

const select_user = db.prepareQuery<[number, string, string, number]>("SELECT id, last_name, first_name, phone_number FROM users");

for (const [id, first_name, last_name, phone_number] of select_users.iter()) {
   console.log({ first_name, last_name })
}

then even updating the type signature doesn't prevent you from selecting the wrong data. This is a contrived example, but imagine you are using a row of data deep in your application, or over a client/server in different repos. Imagine another scenario where instead of enumerating the columns you just have a SELECT * statement. Its entirely possible you write a migration that alters the table and reorders the columns of a table without realizing that this breaks code dependent on that order.

anyways that is my spiel, if row objects are supported, then ultimately this doesn't affect me, since I can just use the row object method. I just wanted to share my opinion, which is that choosing value-only data should feel like saying "yes I know what Im doing" rather than the default "give me my data please"

dyedgreen commented 3 years ago

Yes, I think the argument around the type being easier to change with a record / object is the strongest argument in favor of that kind of API.

Ultimately, I still think that the 'default' API should be based on tuples, since that is the most predictable in terms of what the API will do in any given situation. But with the design where there are simply functions for both returning tuples and returning objects (i.e. all, and allObj, allKv, or allRec), there is not really a strong notion of default anyways. (We could even decide to suffix both versions of the methods, and e.g. have allTuples and allRecords or something.)

I think this should be a second PR, after #138, since it will be an incremental update to the API and the current PR is already huge. I also don't have a name for the methods which I'm very happy with, so that will probably be nice to discuss in a separate PR as well. (Although that being said, I would ideally want to land this before making a new release 😅)

dyedgreen commented 3 years ago

I’m still struggling with a good name for this API, what do you think of: kvXXX, eg kvAll or kvall (the latter following the APIs like fstat in the Deno built-ins).

// option one
kvQuery
kvIter
kvAll
kvOne

// option two
kvquery
kviter
kvall
kvone
andykais commented 3 years ago

I think if you plan to put objects behind separate names, one guiding principle should probably be putting the verbs first. The reasoning is that using the api probably looks like "I know I need a single row" before I know "I should use an array" or "I should use key values". I would suggest something more like

onekv
iterkv
allkv
querykv

// or
one(sqlParams, { kv: true })
all(sqlParams, { kv: true })
iter(sqlParams, { kv: true })
query(sqlParams, { kv: true })
dyedgreen commented 3 years ago

My initial instinct was also to use a suffix, but I found that a bunch of functions in the Deno namespace use a prefix, e.g. there is stat, fstat, and lstat.

The second approach would be much harder to type, so I think we should go with different functions for both cases (since return types are much more straight forward this way)

andykais commented 3 years ago

to be fair, I think deno is also following the conventions that linux has laid out https://linux.die.net/man/2/fstat so I wouldnt worry too much about this being a deno/javascript convention. Would it ever make sense to separate this out at the prepare level?

[edit] I suppose not, since query has to exist outside prepare anyways

dyedgreen commented 3 years ago

You could have two separate prepareQuery functions, but it doesn't make much sense there, since the prepared query can trivially do either / it would be a somewhat artificial restriction.

andykais commented 3 years ago

I do also think that "entry" is often synonymous with "key value". E.g. Object.entries or java's HashMap::entrySet. It might be slightly more readable to do:

oneEntry
allEntries
iterEntries
queryEntries

[edit] Ill admit the pluralization is kind of ugly

(also just another suggestion, query is sort of an alias for prepareQuery().all() isnt it? Do we need both?)

dyedgreen commented 3 years ago

I hadn't considered entry before. Actually I really like how oneEntry or allEntries reads. It's very clear I think. I'm not sure if the names could be terser though (but I think being clear trumps being terse).

Regarding query: Yes, it's basically just preparing a query and calling all on that. It's a convenience, if you only need to run a query once. It also makes sure to finalize the prepared query (even if an error occurs, which is an easy case to forget). Another nice thing about it, is that it means the upcoming version will be compatible with most existing users code, which makes upgrading easier for everyone.

dyedgreen commented 3 years ago

This is probably a bad idea and will make the api much more confusing; but what if the functions where simply named:

tuple / object
==========
all / entries
one / entry
iter / entryIter

(not sure how to fit query into this, perhaps entryQuery?)

Edit: Right after seeing it written down, I think oneEntry, allEntries, et al are superior 😅

andykais commented 3 years ago

this is awesome! Thanks @dyedgreen!

Melkaz commented 1 year ago

When doing:

const query = db.prepareQuery<[string], {firstName: string}, never>("SELECT first_name FROM users");
const results = query.allEntries();

My IDE expects the results object to be an array of {firstName: string}. Unfortunately, it seems we get {first_name: string} instead, as first_name is the name of the column on SQLite side.

Is it expected ? Is there a way to cleanly translate the column names ?

Thanks !

dyedgreen commented 1 year ago

You have to specify the column names to match your query @Melkaz eg in your case you either want to call the type first_name, or rename the column with SELECT first_name AS firstName FROM ….