Closed andykais closed 3 years ago
Yeah, this is a feature I’d like to support, but I’m unsure what’s the best api for it. Returning rows as a tuple of values will definitely be the default (it’s more aligned with SQL, and it’s slightly more efficient to do).
The two options I see for supporting this would be either:
{ objects: true }
)iterObj
, allObj
and oneObj
; and a top level queryObj
on the DB)I think the second option would be better, although there is then a question for what to do with the generic type (ie do we just have a second generic type for the returned objects, and how would that work best …)
On a separate note, both of these options are possible as incremental, non-breaking changes, so they will probably happen in a separate PR from #138 😅
What do you think of this approach for generics?
type Row = Array<unknown>;
type RowObject = Record<string, unknown>;
class DB {
query<R = Row>(sql: string, params: QueryParameterSet): Array<R>;
queryObj<O = RowObject>(sql: string, params: QueryParameterSet): Array<O>;
prepareQuery<R = Row, O = RowObject>(sql: string): PreparedQuery<R, O>;
}
I guess I dont know the scenario that arrays would be preferred over objects. Maybe for performance considerations? If you wanted to keep both I would suggest making objects the default, and put arrays behind a more descriptive method. E.g.
type RowValues = Array<unknown>;
type RowKeyValues = Record<string, unknown>;
class DB {
query<R = RowKeyValues>(sql: string, params: QueryParameterSet): Array<R>;
queryValues<V = RowValues>(sql: string, params: QueryParameterSet): Array<V>;
prepareQuery<V = RowValues, R = RowKeyValues>(sql: string): PreparedQuery<V, R>;
}
I feel quite strongly that addressing the columns by position should be the default for the API. Not only is it closer to how SQL itself works, where position is the key way to distinguish elements, it's also (slightly) faster and avoids a bunch of edge-cases.
Consider e.g. the following:
import { DB } from "./mod.ts";
// prepare database
const db = new DB();
db.query(`
CREATE TABLE cities (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL
)
`);
db.query(`
CREATE TABLE people (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
city INTEGER NOT NULL REFERENCES cities(id)
)
`);
const addCityQuery = db.prepareQuery(
"INSERT INTO cities (id, name) VALUES (:id, :name)",
);
const cities = ["New York", "London", "Hong Kong"];
for (const [id, name] of cities.entries()) {
addCityQuery.execute({ id, name });
}
addCityQuery.finalize();
db.query("INSERT INTO people (name, city) VALUES (:name, :city)", {
name: "Tilman",
city: cities.indexOf("London"),
});
// run a query
const peopleQuery = db.prepareQuery<[number, string, string]>(`
SELECT people.id, people.name, cities.name FROM people JOIN cities ON city = cities.id
`);
console.log(peopleQuery.all());
console.log(peopleQuery.columns());
[ [ 1, "Tilman", "London" ] ]
[
{ name: "id", originName: "id", tableName: "people" },
{ name: "name", originName: "name", tableName: "people" },
{ name: "name", originName: "name", tableName: "cities" }
]
What should the object returned be in this case? Should the second name
overwrite the first one? Or should the names be name-spaced by table in this case, making them people.name
? Should the queried values be name-spaced by origin table in general, unless there is only one table present? What about aggregated or computed fields like COUNT(*)
, their row is named 'COUNT(*)'? What about renamed fields, do we use the renamed name or the original name?
I agree that there are a lot of cases where returning objects is nice and does not have those problems. But there are a lot of subtleties around returning the rows as objects, and a lot of decisions which are not immediately obvious/ don't have a definitive answer.
Representing rows as arrays avoids all of these subtle issues. Having the option to return rows is nice to have, but arrays are the better default since they don't introduce any special cases.
They are also very ergonomic to use, e.g.
const query = db.prepareQuery<[id, name]>("SELECT id, name FROM users");
for (const [id, name] of query.iter()) {
// ...
}
What should the object returned be in this case? Should the second name overwrite the first one? Or should the names be name-spaced by table in this case, making them people.name?
I will say interestingly, better-sqlite3 takes the prior route, and overwrites both columns under name
, which feels a bit backwards to me.
What about aggregated or computed fields like
COUNT(*)
, their row is named'COUNT(*)'
this is also their behavior
I think the answer to both though is to use renamed fields.
Imo, array values are not any more clear, unless you are looking at the data. The "key" being an index 1 or 2 doesnt actually hold as much context as a column name. Even in your insert examples, you chose to use sql's object named parameters rather than simple positional args. Imo its because the prior is much easier to read.
Personally, my big fear I imagine is this common-ish pitfall. You have some sql query that looks like:
const select_user = db.prepareQuery<[number, string, string, number]>("SELECT id, first_name, last_name, phone_number FROM users");
for (const [id, first_name, last_name, phone_number] of select_users.iter()) {
console.log({ first_name, last_name })
}
this code works fine, but if you go back in later and refactor your sql to rearrange the selected order...(last name is now before first name)
const select_user = db.prepareQuery<[number, string, string, number]>("SELECT id, last_name, first_name, phone_number FROM users");
for (const [id, first_name, last_name, phone_number] of select_users.iter()) {
console.log({ first_name, last_name })
}
then even updating the type signature doesn't prevent you from selecting the wrong data. This is a contrived example, but imagine you are using a row of data deep in your application, or over a client/server in different repos. Imagine another scenario where instead of enumerating the columns you just have a SELECT *
statement. Its entirely possible you write a migration that alters the table and reorders the columns of a table without realizing that this breaks code dependent on that order.
anyways that is my spiel, if row objects are supported, then ultimately this doesn't affect me, since I can just use the row object method. I just wanted to share my opinion, which is that choosing value-only data should feel like saying "yes I know what Im doing" rather than the default "give me my data please"
Yes, I think the argument around the type being easier to change with a record / object is the strongest argument in favor of that kind of API.
Ultimately, I still think that the 'default' API should be based on tuples, since that is the most predictable in terms of what the API will do in any given situation. But with the design where there are simply functions for both returning tuples and returning objects (i.e. all
, and allObj
, allKv
, or allRec
), there is not really a strong notion of default anyways. (We could even decide to suffix both versions of the methods, and e.g. have allTuples
and allRecords
or something.)
I think this should be a second PR, after #138, since it will be an incremental update to the API and the current PR is already huge. I also don't have a name for the methods which I'm very happy with, so that will probably be nice to discuss in a separate PR as well. (Although that being said, I would ideally want to land this before making a new release 😅)
I’m still struggling with a good name for this API, what do you think of: kvXXX
, eg kvAll
or kvall
(the latter following the APIs like fstat
in the Deno built-ins).
// option one
kvQuery
kvIter
kvAll
kvOne
// option two
kvquery
kviter
kvall
kvone
I think if you plan to put objects behind separate names, one guiding principle should probably be putting the verbs first. The reasoning is that using the api probably looks like "I know I need a single row" before I know "I should use an array" or "I should use key values". I would suggest something more like
onekv
iterkv
allkv
querykv
// or
one(sqlParams, { kv: true })
all(sqlParams, { kv: true })
iter(sqlParams, { kv: true })
query(sqlParams, { kv: true })
My initial instinct was also to use a suffix, but I found that a bunch of functions in the Deno
namespace use a prefix, e.g. there is stat
, fstat
, and lstat
.
The second approach would be much harder to type, so I think we should go with different functions for both cases (since return types are much more straight forward this way)
to be fair, I think deno is also following the conventions that linux has laid out https://linux.die.net/man/2/fstat so I wouldnt worry too much about this being a deno/javascript convention. Would it ever make sense to separate this out at the prepare
level?
[edit]
I suppose not, since query
has to exist outside prepare anyways
You could have two separate prepareQuery
functions, but it doesn't make much sense there, since the prepared query can trivially do either / it would be a somewhat artificial restriction.
I do also think that "entry" is often synonymous with "key value". E.g. Object.entries
or java's HashMap::entrySet
. It might be slightly more readable to do:
oneEntry
allEntries
iterEntries
queryEntries
[edit] Ill admit the pluralization is kind of ugly
(also just another suggestion, query
is sort of an alias for prepareQuery().all() isnt it? Do we need both?)
I hadn't considered entry before. Actually I really like how oneEntry
or allEntries
reads. It's very clear I think. I'm not sure if the names could be terser though (but I think being clear trumps being terse).
Regarding query
: Yes, it's basically just preparing a query and calling all
on that. It's a convenience, if you only need to run a query once. It also makes sure to finalize the prepared query (even if an error occurs, which is an easy case to forget). Another nice thing about it, is that it means the upcoming version will be compatible with most existing users code, which makes upgrading easier for everyone.
This is probably a bad idea and will make the api much more confusing; but what if the functions where simply named:
tuple / object
==========
all / entries
one / entry
iter / entryIter
(not sure how to fit query
into this, perhaps entryQuery
?)
Edit: Right after seeing it written down, I think oneEntry
, allEntries
, et al are superior 😅
this is awesome! Thanks @dyedgreen!
When doing:
const query = db.prepareQuery<[string], {firstName: string}, never>("SELECT first_name FROM users");
const results = query.allEntries();
My IDE expects the results
object to be an array of {firstName: string}
.
Unfortunately, it seems we get {first_name: string}
instead, as first_name
is the name of the column on SQLite side.
Is it expected ? Is there a way to cleanly translate the column names ?
Thanks !
You have to specify the column names to match your query @Melkaz eg in your case you either want to call the type first_name
, or rename the column with SELECT first_name AS firstName FROM …
.
Unsure if this should be a part of #138 (apologies for suggesting so many changes 😅) but it would probably be very useful to return rows as key value pairs of column names and values, rather than what they are now, which is an array of values. E.g.