Open jba opened 1 year ago
Maybe we should be more strict about struct name matching to prevent misspelling errors. Unfortunately that interferes with the ability to use the same struct for different partial views of its table.
Otherwise, the column name is matched to the field name case-insensitively.
Maybe a snake_case match would be a better default. When field names are composed of multiple words, most DB conventions use snake_case for column names: FieldName -> field_name.
Also, most projects I have seen already use the 'db' key in struct tags instead of 'sql'. If this proposal could use the same key, there would be less work to convert existing projects.
Edit: I meant snake_case (camelCase columns would already match with the case-insensitive match).
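For illustration, the FieldName -> field_name convention described above might be implemented like this (snakeCase is a hypothetical helper, not part of any proposal; note that naive splitting mangles acronyms like ID):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// snakeCase converts an exported Go field name like "FieldName"
// to the snake_case column convention "field_name".
func snakeCase(field string) string {
	var b strings.Builder
	for i, r := range field {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(snakeCase("FieldName")) // field_name
	fmt.Println(snakeCase("ID"))        // i_d (naive; acronyms need special-casing)
}
```

The acronym case is one reason a convention-based default is harder than it first looks.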
@doggedOwl db is kind of generic. sql is more clear. And unpacking columns to fields by a naming convention doesn't seem like the "go" way of doing things.
Case insensitivity might be problematic. If a database collation is set to case-sensitive, I believe SQL Server and MySQL will treat column names that differ only in case as different columns. Not all databases behave this way, and not every usage of the affected databases does. Also, you can already alias each field in your select statement, or use tags as an exact match, to be specific. I don't think trying to be clever about case insensitivity is a good idea when databases behave differently from each other.
@doggedOwl db is kind of generic. sql is more clear. And unpacking columns to fields by a naming convention doesn't seem like the "go" way of doing things.
Another benefit of "sql" as the tag is matching the package it comes from. That seems both concise and intuitive.
Personally, I'd far prefer scanning into a struct to scan the columns into fields in order - or at least having the option to. I don't particularly like the mechanism of using struct tags to control unmarshaling anyways and generally prefer my exported types to be tag-free. This would also allow scanning unnamed columns into structs.
Even if the iterator is not part of the proposal, given that it's part of the rationale, I'd also mention that I find the semantics of returning an error as an iteration value a bit confusing. It is not enforced that iteration stops if an error is returned. Arguably, that's a good thing: it might be that a specific row can't be scanned into a value, but for some use cases it might still make sense to iterate over the rest of the query. On the other hand, the ambiguity will potentially have people write incorrect code, if they assume one semantic or the other.
I'd far prefer scanning into a struct to scan the columns into fields in order
@Merovius Isn't that rather error-prone? If you change the order of fields in the select, but forget in the struct (which might be defined a ways away), it'll break.
I don't particularly like the mechanism of using struct tags to control unmarshaling anyways and generally prefer my exported types to be tag-free. This would also allow scanning unnamed columns into structs.
Out of interest, how do you unmarshal JSON with this approach? Or do you unmarshal into a non-exported struct with tags, and then copy to an exported struct without tags?
In terms of the proposal itself, I really like it -- I've wanted this for many years (and have used sqlx too). Code like rows.Scan(&item.ID, &item.Name, &item.TimeCreated, ...) gets tedious really fast (and is error-prone for the same ordering reason as above).
@jba For structs, will embedded fields be supported, like they are in sqlx? What will ScanRow do in the face of ambiguous column names, for queries like SELECT 1 AS a, 2 AS a or SELECT a.id, a.name, b.id, b.name FROM foos AS a JOIN foos AS b ON a.parent = b.id?
@benhoyt I don't know if it's error-prone. I don't necessarily "buy" theoretical arguments about error-proneness. But AIUI, sqlx works that way already, so maybe the maintainers or users of that package can provide some data.
As for how I unmarshal JSON without tags: I declare an UnmarshalJSON method that unmarshals into an unexported struct type and then either converts it to the tagless one or copies over the fields I'm interested in. That's a bit of added boilerplate, but mostly because the encoding/json package relies on struct tags to begin with - I'd prefer an API that allows you to pass a function doing what the struct tags currently do. How that might look is off-topic here, IMO.
If we're going to add ScanRow(dest any) error, I would extend the proposal to also add ScanRows(dest any) error. If we can scan one row, it's not much more effort to scan all of them, appending to the provided slice, which would reduce a lot of boilerplate. The corollary in sqlx is Select, where ScanRow maps to Get, and just like encoding/json can Unmarshal a single object or an array of objects, this follows the same pattern.
@Merovius Unless they are doing fine-grained memory optimization, people think of structs as unordered sets of fields. As does the language: re-ordering fields is never a breaking change.
As for iterators, there doesn't seem to be any appetite for error handling over on #61405, so I think having error as the second return type is the best we're going to get.
@benhoyt Embedded fields will be supported. The implementation will use reflect.VisibleFields.
Personally, I think ScanRow should fail if there are ambiguous column names, but that's open for discussion.
@jba Not to derail this, but: Re-ordering fields is (sometimes) a breaking change. It affects convertibility. And FWIW, the language also considers order for unkeyed literals (which the Go 1 compatibility promise makes an exception for) and comparisons.
@derekperkins I would implement ScanRows using more general tools:
func CollectErr[T any](it Iter2[T, error]) ([]T, error) {
    var ts []T
    for t, err := range it {
        if err != nil {
            return nil, err
        }
        ts = append(ts, t)
    }
    return ts, nil
}
// ...
products, err := CollectErr(sql.Query[Product](ctx, db, "SELECT ..."))
@Merovius good point about re-ordering fields. I will have to revisit apidiff....
@jba I can't tell from your response: are you opposed to adding a ScanRows option? I think the majority use case for querying is to iterate and scan all rows before continuing. If we're thinking about adding ScanRow as a convenience function, adding ScanRows or something like it so users don't even have to iterate manually seems like an easy win.
@derekperkins I am opposed because I think it's a one-liner given CollectErr. We've been talking about slices.Collect, so I'm hopeful that CollectErr will end up in slices, but even if it doesn't, it's an obvious candidate for everyone's toolbox.
While I was unaware of the slices.Collect conversations, I maintain that ScanRows would still be a worthwhile addition. There are multiple cases, like strings.Cut or strings.ReplaceAll, that just proxy other functions in the same package, but were added because that was the primary usage.
FWIW I don't think that if we have iterators, collecting all rows into a slice will be the primary use. I think it might be today, but mainly because slices are the most convenient way to return ordered collections.
@Merovius I'm aware of your disagreement from our discussion about the removal of Keys and Values from the maps package, but I think it bears repeating, with the acknowledgment that you are "technically" correct: iterators being better in many cases doesn't make them preferred. There are countless places, in languages with iterators, where people choose a list-like structure when performance doesn't matter, simply because it's easier for a lot of people to think about. For instance, people need to worry about whether an iterator is repeatable. They need to worry about whether they are iterating in multiple places and should actually have dropped it into a list/slice already, because they are now being less efficient. And since most computing these days (outside libraries) happens in boring IT departments writing code without a lot of load, I'm guessing performance isn't a big issue.
There are countless places in code in languages with iterators where people choose to use a list-like structure when performance doesn't matter
My thought exactly. For the vast majority of CRUD-style / db-to-API transformations, there's not going to be a db rows iterator => json encoder => http.ResponseWriter chain, even if that might be the most efficient. That spans multiple logical tiers, and makes it harder to integrate with logging / error-reporting toolchains. IMO, the general expectation is a reasonable number of bounded rows that will just get put into a slice, though obviously that's just my projection.
My experience in Python was that iterators were slower than using lists, at least for small lists. I don’t know if that experience would follow over in Go. It’s worth benchmarking to get a rough sense for it. It might also have different variability, which can be more important than pure performance per se.
Unassigned fields get their zero value.
Shouldn’t this be “are left alone”? If they already have a nonzero value, they should keep it, unless there are explicit values for them.
@willfaught That was deliberate. Leaving the existing state sounds error-prone to me. Especially since you may use the same struct for various queries that only select some of the fields.
What's a good use case for leaving the existing values alone?
Unassigned columns are dropped
Should this be configurable? Personally I would prefer this to error, because it probably means there was a typo in a field mapping. Your suggestion makes sense in the context of encoding/json silently discarding fields by default, though DisallowUnknownFields support was added in 1.10. While there is value in meeting some of those expectations, my experience is that JSON mappings are often for APIs out of your control, so you don't want to break if a field is added; a database call seems fundamentally different.
What's a good use case for leaving the existing values alone?
This might be another place for configuration, as both options have their place. Reasons I can think of to leave them alone:
the rows.Scan pattern, but I would hope that this could be an abstraction with minimal overhead
It's not clear from the json docs, but are all fields zeroed out during unmarshaling?
I like the high-level proposal of being able to pull an entire row at a time. I like to buffer my results first, and I have a package that does something similar: https://pkg.go.dev/github.com/golang-sql/table . In the SQL interface I actually use day-to-day, I have a Get method (https://pkg.go.dev/github.com/kardianos/rdb#Result.Get) on it, but again, I almost never work with an open result set; I always buffer the results first with https://pkg.go.dev/github.com/kardianos/rdb@v1.1.6/table .
For myself, I hate relying on any result ordering. I deal with queries that often return 10-30 columns, or more for wide tables. To be blunt, I think any API that relies on the ordering/index of columns is just waiting for hard-to-detect bugs to creep in.
The persistent problem in the current API is type conversion, especially in Scan. If we can scan into structs or other name-matched destinations, we have name matching to do as well. I suggest that this proposal be augmented to include a way to inject, either at the database pool level (DB) and/or at the Rows level, some way to specify additional type conversions and a name-matching or value-mapping method. Perhaps they could be the same API, so a custom function could be passed a list of column names, values, and a value pointer to map them to. If this was passed at the scan site, it could look like this:
qOpt := sql.Q{
    SQL: "SELECT * FROM products",
    Map: func(columns []string, values []any, v any) error {
        // Map to value v.
        return nil
    },
}
for p, err := range sql.Query[Product](ctx, db, qOpt) {
    if err != nil {...}
}
The map function could be standard, and could have a default implementation. We could also expose the default conversion routines or hooks into them. Note that this implementation is not stateful and thus probably not very efficient, because the column names would need to be looked up each time; a better mapper interface would be needed.
NOTE: Why is type conversion so important? Oftentimes a DB driver supports values of type X (a date, decimal, or specific UUID type). Your application uses type Y for the same concept, but with a different implementation. Without integrated conversions you are stuck with two types working everywhere, rather than dealing with it once. To be honest, this is also a problem with query parameters.
NOTE2: civil Date, DateTime, Decimal, and UUIDs are some common types that can be frustrating to map, at least for myself.
The persistent problem in the current API is type conversion, especially in Scan
To clarify, this is mainly an issue because you can't implement Scanner/Valuer on types you don't own, right? We definitely have that issue, and end up with a lot of repeated manual handling
For myself, I hate relying on any result ordering... I think any API that relies on ordering/index of columns is just waiting for hard to detect bugs to creep in.
Are you suggesting that the proposal drop support for arrays and slices?
The persistent problem in the current API is type conversion, especially in Scan... I suggest that this proposal be augmented to include a way to inject, either at the database pool level (DB) and/or at the Rows level, some way to specify additional type conversions and name matching or value mapping method... Type conversion should be addressed in the same breath as this.
It sounds like the problem exists throughout the API:
Why would we tie the solution to ScanRow, which is the only function in this proposal, or to the Query iterator (which is not part of this proposal)? It seems like your idea of registering converters at the DB level makes the most sense. I don't understand the problem well enough to propose what that would look like, but I think it can be a separate proposal, so it can work everywhere.
some way to specify additional ... name matching
Aren't the struct tags sufficient for that?
That was deliberate. Leaving the existing state sounds error-prone to me. Especially since you may use the same struct for various queries that only select some of the fields.
What's a good use case for leaving the existing values alone?
@jba My intuition was that it would work like encoding/json, which does leave unassigned fields alone.
What if you have a struct that you need to "fill" from two different queries, just like if you have a struct that you need to "fill" from two different JSON objects?
If you want to ensure that unassigned fields are zero, you can already do that by using a zero struct value.
@willfaught @jba we often use sqlx to insert a struct and then we scan the result (from RETURNING ...) to get the id and other fields generated by the database into the same struct - obviously only possible because StructScan doesn't touch fields that aren't in the result set.
@willfaught, @rowanseymour: I changed the proposal to ignore untouched fields instead of zeroing them.
@jba are there plans to expose this API somewhere like x/exp before it ships in a Go release? I don't know database/sql well enough to tell whether that's possible or a good idea, but I'd like to test this out in some projects without needing to wait. Not just for this to be accepted and ship in a Go release, but also for another Go release to come and go, since most projects support two Go releases.
Case insensitivity might be problematic... (https://github.com/golang/go/issues/61637#issuecomment-1656263390)
@PaluMacil, I changed the proposal to make field matching case-sensitive.
If anyone disagrees, please address the arguments in the above comment.
@mvdan, we obviously can't add a ScanRow method, but it should be possible to write it as a function. I'll give that a shot. Stay tuned.
@PaluMacil I would like to understand the case-sensitivity concern better. Can you elaborate on the problematic scenario you envision here?
IIUC, you're worried that a database will have two columns with names that differ only in their case, and then this new method won't know how to do the scan?
The way the json package does this is "preferring an exact match but also accepting a case-insensitive match". Could we do the same here? In (hopefully rare) cases where tables do have columns with names that only differ by case, struct tags can fix things up. ISTM that the much more common case is unique column names, and since there's a natural discrepancy between exported field names in Go and how table columns are named, this can save a lot of tedious struct fields just for the sake of amending the case.
@PaluMacil I would like to understand the case-sensitivity concern better. Can you elaborate on the problematic scenario you envision here?
IIUC, you're worried that a database will have two columns with names that differ only in their case, and then this new method won't know how to do the scan?
The way the json package does this is "preferring an exact match but also accepting a case-insensitive match". Could we do the same here? In (hopefully rare) cases where tables do have columns with names that only differ by case, struct tags can fix things up. ISTM that the much more common case is unique column names, and since there's a natural discrepancy between exported field names in Go and how table columns are named, this can save a lot of tedious struct fields just for the sake of amending the case.
If that's consistent with serialization for JSON, then I think that seems like a great approach. I would expect this to be an extremely uncommon issue
preferring an exact match but also accepting a case-insensitive match
I think the only reason for the preference would be if there were two fields or columns that differed only by case. That seems wildly unlikely for DB columns. I would prefer being case-insensitive by default, and requiring an exact match for names specified with a field tag.
I've been thinking about adding options to this API, as @derekperkins suggested above.
In Go 1.10, the json.Decoder type got the DisallowUnknownFields method. That grew out of a desire for stricter JSON parsing. It was possible to make that change because adding a method to a concrete type is backwards compatible. There is no such opportunity for this proposal; to add configuration, we'd have to add a new method (ScanRowOpt?).
So let's anticipate that by changing the signature to
func ScanRow(dest any, opts *ScanRowOptions) error
At this point everyone says "I hate passing nil, let's use functional options," but:
- Option structs are consistent with the database/sql package (see TxOptions) and with the standard library as a whole.
- When you pass nil, you have a chance to learn about how you can call this function in a way that might be better for you. (And if you're a reader or reviewer, that nil might pique your curiosity.) If the options are a variadic arg, you won't know about them unless you read the docs.

The one option I'm thinking about would be the same as for encoding/json: whether additional columns should be allowed or cause an error. If we followed that precedent then the default would be to allow them, but I think it's better to start strict, and let the user relax the semantics when they realize they need to. (As pointed out in the comment linked above, the most likely bug is misspelling a field name.) Thus:
type ScanRowOptions struct {
    // If true, do not return an error if the result set has a column name that does not match
    // any field in the struct, or if it has more columns than slice elements.
    AllowUnusedColumns bool
}
We could also add an option for unmatched struct fields. But I'm less sure about that because we've seen that there are many valid uses for having additional fields, and if the field is unmatched because of a typo, ScanRow will fail by default because there will also be an unused column.
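To make the strict-by-default behavior concrete, here is a toy model of the column check. ScanRowOptions mirrors the struct proposed above, and checkColumns is a hypothetical helper, not the real implementation:

```go
package main

import "fmt"

// ScanRowOptions mirrors the option struct proposed above (hypothetical API).
type ScanRowOptions struct {
	// If true, columns that match no struct field are silently dropped
	// instead of causing an error.
	AllowUnusedColumns bool
}

// checkColumns models the proposed default: error on any column that
// has no matching field, unless AllowUnusedColumns is set.
func checkColumns(columns []string, fields map[string]bool, opts *ScanRowOptions) error {
	for _, c := range columns {
		if !fields[c] {
			if opts != nil && opts.AllowUnusedColumns {
				continue
			}
			return fmt.Errorf("sql: column %q matches no struct field", c)
		}
	}
	return nil
}

func main() {
	fields := map[string]bool{"ID": true, "Name": true}

	// Default (nil options) is strict: a misspelled column surfaces as an error.
	err := checkColumns([]string{"ID", "Nmae"}, fields, nil)
	fmt.Println(err)

	// Opting in relaxes the check.
	err = checkColumns([]string{"ID", "Nmae"}, fields, &ScanRowOptions{AllowUnusedColumns: true})
	fmt.Println(err) // <nil>
}
```

Note how passing nil gets the strict default, matching the "start strict, relax later" argument.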
I just pushed a change to my table package that allows querying into struct slices and converting table buffers into struct slices: https://github.com/golang-sql/table/blob/master/struct.go
Still a few TODOs.
One thing to note: it may be better to opt into disallowing unknown struct fields globally, and make it a practice, if you want to just ignore a few fields, to use a tag such as sql:"-". Name mapping can be a large source of bugs.
If the options are important, then maybe introducing a RowScanner and attaching options to it would be better than specifying options on each call of ScanRow. I can't think of any scenario where the options would change from one ScanRow to the next while looping through all the rows, so that extra parameter creates too much noise for the value it provides. Notice that in json, too, the options are placed on the Decoder, not on every Decode method call.
That said, I'm not convinced of this particular need at all. Especially if the default is to disallow unknown fields, it will force people into more occasions for typing errors: instead of "SELECT *" they will need to write a long list of exact field matches that is in fact more likely to contain errors.
it may be better to opt into disallowing unknown struct fields globally, and make it a practice, if you want to just ignore a few fields, to use a tag such as sql:"-"
I'd support setting this at the DB / Tx / Stmt level
That said, I'm not convinced of this particular need at all. Especially if the default is to disallow unknown fields, it will force people into more occasions for typing errors: instead of "SELECT *" they will need to write a long list of exact field matches that is in fact more likely to contain errors.
I don't think so. Think of the list of columns from the query and the struct as two distinct tables. When joining them together, there may be unused columns from the query, and unused fields in the struct. We don't actually care about unused columns from the query: if you query select ID, Name, Birthdate as BAB and try to scan into a struct with the fields ID and DOB (see the typo in the query), then DOB will still throw an error. In fact, this should be the default. If you scan into a struct with extra fields, I would go so far as to say any struct field that you want to ignore should have sql:"-" set on the field tag, and there should be no query-level option. It is obvious on inspection that the "Name" column isn't used; we don't need to report that. Thus, you can still run select * from Table; if you want to: extra query columns are fine. Only extra struct fields are a problem.
Requiring struct fields not in the returned column set to be annotated assumes that a struct is only fetched from a database in one way, and that's not how we personally have been doing it with sqlx. Besides scanning the RETURNING columns from an INSERTed struct into the same struct, we also sometimes want to refresh/fetch a limited set of fields on a struct from the database and leave other struct fields untouched. I definitely lean toward the simplicity of ignoring struct fields not included in the results, and that's consistent with json unmarshaling.
if you want to: extra query columns are fine. Only extra struct fields are a problem.
I feel exactly the opposite, and anything we can do to disincentivize SELECT * is good. :) I don't mean that in an argumentative way, but as extra confirmation that we need to be able to configure behavior, as both behaviors are reasonable tradeoffs.
Disincentivizing "SELECT *", if that is important to you, can be done in your linter. No need to add API complexity for that.
Sorry that my tongue-in-cheek remark didn't come across. This isn't about query patterns or whether you use SELECT *; we're trying to help people get the data they think they're getting. The two scenarios are:
There's no obvious "correct" behavior in the absence of an exact match of fields to destinations. Some people will want an error for one or both cases, and some people will want to silently succeed, like the default json configuration. Today, both error in stdlib usage because you have to manually map fields to destinations, and only exact matches are allowed.
The real question is what the right API design is to enable those choices to be made by users, and what the default should be. Probably the most expected default is to choose whatever happens in sqlx, if we expect users to switch, but that's up for debate.
I like the sound of defaults being consistent with sqlx, and not just because I'm lazy and want an easy migration path...
missing destination name foo in *struct - I like this because there's no good argument for fetching more data from the database than you intend to use.

I would say that what is needed is a relatively simple interface for iterating over column values for each row. In fact, Scan does exactly that: it iterates over column values. A similar function can be easily devised taking a callback-styled consumer as an argument, which will receive each column value in order.
If direct iteration over raw column values was supported, advanced use cases would be trivial to implement in third-party libraries.
The pgx library has an additional method on the Row type, namely Values (https://github.com/jackc/pgx/blob/b301530a5fd50fe8ee46149fec58162a52bed8c6/rows.go#L286). This is somewhat better than Scan for ORM-style wrappers, but again, directly exposing nextColumn would make even more sense and lead to better performance.
In fact, Scan does exactly that: it iterates over column values.
I think if users have advanced needs, they should just use Scan.
Scan is very inconvenient both performance-wise and coding-wise; that's why we have this proposal and a whole bunch of ORM libraries doing a lot of heavy lifting around reflect.
On modern cloud infrastructure, databases and networks are very fast. But on the Go side, the driver and wrappers take their sweet time doing a dozen allocations to simply forward a "select" result set to the user. :-)
For a "server" language, Go's SQL facilities are surprisingly limited.
I would say that what is needed is a relatively simple interface for iterating over column values for each row. In fact, Scan does exactly that: it iterates over column values. A similar function can be easily devised taking a callback-styled consumer as an argument, which will receive each column value in order. If direct iteration over raw column values was supported, advanced use cases would be trivial to implement in third-party libraries.
The pgx library has an additional method on the Row type, namely Values (https://github.com/jackc/pgx/blob/b301530a5fd50fe8ee46149fec58162a52bed8c6/rows.go#L286). This is somewhat better than Scan for ORM-style wrappers, but again, directly exposing nextColumn would make even more sense and lead to better performance.
This I fully agree with. In order to get values out of Scan today, I have to do this: https://github.com/golang-sql/table/blob/master/tablebuffer.go#L170
I do agree that the sql package still isn't great. This would be a separate proposal; it is only related in that it could make it more efficient to get data out of the scan. I used Scan at first, but it quickly became... unacceptable on a number of fronts. I don't like the Scan-style API. If such a Rows.Values proposal was made, my only question would be what it would return:
Either way, a []any slice can be obtained from Scan (see link), so this would be an optimization.
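The workaround linked above can be sketched in a few lines: to get a row as []any with today's variadic Scan, you build a values slice and pass pointers into it. Here a fake scan function stands in for (*sql.Rows).Scan, so the sketch is self-contained:

```go
package main

import "fmt"

// rowValues shows the standard trick for reading every column of a row
// as []any with a Scan-style API: make a slice of values and a parallel
// slice of pointers into it, then pass the pointers to Scan.
// scanFn stands in for (*sql.Rows).Scan.
func rowValues(n int, scanFn func(dest ...any) error) ([]any, error) {
	values := make([]any, n)
	ptrs := make([]any, n)
	for i := range values {
		ptrs[i] = &values[i]
	}
	if err := scanFn(ptrs...); err != nil {
		return nil, err
	}
	return values, nil
}

func main() {
	// Fake Scan that fills destinations from a fixed row.
	row := []any{int64(1), "gopher"}
	scan := func(dest ...any) error {
		for i, d := range dest {
			*(d.(*any)) = row[i]
		}
		return nil
	}
	vals, _ := rowValues(2, scan)
	fmt.Println(vals) // [1 gopher]
}
```

This works today, but it costs the extra pointer slice and per-column interface allocations that the comments above complain about.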
Edited: struct field names are matched to columns case-sensitively. Edited: fields not matched by a column are left untouched rather than zeroed.
I propose adding the method

ScanRow(dest any) error

to the Row and Rows types. ScanRow attempts to populate dest with all the columns of the current row. dest can be a pointer to an array, slice, map or struct.

Motivation
ScanRow makes it more convenient to populate a struct from a row. Evidence that this is a desirable feature comes from the github.com/jmoiron/sqlx package, which has 9,800 importers, about 1,000 forks and about 14,000 GitHub stars. (To be fair, sqlx provides several other enhancements to database/sql as well.) The ScanRow method brings database/sql closer to parity with encoding/json, whose ability to unmarshal into structs and other data types is widely used.

Another motivation comes from the likely addition of iterators to the language. Without something like ScanRow, an iterator over a DB query would have to return a Row or Rows, since the Scan method is variadic. An iterator like that still improves on using a Rows directly because it makes error-handling more explicit and always calls Close. But we could do better with ScanRow, because the iterator could deliver a single value holding the entire row.

This proposal doesn't include that iterator; I show it merely for motivation.
Details
ScanRow acts as if each part of its argument were passed to Scan.

If the value is an array pointer, successive array elements are scanned from the corresponding columns. Excess columns are dropped. Excess array elements are left untouched.
If the value is a slice pointer, the slice is resized to the number of columns and slice elements are scanned from the corresponding columns.
If the value is a map pointer, the underlying key type must be string. For each column, a map entry is created whose key is the column name and whose value is scanned from the column. Unnamed columns are assigned unique keys of the form _%d for integers 0, 1, .... Other entries in the map are left untouched.

If the value is a struct pointer, its exported visible fields are matched by name to the column names and the field values are scanned from the corresponding columns. The name matching is done as follows:
Unassigned columns are dropped. Unassigned fields are left untouched.
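One reading of the map-pointer rules above, as a runnable sketch. scanIntoMap is an illustrative stand-in for the proposed behavior, and it assumes the _%d counter advances only for unnamed columns:

```go
package main

import "fmt"

// scanIntoMap sketches the proposed map-pointer behavior: each column
// becomes a map entry keyed by its name, unnamed columns get synthetic
// keys _0, _1, ..., and entries not assigned by a column are left untouched.
func scanIntoMap(dst map[string]any, names []string, values []any) {
	unnamed := 0
	for i, name := range names {
		if name == "" {
			name = fmt.Sprintf("_%d", unnamed)
			unnamed++
		}
		dst[name] = values[i]
	}
}

func main() {
	m := map[string]any{"keep": true} // pre-existing entry, untouched by the scan
	scanIntoMap(m, []string{"id", ""}, []any{int64(1), "anon"})
	fmt.Println(m["id"], m["_0"], m["keep"]) // 1 anon true
}
```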