Closed kirubasankars closed 7 years ago
I'm available to work on this. I would like to start contributing to Go.
More or less, what should the API look like for this?
Much of the work is figuring out the API. Got a proposal?
What if we had a method on DB
that was just responsible for returning metadata?
// Something more or less like this:
meta, err := db.QueryMeta("table_name", "column_name")
@juicemia That only works for table columns; the column in a query may be computed by the database engine, in which case there is no set of arguments to pass to your QueryMeta method that will return the correct type.
How about a new method on Rows type:
func (rs *Rows) Types() ([]???, error)
I am not sure what type it should return though. It is certainly database specific. E.g. there are ~150 different types in PostgreSQL: https://github.com/lib/pq/blob/master/oid/types.go
JDBC API defines ~40 standard types: https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html
@juicemia, there is #7408 for table-level metadata
This introduces the concept of type information, useful for #12381 as well for input parameter types. The following type information would work: https://godoc.org/github.com/kardianos/rdb#Type
@kostya-sh @kardianos defines a custom Type type in rdb. I'm interested in reading what other people have to say about it, but I guess that Types could just return []Type:
func (rs *Rows) Types() ([]Type, error)
I am not sure what type it should return though. It is certainly database specific. E.g. there are ~150 different types in PostgreSQL: https://github.com/lib/pq/blob/master/oid/types.go JDBC API defines ~40 standard types: https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html
Type lets driver implementors specify custom types.
That's actually not complete enough in my opinion. Also see https://godoc.org/github.com/kardianos/rdb#Schema and the Column def. There are other parameters that can be important to know. Is it nullable? Is it a primary key? Field length, decimal precision.
I do think it is important to allow drivers to define custom types. In my implementation there is a generic type as well as a specific type. For example: varchar, char, nvarchar, and text are all generically "Text".
Maybe it should just be an interface that it's up to drivers to define?
SQLite shrinks the lowest-common-denominator set of types quite a bit. Postgres lets you define your own types. SQLite doesn't really care if you use the right type. Postgres does. Some of their types share names but that's about it: they behave quite differently.
I'm not sure what methods it should have (other than String() string) though. Stuff to work with other types and values, presumably.
@kardianos good point. Then maybe it could just be
// ColumnMeta is analogous to rdb.Column
func (rs *Rows) ColumnMeta() ([]ColumnMeta, error)
@jimmyfrasche Please don't use SQLite as an example. If you are using SQLite, today's API is perfect. However every other RDBMS out there has a strong notion of type internally and in interfaces.
As for variety, I've done the survey. What you have in the rdb Types is like 99% of types used. There really isn't that much variety. Go type string is either varchar, char, nvarchar, varchar2, or text (varchar2 is an Oracle-specific type and can be ignored for our use; it can be provided by the driver). That's it. Numeric types are generally all tinyint, smallint, int, bigint. There are custom types: SQL Server allows .NET types to be defined, including table types; Postgres allows arrays and other types. But... I think we can let drivers define those types.
So, yes, SQLite doesn't care about types, so the current API is perfect for it. However, other databases care (at least for performance) whether your param is a varchar(10) or an nvarchar(-1). If it isn't meaningful for the database system in question, ignore it. But it is mostly just SQLite where you would.
@kardianos Sqlite may not be the best example (don't get me wrong, I love it) but it's still an example of a database that needs to be supported.
MySQL is another example of a database that's wishy-washy about types (this one I do not love).
However, my point was that the definition of type in SQL across databases isn't something that can be set in stone. Even standardized types can be different between databases, sometimes in subtle ways.
If you define a lot of types, even if most dbs support all of them, the drivers for the ones that don't are going to need to deal with handling the ones they don't support. If they have types not in that set, the driver and the user has to deal with it.
I don't see a lot of value in filling database/sql up with a lot of constants that may or may not apply and don't really mean a lot on their own without knowing what driver you're using.
If there's a Type interface that has nice semantics it can cover as many or as few types as a particular database provides and reflect the native semantics more accurately—even for databases that don't exist yet. This does push the implementation burden onto the provider of the driver, but it also affords them the ability to define their types to fit their underlying database better.
Ultimately, what matters is how users of the database/sql package are going to be using this information. Admittedly, I've only ever used it for debugging, so strings would suit me just fine.
Constants are cheaper but do not cover all the cases, so you still need a way to compare arbitrary types anyway since TypeCustom equaling TypeCustom isn't really the whole story. So there should probably be an Equal(t Type) method on Type and an AssignableTo(t Type). This would have to be provided by the driver even for common types.
Most users will only be dealing with one database at a time so it doesn't really matter if sqlite3.TypeText != pg.TypeText: you're probably going to have to handle that specially anyway if you're writing code that cares about such things.
I'm not convinced an interface is better than just strings: strings are much simpler and easier for a driver to implement. I'm pretty sure constants in database/sql aren't the way to go here, though. It makes the world seem more uniform than it is.
@jimmyfrasche Can you please point me to documentation explaining how MySQL is wishy-washy about types?
Are you disagreeing that there are only so many distinct types, or that there are many more than those listed at https://github.com/kardianos/rdb/blob/master/types.go#L53 ?
MySql does a lot of implicit conversions https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html though knowing the type upfront could allow you to work around that, if you knew that might get you in trouble and you knew you were dealing with MySql.
Are you disagreeing that there are only so many distinct types, or that there are many more than those listed
Both. There are databases with fewer types. There are databases with more types. (Many with infinite types in the sense that you can define arbitrary composite types like arrays and records).
I'm also trying to point out that just because two types have the same name in two different databases it doesn't mean they behave the same or that all the same rules apply.
None of that is to say that the type information isn't useful: just that it's very dependent on what database you're using.
I'm thinking something like
//in driver package
type Type interface {
fmt.Stringer
ValueConverter
Equal(Type) bool
AssignableTo(Type) bool
//and so on
}
is going to be more useful since it can handle a lot of those database-specific nuances for you. An sqlite driver could just always return true for AssignableTo and a postgres driver could return false if Equal returns false (after checking a handful of special cases surrounding text types, iirc).
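To make that concrete, here is a minimal self-contained sketch of such an interface with two toy implementations; all names here are hypothetical, not part of database/sql. looseType models the SQLite-style semantics (everything is assignable) and strictType models Postgres-style semantics (assignability requires equality).

```go
package main

import "fmt"

// Type is a sketch of the interface floated above; it is not part of
// database/sql or database/sql/driver.
type Type interface {
	fmt.Stringer
	Equal(Type) bool
	AssignableTo(Type) bool
}

// looseType models an SQLite-style driver: any value can be stored in
// any column, so AssignableTo always reports true.
type looseType string

func (t looseType) String() string         { return string(t) }
func (t looseType) Equal(o Type) bool      { return t.String() == o.String() }
func (t looseType) AssignableTo(Type) bool { return true }

// strictType models a Postgres-style driver: here assignability simply
// requires equality (a real driver would special-case text types etc.).
type strictType string

func (t strictType) String() string           { return string(t) }
func (t strictType) Equal(o Type) bool        { return t.String() == o.String() }
func (t strictType) AssignableTo(o Type) bool { return t.Equal(o) }

func main() {
	fmt.Println(looseType("TEXT").AssignableTo(looseType("INTEGER")))
	fmt.Println(strictType("text").AssignableTo(strictType("varchar")))
}
```

The point is that each driver keeps its own semantics behind the same method set, without database/sql needing a constant for every type.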
MySql does a lot of implicit conversions https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html though knowing the type upfront could allow you to work around that, if you knew that might get you in trouble and you knew you were dealing with MySql.
This is how SQL works, not just in MySQL. In MS SQL Server, if you send up an nvarchar param to a varchar column in a where clause, the performance goes down significantly due to increased conversion overhead. Silly? Yes. But I've got to work with that without a bunch of hacks.
I'm also trying to point out that just because two types have the same name in two different databases it doesn't mean they behave the same or that all the same rules apply.
To a degree. A type Text in MS SQL Server has a distinct meaning from varchar(max), whereas in PostgreSQL type Text is just an unbounded varchar without significant differences. But an interface like you are suggesting can't capture that well, as I'm not just talking about assignability to another database type. I also want to specify the input parameters #12381 where I need a database type.
There are databases with fewer types. There are databases with more types.
Fewer types: some types simply aren't used. More types: those are defined in the driver itself. Despite the many dialects of SQL, data types do have a common base, and the differences a driver may define and reference. I don't see the problem.
I find value in knowing what the database thinks a type is for both result columns and for input parameters. You are saying for result columns, provide a string representation and an interface to provide many functions you might want to do with it. I think that falls down in the input parameter case.
There may be value in letting the driver expose a function like:
type TypeCompare interface {
Assignable(t1, t2 Type) bool
Generic(t Type) Type
...
}
That could have value. But I think a standard type list, along with a driver type list, is a first step. Do a survey among common and uncommon RDBMS types. They all share the vast majority of types; there is some variation, as in the MS text vs varchar case, but in those cases knowing the exact differences as a programmer is required (if a stored proc takes text, don't give it varchar, for example).
I see what you mean re #12381 though the driver could export the types for this use. That doesn't help if you're trying to remain database agnostic, though. But presumably you're passing this as a hint to the driver, so you're already not database agnostic. Maybe I don't understand the use case.
If a specific driver is defining its own types—which it would have to for databases that allow user defined types at the very least—you still need an interface. Then the Type constants in database/sql would have to implement that interface, so they could be used interchangeably with the driver's types. But then they couldn't have anything db specific, like equivalence, so none of the types in database/sql could ever contain information specific to a database, so the interface couldn't have things like AssignableTo. If you needed that information you'd need to encode that in the driver or a third location, so what's the point of having the type constants in database/sql? They'd just be a list of names that could mean different things. I do not see the point.
It would be nice for database/sql to have more features, but I'm wary of encoding too much information into it.
What is the use case for having these specific constants in database/sql? Like I said, I've only ever used it for debugging. I imagine it would be very handy for statement mappers and ORMs but I've never written such things.
As far as I can see there are 3 places where types could be useful:
I think it would help to discuss and understand requirements for each of these cases separately.
Before jumping to the API discussions I suggest we should understand the use case for the requested feature better.
@kirubasankars, can you please explain what you meant by "some level dynamic column value parsing"?
Has anyone needed this functionality before? What was the use case?
For the reference this is what JDBC provides: http://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html
I can imagine only a few database drivers support getting all this information though. For example, this is what you can get out of PostgreSQL: https://www.postgresql.org/docs/9.5/static/protocol-message-formats.html (search for RowDescription (B)).
@jimmyfrasche Yes, this would not be database agnostic. That would need a different system layer to filter it through. It would just expose what the RDBMS SQL type is.
Use case for results where switching on go type doesn't work:
var sql = `
select
d.ID, d.Name, d.Mime,
fromp.FullName as FromFullName, forp.FullName as ForFullName
from
seq.Document d
join acc.Person fromPerson fromp on fromp.ID = d.FromPerson
left join acc.Person forPerson forp on forp.ID = d.ForPerson
where
1=1 and ...;
`
If you need to encode information regarding ForFullName, you can't rely on it being there in the first row; in fact it may never be represented. This isn't a problem if you ad-hoc everything, but if you push more to an app-specific framework, it can be more problematic.
However, per Brad in comment https://github.com/golang/go/issues/12381#issuecomment-239589860 I think this is moot. So while Brad also said https://github.com/golang/go/issues/16652#issuecomment-239328934 , it would need to be without specific type information being represented in the sql package.
@kostya-sh, I'm trying to create entire row sets as map[string]interface{}.
While executing that query, my code has no idea how many columns there are, or their names and types.
Basic types are automatically mapped to Go types; that's good. Custom types are not (for example, PostgreSQL custom types).
Drivers can give me those custom values only as RawBytes.
I was expecting this package to expose type and size information.
In my opinion, maybe Java handles this better; see the linked Java sample.
Also have a look at the JDBC SQL types.
Something just occurred to me.
At the risk of sounding dumb: how would I go about adding a method on an interface type?
// This is illegal because Rows is an interface type
func (rs *Rows) Schema() Schema
Adding it to the Rows interface breaks the API and thus the API tool fails. I added the old interface to except.txt but I'm not sure if that's such a good idea? Some help from more experienced contributors would be awesome.
EDIT: I'm only doing this to start playing around and getting familiar with the source. I don't feel like we've reached a consensus as to what should be done.
@juicemia *sql.Rows is a struct. As such it can be added to safely.
@kardianos wow how embarrassing. 😅 i was looking in driver and totally forgot about the rest of the package. 😶
So what's the direction we should take on the types then? I agree with @jimmyfrasche that we should just expose a Type interface, and leave the implementation of the types to the database drivers.
Has discussion for this moved somewhere else, or has it just tapered off?
@juicemia It has not moved. However, it is now my explicit understanding that @bradfitz does not want to see SQL Database types exposed in this package. So even though it isn't closed, I put it into the "highly doubtful" range.
See https://github.com/golang/go/issues/12381#issuecomment-239589860
Also see: https://github.com/golang/go/issues/16673#issuecomment-239619667 for the scope of changes I hope to tackle for go1.8.
@kardianos, no, I'm fine with making type information available. What I was opposed to in that comment is making users redundantly specify type information when inserting data, since an interface value already conveys type information.
I'd like to see an API proposal for this issue.
@bradfitz Oh, then I misunderstood. To be clear before, the type information would always be optional and if not present base the DB type from the Go type.
@bradfitz Proposal for result type information.
I've used the result types from queries in a number of places (C# ADO.Net, and Go rdb). Whereas for query parameters you sometimes want to specify, in unusual cases, the specific actual database type (varchar vs nvarchar vs text), for results I have typically used the generic type (text, date, date+time of day, decimal, floating point, integer, array, structured (JSON, XML)).
I can think of three ways to specify this generic type:
a type const defined in the driver; a type const in database/sql; or, in Schema.Column, define a GenericType.

While I view the generic type as most important for results, I would also like to allow fetching the database-specific type. I could see this as either a type const defined in the driver or in database/sql.

Are there any of these options that you would not want to see in the database/sql package? Or do you have other opinions? I can send in a more specific proposal after I get that bit of feedback. Thanks.
Uh, I'll have a working proposal by Monday. It's working now but needs some cleanup. I'm the submitter of #7408 who just saw this activity after implementing something. There are some interesting nuances that I did not realize until doing this.
Sorry, not trying to be mysterious or anything. Just that I don't have time to explain it properly and clean it up properly today. :-)
And sorry for the 'sorry': Canadian.
@gnewton it's alright. I've been working on something as well based on some of the ideas in @kardianos rdb package. I just need to clean it up a bit.
OK. I think I'm taking a simpler path, so it will be useful to compare and see what the community thinks. :-)
@gnewton agreed. I've pushed my initial work to my fork. It can be found here.
@juicemia The diff can be found here: https://github.com/juicemia/go/commit/53d1cf564464f6d7f543126007f28a6804847023 It may be easier in the future to view these in a standard CL in gerrit. Just ensure you add DO NOT REVIEW on in it.
Oh okay. Yeah sorry I should have linked the commit.
After a chat with @kardianos, we've converged on a design like:
type ColumnInfo struct {
// unexported fields
// driver handle some
}
func (ci ColumnInfo) Name() string { ... }
func (ci ColumnInfo) ScanType() reflect.Type { ... } // something that's suitable for Scan
func (ci ColumnInfo) Nullable() (nullable, ok bool) { ... }
func (ci ColumnInfo) DriverTypeName() string { ... } // "CHAR" without (length)
func (rs *Rows) ColumnInfo() ([]ColumnInfo, error) {
}
Are we missing column size from the driver?
Seconding @kirubasankars on column size.
Also, DriverTypeName... does this mean driver-specific strings? It would be highly convenient if there were a standardized mapping of the most common types to constants a la JDBC etc. Otherwise I'll end up doing something like stdType := allDrivers["postgres"][driverTypeName] (which is basically what I do today). Not the end of the world, but could be confusing for newcomers.
@kardianos @bradfitz @kirubasankars
I hope the ship hasn't sailed on the column size decision. I've been working with this very issue literally two minutes ago on my own project, decided I better drop in on this ticket again.
I'll give you a specific use case where knowing the column size is important (and this is only just my particular case from tonight; this is a common occurrence with ETL software). I'm working on an app that moves data between heterogeneous datasources (where the schemata are not known ahead of time, etc.). Sometimes this means that there's a scratch DB in the middle where data munging/staging goes on. Let's say I've got oracle1.tblUser and postgres1.tblAddress, and there's a bunch of text fields on each, let's say email VARCHAR(64) and city VARCHAR(255). Right now, how I make this work is that I have a thin layer wrapping each and every SQL driver so that I can get access to the field type/size etc. and present it in a uniform manner to the worker (with go-mysql I made my own fork because for some inexplicable reason the driver doesn't export the field info).

Now, under the proposal above, yes, I would know that a column is a text-ish field, but without knowing the size the best one can do is create a LONGTEXT col or such, even though the drivers have access to the fact that those text cols are of max size 64 and 255. If you're dealing with throughput of billions of rows, the performance difference between a VARCHAR(64) / VARCHAR(255) and a LONGTEXT does become significant (and in some databases, egregiously so).

Again, the col size is data that is ordinarily available to the driver. If for some reason the driver can't present that information, we could use -1 as a sentinel to indicate it's not available. Hell, you can make that the default in the sql layer. But please, at least make it possible for drivers to provide this information back up through the API layer to the caller. There's no downside to it.
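To make the ETL argument concrete, here is a hedged sketch of the kind of DDL mapping being described; columnDDL and the -1/ok sentinel convention are illustrative, not part of any proposal text.

```go
package main

import "fmt"

// columnDDL sketches the ETL use case above: given a column name, the
// database type name, and the length reported by the driver (ok=false
// when the driver can't provide it), pick a target column definition.
func columnDDL(name, dbType string, length int64, ok bool) string {
	if dbType == "VARCHAR" && ok && length > 0 {
		// Length is known: preserve the bounded column type.
		return fmt.Sprintf("%s VARCHAR(%d)", name, length)
	}
	if dbType == "VARCHAR" {
		// No length available: fall back to an unbounded text column,
		// paying the performance cost the comment above describes.
		return fmt.Sprintf("%s LONGTEXT", name)
	}
	return fmt.Sprintf("%s %s", name, dbType)
}

func main() {
	fmt.Println(columnDDL("email", "VARCHAR", 64, true))
	fmt.Println(columnDDL("notes", "VARCHAR", -1, false))
}
```

With length exposed, the first call can emit VARCHAR(64); without it, every text column degrades to the LONGTEXT fallback.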
One more thing: should we not also include a method as follows?
func (ci ColumnInfo) Alias() string { ... }
This can default to ci.Name().
OK, maybe just one more thing... let's not forget about:
func (ci ColumnInfo) IsPrimaryKey() bool { ... }
Although, that's not ideal either? I'd be more inclined to go this path:
func (ci ColumnInfo) Attrs() Attr
// or Attributes() to be verbose
// if ci.Attrs() & AttrPrimaryKey ... or AttrNullable, AttrIndex etc.
This would be more extensible/future-proof; implementations could choose how much (if any) detail to send back in the attributes, and it would allow us to drop the ci.Nullable() method, which is really just a special case of column attributes.
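A minimal sketch of that bitmask idea; Attr and the constant names are hypothetical, not part of database/sql.

```go
package main

import "fmt"

// Attr is a sketch of the attribute bitmask suggested above.
type Attr uint64

const (
	AttrNullable Attr = 1 << iota
	AttrPrimaryKey
	AttrIndex
)

// Has reports whether all bits in flag are set on a.
func (a Attr) Has(flag Attr) bool { return a&flag == flag }

func main() {
	col := AttrPrimaryKey | AttrIndex
	fmt.Println(col.Has(AttrPrimaryKey)) // true
	fmt.Println(col.Has(AttrNullable))   // false
}
```

Bits a driver doesn't know about are simply left unset, and new attributes can be appended later without breaking existing callers, which is what makes this layout more future-proof than one method per property.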
@neilotoole I agree we need length, and possibly Scale and Precision. I'm not familiar with a protocol that exposes Alias as a separate property from Name. It is important to know that the design was chosen to allow adding additional methods if the need arises.
CL https://golang.org/cl/29961 mentions this issue.
@kardianos I think you're probably correct about alias. But yes, length/size and scale/precision would be useful.
@neilotoole Size and length are included in the CL. Please review if you want to.
@kardianos thanks, I just reviewed patch set 5.
@kardianos @bradfitz
I had some time this week to look over the proposal. I'm generally OK with the proposed sql pkg API, but I think the driver pkg changes are going off in the wrong direction. I'm referencing driver.go PS7 (Gerrit).
TL;DR: Replace the five new driver pkg interfaces with one ColumnTyper interface.
It seems to me that having these five separate interfaces is overly elaborate. This is what is currently proposed (comments omitted):
package driver
type RowsColumnTypeScanType interface {
Rows
ColumnTypeScanType(index int) reflect.Type
}
type RowsColumnTypeDatabaseTypeName interface {
Rows
ColumnTypeDatabaseTypeName(index int) string
}
type RowsColumnTypeLength interface {
Rows
ColumnTypeLength(index int) (length int64, ok bool)
}
type RowsColumnTypeNullable interface {
Rows
ColumnTypeNullable(index int) (nullable, ok bool)
}
type RowsColumnTypePrecisionScale interface {
Rows
ColumnTypePrecisionScale(index int) (precision, scale int64, ok bool)
}
And this from the sql package:
package sql
func (rs *Rows) ColumnTypes() ([]ColumnType, error) {
if rs.isClosed() {
return nil, errors.New("sql: Rows are closed")
}
if rs.rowsi == nil {
return nil, errors.New("sql: no Rows available")
}
return rowsColumnInfoSetup(rs.rowsi), nil
}
func rowsColumnInfoSetup(rowsi driver.Rows) []ColumnType {
names := rowsi.Columns()
list := make([]ColumnType, len(names))
for i := range list {
ci := &list[i]
ci.name = names[i]
if prop, ok := rowsi.(driver.RowsColumnTypeScanType); ok {
ci.scanType = prop.ColumnTypeScanType(i)
} else {
ci.scanType = reflect.TypeOf(new(interface{})).Elem()
}
if prop, ok := rowsi.(driver.RowsColumnTypeDatabaseTypeName); ok {
ci.databaseType = prop.ColumnTypeDatabaseTypeName(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypeLength); ok {
ci.length, ci.hasLength = prop.ColumnTypeLength(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypeNullable); ok {
ci.nullable, ci.hasNullable = prop.ColumnTypeNullable(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypePrecisionScale); ok {
ci.precision, ci.scale, ci.hasPrecisionScale = prop.ColumnTypePrecisionScale(i)
}
}
return list
}
I suggest replacing those five driver interfaces with one interface:
package driver
type ColumnTyper interface {
Rows
ColumnTypes() ([]ColumnType, error)
}
The issues I see with this current proposal include:

- Five separate interfaces don't follow the existing pattern of the sql and driver packages (as exemplified by the Rows.Columns() methods).
- sql.Rows.ColumnTypes() results in 5 x [num cols] calls to the driver impl. That's potentially a great deal of calls (at Teradata I saw customer queries with hundreds of cols). Even if in practice this is not a huge perf burden, something about it seems wrong (an "API smell" if you will).

If we were to go with the ColumnTyper approach: for safety, I (manually) verified that there is no exported ColumnTypes field or method on the struct implementing driver.Rows in any of the known driver packages, so this should be a non-breaking change.

The list of drivers is from the golang wiki. Each link is to the source for the struct that implements driver.Rows:
For my own private project, I took almost this exact approach when hacking the sql API. My changes looked something like this (omitting the ColumnType type in both packages):
package driver
type ColumnTyper interface {
ColumnTypes() ([]ColumnType, error)
}
EDIT FOR CLARITY: this copy-n-edit snippet below (which exposes the driver package) is a poor example: driver.ColumnType would need to be wrapped in sql.ColumnType.
package sql
func (rs *Rows) ColumnTypes() ([]driver.ColumnType, error) {
if rs.closed {
return nil, errors.New("sql: Rows are closed")
}
if rs.rowsi == nil {
return nil, errors.New("sql: no Rows available")
}
if colTyper, ok := rs.rowsi.(driver.ColumnTyper); ok {
return colTyper.ColumnTypes()
}
return nil, errors.New("sql: driver does not provide column type data")
}
In my own project I have already implemented the ColumnTyper API changes for Postgres (lib/pq), MySQL (go-sql-driver), and SQLite (mattn), so I know from first-hand implementation experience that this approach works just fine. And it seems less complicated than the proposed approach.
Please expose the original database column type, at least as a string. This is currently very restricting in this package. It would help us do some level of dynamic column value parsing.