golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.99k stars 17.54k forks source link

database/sql: expose database original column type #16652

Closed kirubasankars closed 7 years ago

kirubasankars commented 8 years ago

Please expose original database column type at least as string. this is currently so restricting on this package. this will help us to do some level dynamic column value parsing

juicemia commented 8 years ago

I'm available to work on this. I would like to start contributing to Go.

More or less, what should the API look like for this?

bradfitz commented 8 years ago

Much of the work is figuring out the API. Got a proposal?

juicemia commented 8 years ago

What if we had a method on DB that was just responsible for returning metadata?

// Something more or less like this:
meta, err := db.QueryMeta("table_name", "column_name")
quentinmit commented 8 years ago

@juicemia That only works for table columns; the column in a query may be computed by the database engine, in which case there is no set of arguments to pass to your QueryMeta method that will return the correct type.

kostya-sh commented 8 years ago

How about a new method on Rows type:

func (rs *Rows) Types() ([]???, error)

I am not sure what type it should return though. It is certainly database specifict. E.g. there are ~150 different types in postgresql: https://github.com/lib/pq/blob/master/oid/types.go

JDBC API defines ~40 standard types: https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html

kostya-sh commented 8 years ago

@juicemia, there is #7408 for table-level metadata

kardianos commented 8 years ago

This introduces the concept of type information, useful for #12381 as well for input parameter types. The following type information would work: https://godoc.org/github.com/kardianos/rdb#Type

juicemia commented 8 years ago

@kostya-sh @kardianos defines a custom Type type in rdb. I'm interested in reading what other people have to say about it, but I guess that Types could just return []Type.

func (rs *Rows) Types() ([]Type, error)

I am not sure what type it should return though. It is certainly database specifict. E.g. there are ~150 different types in postgresql: https://github.com/lib/pq/blob/master/oid/types.go JDBC API defines ~40 standard types: https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html

Type lets driver implementors specify custom types.

kardianos commented 8 years ago

That's actually not complete enough in my opinion. Also see https://godoc.org/github.com/kardianos/rdb#Schema and the Column def. There are other parameters that can be important to know. Is it nullable? Is it a primary key? Field length, decimal precision.

I do think it is important to allow drivers to define custom types. In my implementation there is a generic type as well as a specific type. For example: varchar, char, nvarchar, and text are all genetically "Text".

jimmyfrasche commented 8 years ago

Maybe it should just be an interface that it's up to drivers to define?

SQLite shrinks the lowest common denominator set of types quite a bit. Postgres let's you define your own types. SQLite doesn't really care if you use the right type. Postgres does. Some of their types share some names but that's about it: they behave quite differently.

I'm not sure what methods it should have (other than String() string) though. Stuff to work with other types and values, presumably.

juicemia commented 8 years ago

@kardianos good point. Then maybe it could just be

// ColumnMeta is analagous to rdb.Column
func (rs *Rows) ColumnMeta() ([]ColumnMeta, error)
kardianos commented 8 years ago

@jimmyfrasche Please don't use SQLite as an example. If you are using SQLite, today's API is perfect. However every other RDBMS out there has a strong notion of type internally and in interfaces.

As for variety, I've done the survey. What you have in the rdb Types is like 99% of types used. There really isn't that much variety. go type string is either varchar, char, nvarchar, varchar2, or text (and varchar2 is an oracle specific type and can be ignored for our use, can be provided by driver). That's it. Numeric types are generally all tinyint, smallint int, bigint. There are custom types: SQL server allow .Net types to be defined, including table types, postgres allows arrays and other types. But... I think we can let drivers define those types.

So, yes, sqlite doesn't care about types, so the current API is perfect for it. However other databases care (for at least performance) if your param is a varchar(10) or a nvarchar(-1). If it isn't meaningful for the database system in question, ignore it. But that is mostly just sqlite where you would.

jimmyfrasche commented 8 years ago

@kardianos Sqlite may not be the best example (don't get me wrong, I love it) but it's still an example of a database that needs to be supported.

MySql is another example of a database that's wishy-washy about types (this one I do not love).

However, my point was that the definition of type in SQL across databases isn't something that can be set in stone. Even standardized types can be different between databases, sometimes in subtle ways.

If you define a lot of types, even if most dbs support all of them, the drivers for the ones that don't are going to need to deal with handling the ones they don't support. If they have types not in that set, the driver and the user has to deal with it.

I don't see a lot of value in filling database/sql up with a lot of constants that may or may not apply and don't really mean a lot on their own without knowing what driver you're using.

If there's a Type interface that has nice semantics it can cover as many or as few types as a particular database provides and reflect the native semantics more accurately—even for databases that don't exist yet. This does push the implementation burden onto the provider of the driver, but it also affords them the ability to define their types to fit their underlying database better.

Ultimately, what matters is how users of the database/sql package are going to be using this information. Admittedly, I've only ever used it for debugging, so strings would suit me just fine.

Constants are cheaper but do not cover all the cases, so you still need a way to compare arbitrary types anyway since TypeCustom equaling TypeCustom isn't really the whole story. So there should probably be an Equal(t Type) method on Type and an AssignableTo(t Type). This would have to be provided by the driver even for common types.

Most users will only be dealing with one database at a time so it doesn't really matter if sqlite3.TypeText != pg.TypeText: you're probably going to have to handle that specially anyway if you're writing code that cares about such things.

I'm not convinced an interface is better than just strings: strings are much simpler and easier for a driver to implement. I'm pretty sure constants in database/sql aren't the way to go here, though. It makes the world seem more uniform than it is.

kardianos commented 8 years ago

@jimmyfrasche Can you please point me to documentation explaining how My Sql is wishy washy about types?

Are you disagreeing that there are only so many distinct types or that there are many others then listed https://github.com/kardianos/rdb/blob/master/types.go#L53 ?

jimmyfrasche commented 8 years ago

MySql does a lot of implicit conversions https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html though knowing the type upfront could allow you to work around that, if you knew that might get you in trouble and you knew you were dealing with MySql.

Are you disagreeing that there are only so many distinct types or that there are many others then listed

Both. There are databases with fewer types. There are databases with more types. (Many with infinite types in the sense that you can define arbitrary composite types like arrays and records).

I'm also trying to point out that just because two types have the same name in two different databases it doesn't mean they behave the same or that all the same rules apply.

None of that is to say that the type information isn't useful: just that it's very dependent on what database you're using.

I'm thinking something like

//in driver package
type Type interface {
    fmt.Stringer
    ValueConverter

    Equal(Type) bool
    AssignableTo(Type) bool
   //and so on
}

is going to be more useful since it can handle a lot of those database-specific nuances for you. An sqlite driver could just always return true for AssignableTo and a postgres driver could return false if Equal returns false (after checking a handful of special cases surrounding text types, iirc).

kardianos commented 8 years ago

MySql does a lot of implicit conversions https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html though knowing the type upfront could allow you to work around that, if you knew that might get you in trouble and you knew you were dealing with MySql.

This is how SQL works, not just in My SQL. In MS SQL Server, if you send up a nvarchar param to a varchar column in a where clause, the performance goes down significantly due to increased conversion overhead. Silly? Yes. But I've gotta work with that without a bunch of hacks.

I'm also trying to point out that just because two types have the same name in two different databases it doesn't mean they behave the same or that all the same rules apply.

To a degree. A type Text in MS SQL server has a distinct meaning from varchar(max), whereas in postgresql type Text is just an unbounded varchar w/o significant differences. But an interface you are suggesting can't capture that well, as I'm not just taking about assignability to another database type. I also want to specify the input parameters #12381 where I need a database type.

There are databases with fewer types. There are databases with more types.

Fewer types, some types aren't used, more types are defined in the driver itself. Despite the many dialects of SQL, data types do have a common base. And the differences a driver may define and reference. I don't see the problem.


I find value in knowing what the database thinks a type is for both result columns and for input parameters. You are saying for result columns, provide a string representation and an interface to provide many functions you might want to do with it. I think that falls down in the input parameter case.

There may be value in letting the driver expose a function like:

type TypeCompare interface{
  Assignable(t1, t2 Type) bool
  Generic(t Type) Type
  ...
}

That could have value. But I think a standard type list, along with a driver type list is a first step. Do a survey among common and uncommon RDMBS types. They all share the vast majority of types, there is some variation as in the MS text vs varchar, but in those cases knowing the exact differences as a programmer is required (If a stored proc takes text, don't give it varchar for example).

jimmyfrasche commented 8 years ago

I see what you mean re #12381 though the driver could export the types for this use. That doesn't help if you're trying to remain database agnostic, though. But presumably you're passing this as a hint to the driver, so you're already not database agnostic. Maybe I don't understand the use case.

If a specific driver is defining its own types—which it would have to for databases that allow user defined types at the very least—you still need an interface. Then the Type constants in database/sql would have to implement that interface, so they could be used interchangeably with the driver's types. But then they couldn't have anything db specific, like equivalence, so none of the types in database/sql could ever contain information specific to a database, so the interface couldn't have things like AssignableTo. If you needed that information you'd need to encode that in the driver or a third location, so what's the point of having the type constants in database/sql? They'd just be a list of names that could mean different things. I do not see the point.

It would be nice for database/sql to have more features, but I'm wary of encoding too much information into it.

What is the use case for having these specific constants in database/sql? Like I said, I've only ever used it for debugging. I imagine it would be very handy for statement mappers and ORMs but I've never written such things.

kostya-sh commented 8 years ago

As far as I can see there are 3 places where types could be useful:

I think it would help to discuss and understand requirements for each of these cases separately.

Before jumping to the API discussions I suggest we should understand the use case for the requested feature better.

@kirubasankars, can you please explain what you meant by "some level dynamic column value parsing"?

Has anyone needed this functionality before? What was the use case?

kostya-sh commented 8 years ago

For the reference this is what JDBC provides: http://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html

I can imagine only a few database drivers support gettting all this information though. For example this is what you can get out of ProstgreSQL: https://www.postgresql.org/docs/9.5/static/protocol-message-formats.html (search for RowDescription (B))

kardianos commented 8 years ago

@jimmyfrasche Yes, this would not be database agnostic. That would need a different system layer to filter it through. It would just expose what the RDBMS SQL type is.

Use case for results where switching on go type doesn't work:

var sql = `
select
  d.ID, d.Name, d.Mime,
 fromp.FullName as  FromFullName,  forp.FullName as ForFullName
from
  seq.Document d
  join acc.Person fromPerson fromp on fromp.ID = d.FromPerson
  left join acc.Person forPerson forp on forp.ID = d.ForPerson
where
  1=1 and ...;
`

If you need to encode a information regarding ForFullName, you can't rely on it being there in the first row, in fact it may never be represented. This isn't a problem if you ad-hoc everything, but if you push more to app specific framework, it can be more problematic.

However, per Brad in comment https://github.com/golang/go/issues/12381#issuecomment-239589860 I think this is moot. So while Brad also said https://github.com/golang/go/issues/16652#issuecomment-239328934 , it would need to be without specific type information being represented in the sql pacakge.

kirubasankars commented 8 years ago

@kostya-sh, Trying to create entire row sets as map[string]interface{}.

While executing that query, my code have no idea about how many column, their names and types

Basic types are automatically mapped to GO types, that's good. Custom type are not, (example PostgreSQL custom type)

driver's can get me those custom values by only as RawBytes

I was expecting this package should expose type and size information.

in my opinion, may be java handle this better, java sample click here

Also have a look at JDBC sql types

juicemia commented 8 years ago

Something just occurred to me.

At the risk of sounding dumb: how would I go about adding a method on an interface type?

// This is illegal because Rows is an interface type
func (rs *Rows) Schema() Schema

Adding it to the Rows interface breaks the API and thus the API tool fails. I added the old interface to except.txt but I'm not sure if that's such a good idea? Some help from more experienced contributors would be awesome.

EDIT: I'm only doing this to start playing around and getting familiar with the source. I don't feel like we've reached a consensus as to what should be done.

kardianos commented 8 years ago

@juicemia *sql.Rows is a struct. As such it can be added to safely.

juicemia commented 8 years ago

@kardianos wow how embarrassing. 😅 i was looking in driver and totally forgot about the rest of the package. 😶

So what's the direction we should take on the types then? I agree with @jimmyfrasche that we should just expose a Type interface, and leave the implementation of the types to the database drivers.

juicemia commented 8 years ago

Has discussion for this moved somewhere else, or has it just tapered off?

kardianos commented 8 years ago

@juicemia It has not moved. However, it is now my explicit understanding that @bradfitz does not want to see SQL Database types exposed in this package. So even though it isn't closed, I put it into the "highly doubtful" range.

See https://github.com/golang/go/issues/12381#issuecomment-239589860

Also see: https://github.com/golang/go/issues/16673#issuecomment-239619667 for the scope of changes I hope to tackle for go1.8.

bradfitz commented 8 years ago

@kardianos, no, I'm fine with making type information available. What I was opposed to that in that comment is making users redundantly specify type information when inserting data, since an interface value already conveys type information.

I'd like to see an API proposal for this issue.

kardianos commented 8 years ago

@bradfitz Oh, then I misunderstood. To be clear before, the type information would always be optional and if not present base the DB type from the Go type.

kardianos commented 8 years ago

@bradfitz Proposal for result type information.

I've used the result types from queries in a number of places (C# ADO.Net, and go rdb). Whereas for query parameters where you want to specify in unusual cases the specific actual database type (varchar vs nvarchar vs text), the results I have typically used the generic type (text, date, date+time of day, decimal, floating point, integer, array, structured (JSON, XML)).

I can think of three ways to specify this generic type:

While I view the generic type as most important for results, I would also like to allow fetching the database specific type. I could see this as either a type const defined in the driver or in database/sql.

Are there any of these options that you would not want to see in the database/sql package? Or do you have other opinions? I can send in a more specific proposal after I get that bit of feedback. Thanks.

gnewton commented 8 years ago

Uh, I'll have a working proposal by Monday. Its working now but needs some cleanup. I'm the submitter of #7408 who just saw this activity after implementing something. There are some interesting nuances that I did not realize until doing this.

gnewton commented 8 years ago

Sorry, not trying to be mysterious or anything. Just that I don't have time to explain it properly and clean it up properly today. :-)

And sorry for the 'sorry': Canadian.

juicemia commented 8 years ago

@gnewton it's alright. I've been working on something as well based on some of the ideas in @kardianos rdb package. I just need to clean it up a bit.

gnewton commented 8 years ago

OK. I think I'm taking a simpler path, so it will be useful to compare and see what the community thinks. :-)

juicemia commented 8 years ago

@gnewton agreed. I've pushed my initial work to my fork. It can be found here.

kardianos commented 8 years ago

@juicemia The diff can be found here: https://github.com/juicemia/go/commit/53d1cf564464f6d7f543126007f28a6804847023 It may be easier in the future to view these in a standard CL in gerrit. Just ensure you add DO NOT REVIEW on in it.

juicemia commented 8 years ago

Oh okay. Yeah sorry I should have linked the commit.

bradfitz commented 8 years ago

After a chat with @kardianos, we've converged on a design like:

type ColumnInfo struct {
        // unexported fields                                                                                               
        // driver handle some                                                                                              
}

func (ci ColumnInfo) Name() string { ... }
func (ci ColumnInfo) ScanType() reflect.Type  { ... } // something that's suitable for Scan                                         
func (ci ColumnInfo) Nullable() (nullable, ok bool) { ... }
func (ci ColumnInfo) DriverTypeName() string { ... }     // "CHAR" without (length)                                                

func (rs *Rows) ColumnInfo() ([]ColumnInfo, error) {

}
kirubasankars commented 8 years ago

Are we missing column size from the driver?

neilotoole commented 8 years ago

Seconding @kirubasankars on column size.

Also DriverTypeName... does this mean driver-specific strings? It would be highly convenient if there were a standardized mapping of the most common types to constants a la JDBC etc. Otherwise I'll end up doing something like stdType := allDrivers['postgres'][driverTypeName] (which is basically what I do today). Not the end of the world, but could be confusing for newcomers.

neilotoole commented 7 years ago

@kardianos @bradfitz @kirubasankars

I hope the ship hasn't sailed on the column size decision. I've been working with this very issue literally two minutes ago on my own project, decided I better drop in on this ticket again.

I'll give you a specific use-case where knowing the column size is important (and this is only just my particular case from tonight, this is a common occurrence with ETL software). I'm working on an app that moves data between heterogenous datasources (where the schemata are not known ahead of time, etc). Sometimes this means that there's a scratch DB in the middle where data munging/staging goes on. Let's say I've got oracle1.tblUser and postgres1.tblAddress, and there's a bunch of text fields on each, let's say email VARCHAR(64) and city VARCHAR(255). Right now, how I make this work is that I have a thin layer wrapping each and every SQL driver so that I can get access to the field type/size etc so that I can present it in a uniform manner to the worker (with go-mysql I made my own fork because for some inexplicable reason the driver doesn't export the field info).

Now, under the proposal above, yes I would know that a column is a text-ish field, but without knowing the size the best one can do is create a LONGTEXT col or such, even though the drivers have access to the fact that those text cols are of max size 64 and 255. If you're dealing with throughput of billions of rows, the performance difference between a VARCHAR(64) / VARCHAR(255) and a LONGTEXT does become significant (and in some databases, egregiously so).

Again, the col size is data that is ordinarily available to the driver. If for some reason the driver can't present that information, we could use -1 as a sentinel to indicate it's not available. Hell, you can make that the default in the sql layer. But please, at least make it possible for drivers to provide this information back up through the API layer to the caller. There's no downside to it.

neilotoole commented 7 years ago

One more thing: should we not also include a method as follows?

func (ci ColumnInfo) Alias() string { ... }

This can default to ci.Name().

neilotoole commented 7 years ago

OK, maybe just one more thing... let's not forget about:

func (ci ColumnInfo) IsPrimaryKey() bool { ... }

Although, that's not ideal either? I'd be more inclined to go this path:

func (ci ColumnInfo) Attrs() Attr
// or Attributes() to be verbose
// if ci.Attrs() & AttrPrimaryKey ... or AttrNullable, AttrIndex etc.

This would be more extensible/future-proof; implementations could choose how much (if any) detail to send back in the attributes, and would allow us to drop the ci.Nullable() method, which is really just a special case of col attributes.

kardianos commented 7 years ago

@neilotoole I agree we need length, and possibly Scale and Precision. I'm not familiar with a protocol that exposes Alias as a separate property then Name. It is important to know that the design is chosen to allow adding additional method if the need arises.

gopherbot commented 7 years ago

CL https://golang.org/cl/29961 mentions this issue.

neilotoole commented 7 years ago

@kardianos I think you're probably correct about alias. But yes, length/size and scale/precision would be useful.

kardianos commented 7 years ago

@neilotoole Size and length are included in the CL. Please review if you want to.

neilotoole commented 7 years ago

@kardianos thanks, I just reviewed patch set 5.

neilotoole commented 7 years ago

@kardianos @bradfitz

I had some time this week to look over the proposal. I'm generally ok with the proposed sql pkg API, but I think the driver pkg changes are going off in the wrong direction. I'm referencing driver.go PS7 (Gerrit).

TLDR: Replace the five new driver pkg interfaces with one ColumnTyper interface.


It seems to me that having these five separate interfaces is overly elaborate. This is what is currently proposed (comments omitted):

package driver

type RowsColumnTypeScanType interface {
    Rows
    ColumnTypeScanType(index int) reflect.Type
}

type RowsColumnTypeDatabaseTypeName interface {
    Rows
    ColumnTypeDatabaseTypeName(index int) string
}

type RowsColumnTypeLength interface {
    Rows
    ColumnTypeLength(index int) (length int64, ok bool)
}

type RowsColumnTypeNullable interface {
    Rows
    ColumnTypeNullable(index int) (nullable, ok bool)
}

type RowsColumnTypePrecisionScale interface {
    Rows
    ColumnTypePrecisionScale(index int) (precision, scale int64, ok bool)
}

And this from the sql package:

package sql

func (rs *Rows) ColumnTypes() ([]ColumnType, error) {
    if rs.isClosed() {
        return nil, errors.New("sql: Rows are closed")
    }
    if rs.rowsi == nil {
        return nil, errors.New("sql: no Rows available")
    }
    return rowsColumnInfoSetup(rs.rowsi), nil
}

func rowsColumnInfoSetup(rowsi driver.Rows) []ColumnType {
    names := rowsi.Columns()

    list := make([]ColumnType, len(names))
    for i := range list {
        ci := &list[i]
        ci.name = names[i]

        if prop, ok := rowsi.(driver.RowsColumnTypeScanType); ok {
            ci.scanType = prop.ColumnTypeScanType(i)
        } else {
            ci.scanType = reflect.TypeOf(new(interface{})).Elem()
        }
        if prop, ok := rowsi.(driver.RowsColumnTypeDatabaseTypeName); ok {
            ci.databaseType = prop.ColumnTypeDatabaseTypeName(i)
        }
        if prop, ok := rowsi.(driver.RowsColumnTypeLength); ok {
            ci.length, ci.hasLength = prop.ColumnTypeLength(i)
        }
        if prop, ok := rowsi.(driver.RowsColumnTypeNullable); ok {
            ci.nullable, ci.hasNullable = prop.ColumnTypeNullable(i)
        }
        if prop, ok := rowsi.(driver.RowsColumnTypePrecisionScale); ok {
            ci.precision, ci.scale, ci.hasPrecisionScale = prop.ColumnTypePrecisionScale(i)
        }
    }
    return list
}

I suggest replacing those five driver interfaces with one interface:

package driver

type ColumnTyper interface {
  Rows
  ColumnTypes() ([]ColumnType, error)
}

The issues I see with this current proposal include:

neilotoole commented 7 years ago

If we were to go with the ColumnTyper approach: for safety, I (manually) verified that there is no exported ColumnTypes field or method on the struct implementing driver.Rows in any of the known driver packages, so this should be a non-breaking change.

List of drivers is from golang wiki. Each link is to the source for the struct that implements driver.Rows:

neilotoole commented 7 years ago

For my own private project, I took almost this exact approach when hacking the sql API. My changes looked something like this (omitting the ColumnType type in both packages):

package driver

type ColumnTyper interface {
    ColumnTypes() ([]ColumnType, error)
}

EDIT FOR CLARITY : this copy-n-edit snippet below (which exposes the driver package) is a poor example: driver.ColumnType would need to be wrapped in sql.ColumnType.

package sql

func (rs *Rows) ColumnTypes() ([]driver.ColumnType, error) {
    if rs.closed {
        return nil, errors.New("sql: Rows are closed")
    }
    if rs.rowsi == nil {
        return nil, errors.New("sql: no Rows available")
    }

    if colTyper, ok := rs.rowsi.(driver.ColumnTyper); ok {
        return colTyper.ColumnTypes()
    }

    return nil, errors.New("sql: driver does not provide column type data")
}

In my own project I have already implemented the ColumnTyper API changes for Postgres (lib/pq), MySQL (go-sql-driver) and SQLite (mattn), so I know from first-hand implementation experience that this approach works just fine. And it seems less complicated than the proposed approach.