apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0
2.58k stars 621 forks source link

NULL text, boolean and blob fields returned initialized to zero value by SliceMap() #1248

Open petr-stratoscale opened 5 years ago

petr-stratoscale commented 5 years ago

Cassandra: 3.11 Gocql: master (70385f88b28b43805bd83d212169ab2d38810b15) Go: go version go1.10.5 linux/amd64

Hello.

I have a Cassandra table with various column types some of which holding a null value:

id | blob | boolean | decimal | list<frozen<map<text, text>>> | map<text, frozen<map<text, text>>> | set | set | set | text ----+------+---------+---------+-------------------------------+------------------------------------+-----------+--------------+-----------+------ 0 | null | null | null | null | null | null | null | null | null

I query the table with "SliceMap()" like this:

package main

import (
    "fmt"
    "github.com/gocql/gocql"
)

func main() {
    cluster := gocql.NewCluster("10.42.188.4")
    cluster.Consistency = gocql.One
    cluster.PoolConfig.HostSelectionPolicy = gocql.RoundRobinHostPolicy()
    cluster.Compressor = gocql.SnappyCompressor{}
    s, err := cluster.CreateSession()
    if err != nil {
        panic(fmt.Sprintf("%s\n", err.Error()))
    } else {
        fmt.Println("Connection established.")
    }

    q := s.Query(`SELECT * FROM "testks"."types"`)
    rows, ok := q.Iter().SliceMap()
    if ok != nil {
        panic(fmt.Sprintf("%s\n", ok.Error()))
    }
    for i, row := range rows {
        fmt.Printf("Data for row #%d\n", i)
        for col, val := range row {
            fmt.Printf("%s (%T): %s    ", col, val, val)
        }
        fmt.Printf("\n")
    }
    s.Close()
}

And get the following output:

Connection established. Data for row #0 map<text, frozen<map<text, text>>> (map[string]map[string]string): map[] set ([][]uint8): [] decimal (inf.Dec): \<nil> list<frozen<map<text, text>>> ([]map[string]string): []
boolean (bool): %!s(bool=false)
set ([]
inf.Dec): []
set ([]string): []
text (string):
id (*inf.Dec): 0
blob ([]uint8):

Only the 'decimal' value is set to 'nil'. All other values seem to have 'zero' value for their type. Now, I get that empty set, list and map types are represented as null, but I would expect 'nil' values text, boolean and blob columns. The applications need to be able to distinguish between a value and absence of value in the fields. Clearly a blank string, false or 0x [textAsBlob('')] are not the same as never-set values.

Any ideas?

Thanks.

alourie commented 5 years ago

@petr-stratoscale I just redacted the report to make it more clear. Am looking into this.

alourie commented 5 years ago

@petr-stratoscale at the moment we're translating native Cassandra types into native golang types. As golang doesn't have the "null" value in a sense that Java or other languages have, they golang types are initiated with their "Zero" value, which would be empty string, 0 or empty blob.

Now I understand the value of being able to use some kind of "null" instead of empty field to separate between the two. There are 2 possible takes on that:

1) Use pointers to return types. This has the benefit of being set to nil if there's no underlining value. 2) Use custom types to represent Cassandra types. This would allow setting a call on a type such as IsNull() or a boolean field, like v.isNull with the same idea.

I'd love to head what @Zariel thinks in this regard.

kostja commented 5 years ago

Is there a way to reflect that a value is nil in an additional metadata returned for the row? There are a lot of database management systems and a lot of strongly typed languages. Yet they have managed to overcome it. This is ODBC: https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdata-function https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/using-length-and-indicator-values

alourie commented 5 years ago

@kostja not sure about that. Yes, there are many strongly typed languages, but Golang is still a bit different in how it represents "zero" or "empty" values. Also, we don't parse the metadata for guessing the values, that feels wrong.

As I said, if we indeed use pointers to return the types, that would help to solve the issue, as non-value for a pointer would be nil and that is easy to differentiate vs an empty value, which would have the pointer not to be nil.

kostja commented 5 years ago

I personally don't care much how it is done, Go way or some other way, as long as it's there. Pointers are OK (even though less convenient for day-to-day usage). Besides, Go's idea of empty value, and Cassandra idea of empty value are not necessarily the same, and it's not necessarily accurate to map one to another 1:1. Go selects a default in each domain to represent a missing value, SQL and CQL introduce a new element into each domain, called NULL. It's a clear impedance mismatch, and one has to have a choice how to overcome it.

mpenick commented 4 years ago

What about adding a method to the Scanner interface?

type Scanner interface {
       // ...

    IsNull(columnIndex int) (bool, error)
}

or even

type Scanner interface {
       // ...

    IsNull(columnIndex int) bool
}

Here's a patch of what that might look like: https://github.com/mpenick/gocql/commit/7d4e726582407f116922cece4c45c8c64167d665 (No tests yet).

This would add null support for primary major use case (columns values), but it looks like nulls are also supported in tuples and user defined types (via https://gist.github.com/mpenick/cc58cc4134cfd93518347811100ee1ff). Other collection types (list, set, map) do not support null keys and/or values. Perhaps scanner types could be added as alternative interfaces for tuples and user defined types e.g. TupleScanner.