crawshaw / sqlite

Go SQLite3 driver
ISC License
561 stars 67 forks source link

API to retrieve column blob into slice of arbitrary length #116

Open israel-lugo opened 3 years ago

israel-lugo commented 3 years ago

Hi,

First of all, thank you for creating this library. I've been using database/sql for my personal projects for a while now, and having to jump through hoops to get any kind of concurrency working, and living without goodies such as save points. This library seems to provide an easy and lightweight alternative that maps much better to SQLite.

I have a column that holds binary protos. These are control messages with scalar fields, so I'm not worried about unbounded growth and don't need streaming blobs. The natural choice to access them would be (*Stmt).ColumnBytes.

The problem is that the only way to use (*Stmt).ColumnBytes correctly when you don't know the size of the blob in advance seems to be by doing:

n := stmt.ColumnLen(col)
buf := make([]byte, n)
stmt.ColumnBytes(col, buf)

This is wasteful, since (*Stmt).ColumnBytes is already calling (*Stmt).ColumnLen internally. It's also rather verbose, and makes reusing a previously allocated slice somewhat cumbersome (we'd have to possibly extend the slice so its length matches n).

Would you consider changing this to be similar to e.g. crypto/hash.(Hash).Sum? I.e. instead of copying the result of (*Stmt).columnBytes into the user's slice, append it.

// ColumnBytes appends the query result into b and returns the resulting slice.
func (*Stmt) ColumnBytes(col int, b []byte) []byte

// Example 1, user gets a newly allocated slice, fine for single use:
buf := stmt.ColumnBytes(col, nil)

// Example 2, reuse a previously allocated slice in a tight loop:
var buf []byte
for {
  if hasRow, err := stmt.Step(); err != nil {
    // handle error
  } else if !hasRow {
    break
  }
  buf = stmt.ColumnBytes(col, buf[:0])
  process(buf)
}

You may not want to modify an existing API to avoid breaking users. This could be a new method, e.g. (*Stmt).ColumnBytesAppend or some better sounding name.

AdamSLevy commented 3 years ago

As I understand it, the API was designed this way to avoid passing the user a slice of C memory which will become invalidated after the Stmt is reset. What you're proposing both offers a convenience and avoids calling ColumnLen twice. However the breaking API change is slightly problematic, however we are still below v1. It just needs to be announced clearly.

Really ColumnBytes behaves like io.Reader.Read. I wonder if it should be called ColumnBytesRead instead, and ColumnBytes should append as you describe. The current implementation of ColumnLen could also be improved to cache the results of the C call in a columnLen map[int]int.

I know that you said you don't need streaming blobs but if you want to avoid redundant ColumnLen calls with the current code then you could just use ColumnReader with ioutil.ReadAll (now io.ReadAll in 1.16). Memory/CPU wise this is probably technically slightly less performant than the append implementation you suggested, but completely inconsequential compared to a C call.

AdamSLevy commented 3 years ago

https://github.com/crawshaw/sqlite/blob/6c1d4ad4ed9d0acc0ca4b99dd67c5c14d488633c/sqlite.go#L938-L975