goccy / bigquery-emulator

BigQuery emulator server implemented in Go
MIT License

Reading Table Metadata: NumRows is not populated #339

Open coxley opened 4 months ago

coxley commented 4 months ago

What happened?

When reading a table's metadata via the Go SDK, NumRows appears to always be zero, even if rows have recently been inserted.

I see that this was brought up in #41 and reportedly fixed in #46 (which equates to version 0.1.12), but NumRows is still zero even when pinning the emulator to that version. #249 mentions a similar issue but didn't include enough details to reproduce, so it stalled.

The SDK entry-point is here: https://pkg.go.dev/cloud.google.com/go/bigquery#Table.Metadata

Explicitly setting a TableMetadataView, such as BasicMetadataView or FullMetadataView, doesn't change the outcome.

What did you expect to happen?

I'd expect the NumRows value in metadata to match the behavior of normal BigQuery.
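
To narrow down whether the emulator omits the field or returns it as zero, it can help to look at the raw `tables.get` response body rather than the decoded SDK struct. Below is a minimal sketch, assuming the emulator follows the BigQuery REST Table resource, where `numRows` is encoded as a decimal string. The `tableResource` type, the `parseNumRows` helper, and the sample payloads are hypothetical and were not captured from the emulator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// tableResource mirrors only the one field of the REST Table resource that
// matters here. A pointer distinguishes "field absent" from "field is 0".
type tableResource struct {
	NumRows *string `json:"numRows"`
}

// parseNumRows returns the row count found in a raw tables.get response body
// and reports whether the numRows field was present at all.
func parseNumRows(body []byte) (n uint64, present bool, err error) {
	var t tableResource
	if err := json.Unmarshal(body, &t); err != nil {
		return 0, false, err
	}
	if t.NumRows == nil {
		return 0, false, nil
	}
	n, err = strconv.ParseUint(*t.NumRows, 10, 64)
	return n, err == nil, err
}

func main() {
	// Hypothetical payloads for illustration only.
	samples := []string{
		`{"id":"project:main.data","numRows":"10"}`,
		`{"id":"project:main.data"}`,
	}
	for _, body := range samples {
		n, present, err := parseNumRows([]byte(body))
		fmt.Println(n, present, err)
	}
}
```

If the field is absent from the raw response, the SDK would report NumRows as zero, which matches the behavior described above.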

How can we reproduce it (as minimally and precisely as possible)?

Below is a fully reproducible example using a standard Go test and testcontainers to start the emulator. It expects Docker to be available.

To set up the test, do the following:

# Create temporary directory and go module for dependency fetching
cd $(mktemp -d)
go mod init test

# Write below test file

go mod tidy
go test .

// main_test.go (or whatever you'd like)
package main

import (
    "context"
    "fmt"
    "sync"
    "testing"

    "cloud.google.com/go/bigquery"
    "github.com/stretchr/testify/require"
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/wait"
    "google.golang.org/api/iterator"
    "google.golang.org/api/option"
)

const (
    bqPort      = "9050/tcp"
    testProject = "project"
)

// startBigQuery spins up a test container and blocks until it's ready. Only one
// container can be started per test binary.
var startBigQuery = sync.OnceValues(func() (testcontainers.Container, error) {
    ctx := context.Background()
    return testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: testcontainers.ContainerRequest{
            Image: "ghcr.io/goccy/bigquery-emulator:latest",
            Cmd: []string{
                "--project=" + testProject,
            },
            ExposedPorts: []string{bqPort},
            WaitingFor:   wait.ForExposedPort(),
        },
        Started: true,
    })
})

func TestBreak(t *testing.T) {
    ctx := context.Background()

    // Startup emulator and create client
    container, err := startBigQuery()
    require.NoError(t, err)

    addr, err := container.Endpoint(ctx, "")
    require.NoError(t, err)

    endpoint := "http://" + addr
    t.Logf("bigquery endpoint: %s", endpoint)
    client, err := bigquery.NewClient(
        ctx,
        testProject,
        option.WithoutAuthentication(),
        option.WithEndpoint(endpoint),
    )
    require.NoError(t, err)
    t.Log("bigquery client: connected")

    // Bootstrap BQ datasets
    ds := client.Dataset("main")
    err = ds.Create(ctx, nil)
    require.NoError(t, err)
    t.Logf(
        "main dataset created: %s",
        ignoreErr(ds.Identifier(bigquery.StandardSQLID)),
    )

    t.Cleanup(func() {
        require.NoError(t, ds.Delete(ctx))
    })

    table := ds.Table("data")
    err = table.Create(ctx, &bigquery.TableMetadata{
        Schema: bigquery.Schema{
            &bigquery.FieldSchema{Name: "value", Type: bigquery.NumericFieldType},
        },
    })

    require.NoError(t, err)
    t.Logf(
        "table created: %s",
        ignoreErr(table.Identifier(bigquery.StandardSQLID)),
    )

    // Insert 10 rows with value:n set
    rows := []saver{}
    for n := range 10 {
        rows = append(rows, saver{n})
    }
    inserter := table.Inserter()
    require.NoError(t, inserter.Put(ctx, rows))
    t.Log("rows: inserted")

    // Query all rows with SQL and confirm that the expected number exists
    q := client.Query(fmt.Sprintf("SELECT * FROM %s", ignoreErr(table.Identifier(bigquery.StandardSQLID))))

    it, err := q.Read(ctx)
    require.NoError(t, err)

    var cnt int
    for {
        row := []bigquery.Value{}
        err := it.Next(&row)
        if err == iterator.Done {
            break
        }
        require.NoError(t, err)
        cnt++
    }
    require.EqualValues(t, 10, cnt)

    // Query table metadata and assert that the number of rows matches
    md, err := table.Metadata(ctx)
    require.NoError(t, err)
    // testify expects (expected, actual) argument order
    require.EqualValues(t, 10, md.NumRows)
}

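// saver is a single row for the "data" table.
type saver struct {
    value int
}

// Save implements bigquery.ValueSaver. It uses a value receiver so that the
// elements of a []saver (not only *saver) satisfy the interface.
func (s saver) Save() (map[string]bigquery.Value, string, error) {
    return map[string]bigquery.Value{"value": s.value}, "", nil
}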

// ignoreErr returns v and panics if err is non-nil.
func ignoreErr[T any](v T, err error) T {
    if err != nil {
        panic(err)
    }
    return v
}

Anything else we need to know?

No response