csimplestring / delta-go

Native Delta Lake Implementation in Go

Using LocalStore: unexpected folder popping up in _delta_log #53

Open: mario-imperato opened this issue 2 weeks ago

mario-imperato commented 2 weeks ago

I've taken examples/local_example_test.go and modified it in an attempt to create a Delta table and add some data to it. The full code of the sample is attached below; it is loosely derived from a HelloWorld example in the Java standalone library. I'm not sure whether I'm missing something. At first sight I get the expected result, but an unexpected folder shows up in _delta_log, of the form: tests/golden/snapshot-data4/_delta_log/file:/__0x2f____0x2f__Users/marioa.imperato/projects/gits/csimplestring/delta-go/tests/golden/snapshot-data4/_delta_log__0x2f__ (see the attached image).

[Attached image: Screenshot 2024-09-05 at 11 30 54]

What am I missing? Thank you, Mario

Sample code:

package examples

import (
    "fmt"
    "log"
    "path/filepath"
    "testing"
    "time"

    delta "github.com/csimplestring/delta-go"
    "github.com/csimplestring/delta-go/action"
    "github.com/csimplestring/delta-go/iter"
    "github.com/csimplestring/delta-go/op"
    "github.com/csimplestring/delta-go/types"
)

const (
    engineInfo = "local"
)

func TestLocalCreateExample(t *testing.T) {
    // Resolve the table directory to an absolute path.
    path, err := filepath.Abs("../tests/golden/snapshot-data4")
    if err != nil {
        t.Fatal(err)
    }

    // Address the table through a file:// URL with a trailing slash.
    path = "file://" + path + "/"

    config := delta.Config{
        StoreType: "file",
    }

    table, err := delta.ForTable(path, config, &delta.SystemClock{})
    if err != nil {
        t.Fatal(err)
    }

    // Table schema with three columns: foo, bar and zip.
    schema := &types.StructType{}
    schema = schema.Add(types.NewStructField("foo", &types.IntegerType{}, false))
    schema = schema.Add(types.NewStructField("bar", &types.IntegerType{}, false))
    schema = schema.Add(types.NewStructField("zip", &types.StringType{}, false))

    // Table metadata carrying the schema serialized as JSON.
    metadata := action.DefaultMetadata()
    metadata.SchemaString, err = types.ToJSON(schema)
    if err != nil {
        t.Fatal(err)
    }

    // Operation passed to every commit. Named "operation" so that it does
    // not shadow the op package.
    operation := op.Operation{
        Name: op.WRITE,
    }

    // Commit 15 times, each commit adding one (placeholder) data file.
    for i := 0; i < 15; i++ {
        txn, err := table.StartTransaction()
        if err != nil {
            t.Fatal(err)
        }

        // The first commit also writes the table metadata (schema).
        if i == 0 {
            if err := txn.UpdateMetadata(metadata); err != nil {
                t.Fatal(err)
            }
        }

        now := time.Now()

        addFile := action.AddFile{
            Path:             fmt.Sprintf("%d", i),
            DataChange:       true,
            PartitionValues:  nil,
            Size:             100,
            ModificationTime: now.UnixMilli(),
            Stats:            "",
            Tags:             map[string]string{"someTagKey": "someTagVal"},
        }

        actions := []action.Action{&addFile}
        res, err := txn.Commit(iter.FromSlice[action.Action](actions), &operation, engineInfo)
        if err != nil {
            t.Fatal(err)
        }
        t.Log(res)
    }

    // Read back the latest snapshot and print its version, files and schema.
    s, err := table.Snapshot()
    if err != nil {
        t.Fatal(err)
    }

    version := s.Version()
    log.Println(version)

    files, err := s.AllFiles()
    if err != nil {
        t.Fatal(err)
    }
    for _, f := range files {
        log.Println(f.Path)
    }

    m, err := s.Metadata()
    if err != nil {
        t.Fatal(err)
    }

    schema2, err := m.Schema()
    if err != nil {
        t.Fatal(err)
    }

    for _, f := range schema2.GetFields() {
        log.Println(f)
    }
}
csimplestring commented 2 weeks ago

Hi @mario-imperato, thanks for reporting this issue. I confirmed that it is a bug in the local log store and fixed it in the branch fix/local-log-store-path-exist. Can you check out that branch and run it again?

Currently I cannot merge it into the master branch because of a Dockerfile dependency issue.