apache / iceberg-go

Apache Iceberg - Go
https://iceberg.apache.org/
Apache License 2.0
142 stars 34 forks source link

fix: SIGSEGV when describe empty table #145

Closed alex-kar closed 2 months ago

alex-kar commented 2 months ago

Empty table does not have "Current Snapshot" causing describe command to be failed. Added validation to replace with empty string "".

Steps to reproduce

  1. Start env docker-compose -f dev/docker-compose.yml up -d
  2. Create namespace
    curl --location --request POST 'localhost:8181/v1/namespaces/public/tables' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "schema": {
        "type":"struct",
        "fields": [
            {
                "id":1,
                "name": "col1",
                "type": "boolean",
                "required": true
            }
        ]
    },
    "name": "t"
    }
    '
  3. Describe table t: iceberg describe public.t --uri=http://localhost:8181

Actual result

Table format version | 2
Metadata location    | s3://warehouse/public/t/metadata/00000-c2e47505-8693-4254-b335-437111d400f0.metadata.json
Table UUID           | 552a9f6b-c96e-4816-957c-160b96d57a62
Last updated         | 1725807593801
Sort Order           | 0: []
Partition Spec       | []

Current Schema, id=0
└──1: col1: required boolean

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x925504]

goroutine 1 [running]:
main.text.DescribeTable({}, 0xc0001740a0)
        /home/alex/projects/opensource/iceberg-go/cmd/iceberg/output.go:101 +0xea4
main.describe({0xb3dfe8, 0xf85660}, {0xb3e0c8, 0xc000362ed0}, {0x7ffe86519d7c, 0x8}, {0xa44687, 0x3})
        /home/alex/projects/opensource/iceberg-go/cmd/iceberg/main.go:252 +0x3db
main.main()
        /home/alex/projects/opensource/iceberg-go/cmd/iceberg/main.go:149 +0x4ef

Fixed result

Table format version | 2
Metadata location    | s3://warehouse/public/t/metadata/00000-c2e47505-8693-4254-b335-437111d400f0.metadata.json
Table UUID           | 552a9f6b-c96e-4816-957c-160b96d57a62
Last updated         | 1725807593801
Sort Order           | 0: []
Partition Spec       | []

Current Schema, id=0
└──1: col1: required boolean

Current Snapshot | 

Snapshots

Properties
key                             | value
---------------------------------------
write.parquet.compression-codec | zstd
alex-kar commented 2 months ago

@zeroshade Could you help me organize unit tests for output.go? I was looking at how tests are organized in github.com/pterm/pterm, which prints results in the terminal, and noticed that github.com/MarvinJWendt/testza is used, but I can't get it to work.

That's what I try:

// output_test.go
func TestDescribeTable(t *testing.T) {
    meta, err := table.ParseMetadataBytes([]byte(ExampleTableMetadataV1))
    if err != nil {
        t.Errorf("Failed to parse table metadata %d", err)
    }
    table := table.New([]string{"t"}, meta, "", nil)

    output, err := testza.CaptureStdout(func(w io.Writer) error {
        text{}.DescribeTable(table)
        return nil
    })
    testza.AssertEqual(t, "Expected", output)
}

And although I see result printed in terminal

Table format version | 2                                                                                                                                                                                           
Metadata location    |                                                                                                                                                                                             
Table UUID           | 9c12d441-03fe-4693-9a96-a0705ddf69c1                                                                                                                                                        
Last updated         | 1602638573590                                                                                                                                                                               
Sort Order           | 3: [                                                                                                                                                                                        
                     | 2 asc nulls-first                                                                                                                                                                           
                     | bucket[4](3) desc nulls-last                                                                                                                                                                
                     | ]                                                                                                                                                                                           
Partition Spec       | [                                                                                                                                                                                           
                     |  1000: x: identity(1)                                                                                                                                                                       
                     | ]                                                                                                                                                                                           

Current Schema, id=1
├──1: x: required long
├──2: y: required long (comment)
└──3: z: required long

Current Snapshot | append, {}: id=3055729675574597004, parent_id=3051729675574597004, schema_id=1, sequence_number=1, timestamp_ms=1555100955770, manifest_list=s3://a/b/2.avro

Snapshots
├──Snapshot 3051729675574597004, schema 1: s3://a/b/1.avro
└──Snapshot 3055729675574597004, schema 1: s3://a/b/2.avro

Properties
key                    | value
----------------------------------
read.split.target.size | 134217728

testza didn't capture stdout (line 7 - empty string)

--- FAIL: TestDescribeTable (0.00s)
    output_test.go:102: 

           1| Two objects that should be equal, are not equal.
           2| 
           3| Expected:
           4| (string) (len=8) "Expected"
           5| 
           6| Actual:
           7| (string) ""
           8| 
           9| Difference:
          10| (1. -) Expected
          11| (1. +) 
zeroshade commented 2 months ago

@alex-kar we could probably leverage "testable examples" https://go.dev/blog/examples with go and just use // Output: stuff to point out our expected output.

alex-kar commented 2 months ago

@zeroshade Same result

func ExampleDescribeTable() {
    meta, _ := table.ParseMetadataBytes([]byte(ExampleTableMetadataV1))
    table := table.New([]string{"t"}, meta, "", nil)

    text{}.DescribeTable(table)

    // Output:
    // Expected
}

Failed test

--- FAIL: ExampleDescribeTable (0.00s)
got:

want:
Expected

It does not capture output.

Separately tested pterm

func ExamplePterm() {
    pterm.Printfln("Test")

    // Output: test
}

Also fails

--- FAIL: ExamplePterm (0.00s)
got:

want:
test

I'll try to check pterm lib to understand its magic.

zeroshade commented 2 months ago

Okay, in that case i think it's fine to merge this as is without the tests for now until we can figure out how to write tests using pterm.

Thanks @alex-kar and feel free to create a PR for adding tests to this if you figure it out!