golang / snappy

The Snappy compression format in the Go programming language.
BSD 3-Clause "New" or "Revised" License

Unable to decompress Snappy JSON file using golang/snappy #75

Open raihan26 opened 1 year ago

raihan26 commented 1 year ago

I'm unable to decompress a Snappy-compressed JSON file with the golang/snappy library. The error I receive is "Failed to decompress content: snappy: corrupt input". However, I've verified that the file is not corrupt by successfully decompressing it with the snzip tool.

Steps to Reproduce:

  1. Compress a JSON file in a Spark job with .option("compression", "snappy") and write it to S3.
  2. Attempt to decompress the file from s3 using the following Go code:
package main

import (
    "fmt"
    "log"
    "os"

    "github.com/golang/snappy"
)

func main() {
    // Read the compressed file
    content, err := os.ReadFile("path_to_your_snappy_file.snappy")
    if err != nil {
        log.Fatalf("Failed to read file: %v", err)
    }

    // Decompress using golang/snappy
    decompressed, err := snappy.Decode(nil, content)
    if err != nil {
        log.Fatalf("Failed to decompress content: %v", err)
    }

    // Print the decompressed content
    fmt.Println(string(decompressed))
}

  3. Observe the error: Failed to decompress content: snappy: corrupt input.

Expected Behavior:

The Snappy compressed JSON file should be decompressed without errors.

Actual Behavior:

Received an error indicating the input is corrupt, even though other tools like snzip can decompress the file without issues.

Additional Information:

The Snappy compressed file is a JSON file where each line is a separate JSON object. I've verified the integrity of the file by decompressing it using snzip. The issue might be related to the specific Snappy format or framing used, but I'm not certain.
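One way to narrow down which format a file uses is to inspect its first bytes. The sketch below checks for the stream identifier chunk that opens a Snappy framing-format stream (`0xff 0x06 0x00 0x00` followed by `sNaPpY`). The helper name `looksLikeFramedStream` and the sample byte slices are hypothetical; a file written in some other container format would not match this check, and a negative result does not by itself prove the file is a raw block.

```go
package main

import (
	"bytes"
	"fmt"
)

// streamMagic is the stream identifier chunk that begins a
// Snappy framing-format stream.
var streamMagic = []byte("\xff\x06\x00\x00sNaPpY")

// looksLikeFramedStream reports whether data starts with the
// framing-format stream identifier. Hypothetical helper for illustration.
func looksLikeFramedStream(data []byte) bool {
	return bytes.HasPrefix(data, streamMagic)
}

func main() {
	// Illustrative inputs: a stream header followed by two arbitrary bytes,
	// and a raw block holding the literal "hello".
	framed := append([]byte("\xff\x06\x00\x00sNaPpY"), 0x01, 0x02)
	block := []byte{0x05, 0x10, 'h', 'e', 'l', 'l', 'o'}

	fmt.Println("framed:", looksLikeFramedStream(framed))
	fmt.Println("block: ", looksLikeFramedStream(block))
}
```

In practice the same check could be run on the first ten bytes of the file read from S3 before choosing a decoder.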

klauspost commented 9 months ago

You are using the block decompressor to decode what is probably a stream. These are distinct formats (a stream contains wrapped blocks). Try a Reader (snappy.NewReader) instead.