gkampitakis / go-snaps

Jest-like snapshot testing in Golang 📸
https://pkg.go.dev/github.com/gkampitakis/go-snaps
MIT License
146 stars 6 forks source link

[Bug]: snapshots of big slices could not be loaded back #75

Closed al32-rs closed 10 months ago

al32-rs commented 10 months ago

Description

The bufferio scanner instanced in getPrevSnapshot() uses default arguments, which means that the maximum token size for scanning a single line caps at 64*1024 bytes.

I've a test snapshot storing a slice of 5k string elements, which gets dumped into a single line of ~130k characters. As a temporary workaround I'm splitting out the slice in smaller chunks and snapshotting each one of them, but I think the snapshot should consider the MaxScanTokenSize limit and break the dumped data into multiple lines.

Steps to Reproduce

Expected Behavior

The snapshot creation should take into account the scan buffer size limits, and split data among multiple lines.

gkampitakis commented 10 months ago

Hey 👋 , thanks for openning this very detailed issue, I didn't know there is such a limit on bufio. I was indeed able to reproduce this with

// CI=true go test -v -count=1 ./... -run TestLongLine
func TestLongLine(t *testing.T) {
    t.Run("should capture long lines", func(t *testing.T) {
        snaps.MatchSnapshot(t, strings.Repeat("hello world", 10000))
    })
}

The snapshot creation should take into account the scan buffer size limits, and split data among multiple lines.

This is a good idea but I tried to fix the error by simply increasing the MaxScanTokenSize

s.Buffer([]byte{}, math.MaxInt)

do you thing there is any concern doing this 🤔 ?

al32-rs commented 10 months ago

Hi!

thanks for openning this very detailed issue, I didn't know there is such a limit on bufio.

Thank you for taking the time of reviewing the issue as well. The limit on bufio's Scan() is something I've discovered myself as well investigating a test failure which randomly started to happen with a grown-up snapshot.. 😊

This is a good idea but I tried to fix the error by simply increasing the MaxScanTokenSize do you thing there is any concern doing this 🤔 ?

Well, MaxScanTokenSize seems big enough for any practical snapshot, and will surely help. I can think about two possible concerns:

Your proposal looks way simpler and desirable, and would effectively work as a quick fix. I just fear it might hit back in the long term. The solution I proposed looks overall safer but I understand it might require some non-trivial refactoring.

One last note: I'm not a JestJS expert (I've found it thanks to your project's reference), I'm curious how it handles this case. I couldn't find much about.

gkampitakis commented 10 months ago

The limit on bufio's Scan() is something I've discovered myself as well investigating a test failure

Should have def checked the error of scanner and return it would have made the failure easier to track. Will make sure to add this instead of just returning errSnapNotFound.

One last note: I'm not a JestJS expert (I've found it thanks to your project's reference), I'm curious how it handles this case. I couldn't find much about.

I couldn't find a definitive answer as well but under the hood jest is using new Function to parse the whole snapshot file so I would guess the limit is again the system. Also a good related issue is this https://github.com/jestjs/jest/issues/8551

Your proposal looks way simpler and desirable, and would effectively work as a quick fix. I just fear it might hit back in the long term. The solution I proposed looks overall safer but I understand it might require some non-trivial refactoring.

Setting the limit to as high as possible and not trying to split the input is because the library will struggle in other places even if it can parse the line (e.g. consolidtating the snapshot and create a diff) also maybe after a point it's probably not very practical doing snapshot testing as the snapshot is probably not readable or easy to review.

gkampitakis commented 10 months ago

Btw not sure if this is apparrent but this method is not allocating a buffer of size math.MaxInt

It sets the initial buffer to use when scanning and the maximum size of buffer that may be allocated during scanning.

Initial size here being zero and maximum math.MaxInt so it will behave the same just allow to grow bigger.

s.Buffer([]byte{}, math.MaxInt)

Implementation https://github.com/golang/go/blob/master/src/bufio/scan.go#L258-L273

al32-rs commented 10 months ago

Initial size here being zero and maximum math.MaxInt so it will behave the same just allow to grow bigger.

Yes, that was what I meant with:

Scan() is iteratively doubling its buffer's size until it fits the content there's to write, I guess then it aims to a logarithmic complexity on the input length.

The fix looks ok. Thanks for the prompt action :)