golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.35k stars 17.58k forks source link

os: ioutil.ReadFile() returns a truncated buffer on Windows 10 #26923

Closed AWoloszyn closed 6 years ago

AWoloszyn commented 6 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.10.3

Does this issue reproduce with the latest release?

This is the latest binary release I can find on windows.

What operating system and processor architecture are you using (go env)?

Windows 10, amd64

What did you do?

I called ioutil.ReadFile() on a file that is > 4GB (4470475739 bytes).

bytes, err := ioutil.ReadFile("C:\\Path\\To\\Big\\File.txt")
if err != nil {
   panic(err)
}
fmt.Printf("%v", len(bytes))

What did you expect to see?

I expected to see an output of 4470475739

What did you see instead?

I got an output of 175508955

ianlancetaylor commented 6 years ago

Some Windows person will have to look at this, but I'll note that the fix may be to copy the use of maxRW from internal/poll/fd_unix.go to internal/poll/fd_windows.go.

odeke-em commented 6 years ago

/cc @alexbrainman

alexbrainman commented 6 years ago

@AWoloszyn,

Sure. I can reproduce this. Thank you.

I'll note that the fix may be to copy the use of maxRW from internal/poll/fd_unix.go to internal/poll/fd_windows.go.

What maxRW value do you suggest, @ianlancetaylor ? This particular issue calls Windows ReadFile API https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-readfile , and the API uses DWORD (uint32), to pass file size. Also do you know if we have a test for this? Should I add new test?

Alex

alexbrainman commented 6 years ago

What maxRW value do you suggest, @ianlancetaylor ? This particular issue calls Windows ReadFile API https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-readfile , and the API uses DWORD (uint32), to pass file size. Also do you know if we have a test for this? Should I add new test?

Never mind all my questions. I made decisions myself. And I am not adding test, because playing with 4GB file takes too long on my computer.

Alex

gopherbot commented 6 years ago

Change https://golang.org/cl/129137 mentions this issue: internal/poll: cap reads and writes to 1GB on windows

silbinarywolf commented 6 years ago

I've spent a little bit of time investigating and testing code to fix this. I'm not sure on what approach should be taken for testing this as writing out such a large file and then reading it back in (as per my rough test code below) would be super slow!

I'm more than willing to put together some PR(s) together if this looks OK!

Ideas:

ReadFile / WriteFile 4GB error catching

// NOTE: ReadFile looks the same at the start of the function
func WriteFile(handle Handle, buf []byte, done *uint32, overlapped *Overlapped) (err error) {
    var _p0 *byte
    if len(buf) > 0 {
        _p0 = &buf[0]
                // NOTE: Don't allow writing buffers larger than 4GB as ReadFile takes uint32
        if len(buf) > 4<<30 {
            err = EINVAL
            return
        }
    }
    r1, _, e1 := Syscall6(procWriteFile.Addr(), 5, uintptr(handle), uintptr(unsafe.Pointer(_p0)), uintptr(len(buf)), uintptr(unsafe.Pointer(done)), uintptr(unsafe.Pointer(overlapped)), 0)
    if r1 == 0 {
        if e1 != 0 {
            err = errnoErr(e1)
        } else {
            err = EINVAL
        }
    }
    return
}

Apply maxRW logic from fd_unix

internal\poll\fd_windows.go

// Windows can't read or write 4GB+ files at a time with syscall.Read or syscall.Write
// even on 64-bit systems.
// Use 1GB instead of, say, 2GB-1, to keep subsequent reads aligned.
const maxRW = 1 << 30

// Read implements io.Reader.
func (fd *FD) Read(buf []byte) (int, error) {
    if err := fd.readLock(); err != nil {
        return 0, err
    }
    defer fd.readUnlock()

    var n int
    var err error
    if fd.isFile || fd.isDir || fd.isConsole {
        fd.l.Lock()
        defer fd.l.Unlock()
        if fd.isConsole {
            n, err = fd.readConsole(buf)
        } else {
            if len(buf) > maxRW {
                buf = buf[:maxRW]
            }
            n, err = syscall.Read(fd.Sysfd, buf)
        }
        if err != nil {
            n = 0
        }
    } else {
        o := &fd.rop
        o.InitBuf(buf)
        n, err = rsrv.ExecIO(o, func(o *operation) error {
            return syscall.WSARecv(o.fd.Sysfd, &o.buf, 1, &o.qty, &o.flags, &o.o, nil)
        })
        if race.Enabled {
            race.Acquire(unsafe.Pointer(&ioSync))
        }
    }
    if len(buf) != 0 {
        err = fd.eofError(n, err)
    }
    return n, err
}

// Write implements io.Writer.
func (fd *FD) Write(buf []byte) (int, error) {
    if err := fd.writeLock(); err != nil {
        return 0, err
    }
    defer fd.writeUnlock()

    var n int
    var err error
    if fd.isFile || fd.isDir || fd.isConsole {
        fd.l.Lock()
        defer fd.l.Unlock()
        if fd.isConsole {
            n, err = fd.writeConsole(buf)
            if err != nil {
                n = 0
            }
        } else {
            var nn int
            for {
                max := len(buf)
                if max-nn > maxRW {
                    max = nn + maxRW
                }
                n, err = syscall.Write(fd.Sysfd, buf[nn:max])
                if n > 0 {
                    nn += n
                }
                if nn == len(buf) {
                    return nn, err
                }
                if err != nil {
                    return nn, err
                }
                if n == 0 {
                    return nn, io.ErrUnexpectedEOF
                }
            }
        }
    } else {
        if race.Enabled {
            race.ReleaseMerge(unsafe.Pointer(&ioSync))
        }
        o := &fd.wop
        o.InitBuf(buf)
        n, err = wsrv.ExecIO(o, func(o *operation) error {
            return syscall.WSASend(o.fd.Sysfd, &o.buf, 1, &o.qty, 0, &o.o, nil)
        })
    }
    return n, err
}

To test the maxRW fix above, I used the following code on my Windows 10 machine:

package main

import (
    "fmt"
    "io/ioutil"
)

const (
    slightlyLargerThan4GB = 4470475739
)

func main() {
    // Write large file, add text "end-of-the-file" to the end of it
    text := "end-of-the-file"
    data := make([]byte, slightlyLargerThan4GB-len(text), slightlyLargerThan4GB)
    data = append(data, text...)
    fmt.Printf("Writing %d bytes to \"test_write_4gb\"\n", len(data))
    err := ioutil.WriteFile("test_write_4gb", data, 0644)
    if err != nil {
        fmt.Printf("%s\n", err)
    } else {
        fmt.Printf("Written %d bytes to file.\n", len(data))
    }

    // Read large file, get the last bit of text
    fmt.Printf("Reading file \"test_write_4gb\"...\n")
    bytes, err := ioutil.ReadFile("test_write_4gb")
    if err != nil {
        panic(err)
    }
    endText := string(bytes[len(bytes)-len(text) : len(bytes)])
    if endText == "end-of-the-file" {
        fmt.Printf("SUCCESS! Loaded: %d bytes, End of file has text: %s\n", len(bytes), endText)
    } else {
        fmt.Printf("FAILURE! Loaded: %d bytes, End of file has text: %s\n", len(bytes), endText)
    }
}
alexbrainman commented 6 years ago

I'm more than willing to put together some PR(s) together if this looks OK!

@silbinarywolf thank you for the offer, but I already sent https://golang.org/cl/129137

Alex

silbinarywolf commented 6 years ago

Ah nice one! Looks more complete as well :)