filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

`boostd dagstore initialize-all` crashes Boost - error coming from reader in lotus storage #9324

Open rjan90 opened 2 years ago

rjan90 commented 2 years ago

Lotus Version

boostd version 1.4.0+git.810afec
Based on the boostd version, the SP is running either Lotus v1.17.0 or v1.17.1

Describe the Bug

This error was initially reported by @stuberman in the Boost repo, but the crash originates in the reader in the lotus storage code. Issue report here.

Running `boostd dagstore initialize-all --concurrency=3`, or even `boostd dagstore initialize-all --concurrency=1`, crashes boostd after a few minutes. See the log below:

Logging Information

2022-09-15T18:28:09.163Z    INFO    boost-storage-deal  logs/log.go:40  current sealing state   {"id": "dd917e35-d1ac-4f30-9b96-e55d00bf4799", "state": "Packing"}
panic: runtime error: slice bounds out of range [:16777216] with capacity 8388608

goroutine 599691 [running]:
github.com/filecoin-project/lotus/storage/sealer/fr32.(*unpadReader).Read(0x0, {0xc02eba5bfc, 0x140b154, 0x140b154})
    /home/stuart/go/pkg/mod/github.com/filecoin-project/lotus@v1.17.0/storage/sealer/fr32/readers.go:62 +0x378
bufio.(*Reader).Read(0xc026924060, {0xc02eba5bfc, 0x140b154, 0x1})
    /usr/local/go/src/bufio/bufio.go:213 +0x106
bufio.(*Reader).Read(0xc0269240c0, {0xc02eba5bfc, 0x140b154, 0xa0a640})
    /usr/local/go/src/bufio/bufio.go:213 +0x106
io.ReadAtLeast({0x42ef300, 0xc0269240c0}, {0xc02eb86000, 0x142ad50, 0x142ad50}, 0x142ad50)
    /usr/local/go/src/io/io.go:328 +0x9a
io.ReadFull(...)
    /usr/local/go/src/io/io.go:347
github.com/filecoin-project/lotus/storage/sealer.(*pieceReader).readAtUnlocked(0xc021ceafc0, {0xc02eb86000, 0x64c349, 0x142ad50}, 0x4)
    /home/stuart/go/pkg/mod/github.com/filecoin-project/lotus@v1.17.0/storage/sealer/piece_reader.go:187 +0xb17
github.com/filecoin-project/lotus/storage/sealer.(*pieceReader).Read(0xc021ceafc0, {0xc02eb86000, 0x142ad50, 0x142ad50})
    /home/stuart/go/pkg/mod/github.com/filecoin-project/lotus@v1.17.0/storage/sealer/piece_reader.go:100 +0x155
io.ReadAtLeast({0x7f4dae370820, 0xc021ceafc0}, {0xc02eb86000, 0x142ad50, 0x142ad50}, 0x142ad50)
    /usr/local/go/src/io/io.go:328 +0x9a
io.ReadFull(...)
    /usr/local/go/src/io/io.go:347
github.com/ipld/go-car/v2/internal/carv1/util.LdRead({0x7f4dae370820, 0xc021ceafc0}, 0x0, 0x2000000)
    /home/stuart/go/pkg/mod/github.com/ipld/go-car/v2@v2.4.2-0.20220707083113-89de8134e58e/internal/carv1/util/util.go:85 +0x19e
github.com/ipld/go-car/v2/internal/carv1.ReadHeader({0x7f4dae370820, 0xc021ceafc0}, 0x401)
    /home/stuart/go/pkg/mod/github.com/ipld/go-car/v2@v2.4.2-0.20220707083113-89de8134e58e/internal/carv1/car.go:63 +0x32
github.com/ipld/go-car/v2.ReadVersion({0x7f4dae370820, 0xc021ceafc0}, {0xc00f2d7c98, 0x9749a5, 0xc011b88300})
    /home/stuart/go/pkg/mod/github.com/ipld/go-car/v2@v2.4.2-0.20220707083113-89de8134e58e/reader.go:364 +0x90
github.com/ipld/go-car/v2.ReadOrGenerateIndex({0x7f4dae3707f8, 0xc021ceafc0}, {0xc00f2d7c98, 0x2, 0x2})
    /home/stuart/go/pkg/mod/github.com/ipld/go-car/v2@v2.4.2-0.20220707083113-89de8134e58e/index_gen.go:191 +0x7a
github.com/filecoin-project/dagstore.(*DAGStore).initializeShard.func1({0xc00f2d7d80, 0xc00f2d7d50})
    /home/stuart/go/pkg/mod/github.com/filecoin-project/dagstore@v0.5.3/dagstore_async.go:123 +0xcb
github.com/filecoin-project/dagstore/throttle.(*throttler).Do(0xc000b6c280, {0x433abd8, 0xc0000520b8}, 0xc025a847b0)
    /home/stuart/go/pkg/mod/github.com/filecoin-project/dagstore@v0.5.3/throttle/throttler.go:38 +0x118
github.com/filecoin-project/dagstore.(*DAGStore).initializeShard(0xc000a99b80, {0x433abd8, 0xc0000520b8}, 0xc016d8a3f0, {0x43489f0, 0xc0129bdef0})
    /home/stuart/go/pkg/mod/github.com/filecoin-project/dagstore@v0.5.3/dagstore_async.go:121 +0x422
created by github.com/filecoin-project/dagstore.(*DAGStore).control
    /home/stuart/go/pkg/mod/github.com/filecoin-project/dagstore@v0.5.3/dagstore_control.go:101 +0x705
^C
[1]+  Exit 2                  nohup boostd run --pprof > /market/logs/boost.log 2>&1

Repro Steps

  1. Run boostd dagstore initialize-all --concurrency=3 on a Boost-node
  2. See error coming from the reader in the lotus storage ...

rjan90 commented 2 years ago

@LexLuthr found out that the crash is coming from the reader in the lotus storage:

func (r *unpadReader) Read(out []byte) (int, error) {
    if r.left == 0 {
        return 0, io.EOF
    }

    chunks := len(out) / 127

    outTwoPow := 1 << (63 - bits.LeadingZeros64(uint64(chunks*128)))

    if err := abi.PaddedPieceSize(outTwoPow).Validate(); err != nil {
        return 0, xerrors.Errorf("output must be of valid padded piece size: %w", err)
    }

    todo := abi.PaddedPieceSize(outTwoPow)
    if r.left < uint64(todo) {
        todo = abi.PaddedPieceSize(1 << (63 - bits.LeadingZeros64(r.left)))
    }

    r.left -= uint64(todo)

    n, err := io.ReadAtLeast(r.src, r.work[:todo], int(todo)) // <--- slice out of bounds here?
    if err != nil && err != io.EOF {
        return n, err
    }
    if n < int(todo) {
        return 0, xerrors.Errorf("didn't read enough: %d / %d, left %d, out %d", n, todo, r.left, len(out))
    }

    Unpad(r.work[:todo], out[:todo.Unpadded()])

    return int(todo.Unpadded()), err
}
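
The numbers in the panic appear to line up with the size calculation at the top of `Read`. The sketch below replays that arithmetic with the `len(out)` value from the stack trace (`0x140b154`). It assumes that the "capacity 8388608" in the panic message is the capacity of `r.work`. The `readSize` and `clampToWork` helpers are hypothetical names used only for illustration, and the clamp is one possible guard, not necessarily the fix adopted upstream.

```go
package main

import (
	"fmt"
	"math/bits"
)

// readSize mirrors the size calculation in unpadReader.Read: the largest
// power-of-two padded size that fits the caller's output buffer.
// The zero check is added for this sketch only, to avoid a negative shift.
func readSize(outLen int) uint64 {
	chunks := outLen / 127
	if chunks == 0 {
		return 0
	}
	return 1 << (63 - bits.LeadingZeros64(uint64(chunks*128)))
}

// clampToWork is a hypothetical guard: never ask for more padded bytes
// than the work buffer can hold.
func clampToWork(todo, workCap uint64) uint64 {
	if todo > workCap {
		return workCap
	}
	return todo
}

func main() {
	const outLen = 0x140b154 // len(out) taken from the stack trace
	const workCap = 8388608  // capacity from the panic; assumed to be cap(r.work)

	todo := readSize(outLen)
	fmt.Println(todo, todo > workCap)       // 16777216 true
	fmt.Println(clampToWork(todo, workCap)) // 8388608
}
```

With these inputs `todo` comes out as 16777216, which matches the `[:16777216]` in the panic and is twice the assumed work buffer capacity, so `r.work[:todo]` cannot be sliced.
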

jennijuju commented 2 years ago

@nonsense - is there any reason why boost depends on lotus for this?

dirkmc commented 2 years ago

The code for reading data from the workers lives in lotus so boost needs to depend on lotus to use that code.