dgraph-io / dgraph

The high-performance database for modern applications
https://dgraph.io

[Documentation]: Advise raising system limit on mmapped files during bulk loading #8616

Closed · mocurin closed this issue 1 month ago

mocurin commented 1 year ago

What version of Dgraph is the target?

22.0.2

Documentation.

The documentation should state that the kernel parameter vm.max_map_count needs to be raised in order to load large datasets.

This mainly affects the bulk-loader section: you are far more likely to hit this issue when bulk loading an export of a few TiB (about 2 TiB in my case). That said, after resolving the bulk-load failure, my Alpha nodes also exceeded the default limit (around 65k mappings) after a maxlevel increase, so perhaps the production checklist/FAQ sections should mention this as well.
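For reference, one quick way to see how close a process is to that limit is to compare the number of entries in /proc/&lt;pid&gt;/maps against vm.max_map_count. A minimal Go sketch of such a check (Linux-only, nothing Dgraph-specific; the PID of an Alpha is passed as an argument):

```go
// checkmaps.go - compare a process's mapping count against vm.max_map_count.
// Linux-only sketch; pass the target PID (e.g. a Dgraph Alpha) as the first argument.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checkmaps <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	// Current kernel limit on memory mappings per process.
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		panic(err)
	}
	limit, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		panic(err)
	}

	// Each line in /proc/<pid>/maps is one mapping.
	f, err := os.Open("/proc/" + pid + "/maps")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	count := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		count++
	}

	fmt.Printf("pid %s: %d mappings, vm.max_map_count = %d\n", pid, count, limit)
	if count > limit*9/10 {
		fmt.Println("warning: within 10% of the kernel limit; consider raising vm.max_map_count")
	}
}
```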

Additional information.

The REDUCE stage of the dgraph bulk loader fails with this error:

[08:48:18+0300] REDUCE 18h34m19s 99.83% edge_count:130.5G edge_speed:1.952M/sec plist_count:98.24G plist_speed:1.469M/sec. Num Encoding MBs: 512. jemalloc: 3.2 GiB
[08:48:19+0300] REDUCE 18h34m20s 99.84% edge_count:130.5G edge_speed:1.952M/sec plist_count:98.24G plist_speed:1.469M/sec. Num Encoding MBs: 513. jemalloc: 3.5 GiB
[08:48:20+0300] REDUCE 18h34m21s 99.84% edge_count:130.5G edge_speed:1.952M/sec plist_count:98.24G plist_speed:1.469M/sec. Num Encoding MBs: 260. jemalloc: 2.8 GiB
panic: while creating table: out/1/p/025670.sst error: cannot allocate memory
while mmapping out/1/p/025670.sst with size: 64697135
github.com/dgraph-io/ristretto/z.OpenMmapFileUsing
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/file.go:59
github.com/dgraph-io/ristretto/z.OpenMmapFile
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/file.go:86
github.com/dgraph-io/badger/v3/table.CreateTable
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/table/table.go:259
github.com/dgraph-io/badger/v3.(*sortedWriter).createTable
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/stream_writer.go:431
github.com/dgraph-io/badger/v3.(*sortedWriter).send.func1
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/stream_writer.go:386
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1594

goroutine 2632603 [running]:
github.com/dgraph-io/badger/v3.(*sortedWriter).handleRequests.func1(0xc8d6bf2120)
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/stream_writer.go:336 +0x232
github.com/dgraph-io/badger/v3.(*sortedWriter).handleRequests(0xc000374f20)
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/stream_writer.go:344 +0x9c
created by github.com/dgraph-io/badger/v3.(*StreamWriter).newWriter
        /home/alexeev/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.5/stream_writer.go:308 +0x256

Steps to reproduce the issue

Run the dgraph bulk loader on a 130-billion-edge export (around 1.5 TB of compressed data) with 2 output shards, OR make the target table size very small.

Solution

Dgraph essentially gets a bad-alloc error, which can mean almost anything. But since I definitely had no shortage of physical RAM and no limits on virtual memory, it comes down to the kernel limit on the number of memory-mapped regions per process. Raising vm.max_map_count to something like 200k solved everything.
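The limit can be raised at runtime with `sysctl -w vm.max_map_count=200000` and persisted in /etc/sysctl.conf. The same change expressed as a minimal Go sketch (it just writes the procfs file, so it must run as root on Linux; 200000 is only the value that worked for my dataset):

```go
// raise_map_count.go - sketch of bumping vm.max_map_count before a bulk load.
// Equivalent to `sysctl -w vm.max_map_count=200000`; must run as root. Linux-only.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const want = 200000 // value that resolved the failure above; pick per workload

func main() {
	const path = "/proc/sys/vm/max_map_count"

	raw, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	cur, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		panic(err)
	}
	if cur >= want {
		fmt.Printf("vm.max_map_count is already %d, nothing to do\n", cur)
		return
	}

	// Writing the sysctl file takes effect immediately but does not survive a
	// reboot; persist the value in /etc/sysctl.conf (or /etc/sysctl.d/) as well.
	if err := os.WriteFile(path, []byte(strconv.Itoa(want)), 0o644); err != nil {
		panic(fmt.Errorf("raising vm.max_map_count (are you root?): %w", err))
	}
	fmt.Printf("raised vm.max_map_count: %d -> %d\n", cur, want)
}
```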

Also, surfacing the actual error code from the mmap syscall would have helped a lot here.
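Something along these lines (just an illustrative sketch using golang.org/x/sys/unix, not the actual ristretto/badger code) would keep the errno in the error chain, so an ENOMEM caused by hitting the mapping limit is distinguishable from a generic allocation failure:

```go
// mmap_err.go - sketch of surfacing the errno from a failed mmap, so that
// "cannot allocate memory" can be traced back to vm.max_map_count.
package main

import (
	"errors"
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// mmapFile maps the first size bytes of f read/write and wraps any failure
// with the file name and size while preserving the underlying errno.
func mmapFile(f *os.File, size int) ([]byte, error) {
	buf, err := unix.Mmap(int(f.Fd()), 0, size, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		return nil, fmt.Errorf("mmap %q (size %d): %w", f.Name(), size, err)
	}
	return buf, nil
}

func main() {
	f, err := os.CreateTemp("", "mmap-demo-*.dat")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	const size = 1 << 20
	if err := f.Truncate(size); err != nil {
		panic(err)
	}

	buf, err := mmapFile(f, size)
	if errors.Is(err, unix.ENOMEM) {
		// ENOMEM here can also mean "too many mappings", not just "out of RAM".
		fmt.Println("mmap failed with ENOMEM; check vm.max_map_count:", err)
		return
	} else if err != nil {
		panic(err)
	}
	defer unix.Munmap(buf)

	fmt.Printf("mapped %d bytes of %s\n", len(buf), f.Name())
}
```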

github-actions[bot] commented 1 month ago

This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.