The documentation should state that the kernel parameter vm.max_map_count must be raised in order to load large datasets.
This mainly affects the bulk loader section: the issue is far more likely to appear when bulk loading an export of a few TiB (about 2 TiB in my case). However, after resolving the bulk-loading issue, my Alpha nodes also exceeded the default limit (65530, i.e. around 65k) after I increased maxlevel, so the production checklist and FAQ sections should probably mention this as well.
Additional information.
The reduce stage of the Dgraph bulk loader fails with this error:
Steps to reproduce the issue
Run the Dgraph bulk loader on an export of 130 billion edges (around 1.5 TB of compressed data) with 2 output shards, or set a very small target table size.
Solution
Dgraph basically gets a "bad alloc" error, which could mean almost anything. But since I definitely had no shortage of physical RAM and no limits on virtual memory, it came down to the limit on the number of memory mappings (mmapped files) a process may have.
Raising vm.max_map_count to something like 200k solved everything.
Getting a proper error code from the failing mmap syscall would also have helped a lot.
What version of Dgraph is the target?
22.0.2