eawer opened this issue 5 years ago
Yes, this can be considered a performance-related bug in the import implementation.
Can you please try 79d7d62393142cfd7035e942cc705f6947e7bcbb? It should improve import performance considerably.
But we may need other changes as well to make it faster for these large dumps specifically. For example, a two-stage import process may be faster: we can pre-compute the nodes table during the first pass and then resolve quads during the second pass, so the nodes table stays mostly read-only during the second pass.
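As a rough sketch of that idea (hypothetical types and helpers, not the actual Cayley code):

package importsketch

// Quad is a simplified stand-in for Cayley's quad type.
type Quad struct{ Subject, Predicate, Object string }

// twoStageImport builds the nodes table in a first pass over the data,
// then resolves quads against it in a second pass. After pass 1 the
// nodes table is only read, never modified.
func twoStageImport(quads []Quad) (nodes map[string]uint64, links [][3]uint64) {
	nodes = make(map[string]uint64)
	// Pass 1: assign an ID to every distinct node up front.
	// In a real store these would be persisted to the nodes bucket.
	for _, q := range quads {
		for _, v := range []string{q.Subject, q.Predicate, q.Object} {
			if _, ok := nodes[v]; !ok {
				nodes[v] = uint64(len(nodes) + 1)
			}
		}
	}
	// Pass 2: resolve each quad to a fixed-size tuple of node IDs;
	// the nodes table is read-only here, so its entries are never rewritten.
	for _, q := range quads {
		links = append(links, [3]uint64{nodes[q.Subject], nodes[q.Predicate], nodes[q.Object]})
	}
	return nodes, links
}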
I tried to load the data using the latest master:
Cayley version: v0.7.x-dev
Git commit hash: 300dcf1f8503
(I believe the commit you referenced is already in there.) Things probably became a bit better, but the pattern seems to be the same. Also, the Docker container was killed with code 137 (OOM) after 1.5h (98M quads processed):
sudo journalctl -k | grep -i -e memory -e oom
Aug 20 14:55:14 kernel: cayley invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Aug 20 14:55:14 kernel: oom_kill_process+0x22e/0x450
Aug 20 14:55:14 kernel: out_of_memory+0x11d/0x4c0
Aug 20 14:55:14 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Aug 20 14:55:14 kernel: Out of memory: Kill process 4887 (cayley) score 965 or sacrifice child
Aug 20 14:55:15 kernel: oom_reaper: reaped process 4887 (cayley), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Is it possible to run the latest Cayley from master without Docker, or will it change nothing?
On another try I received this:
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x150ab5f, 0x16)
/usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc68c000000, 0x3c000000, 0x22e1738)
/usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x22c0640, 0x3c000000, 0x7f16c9bdaf80, 0x7f16a3013bc0)
/usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x22c0640, 0x1d6b2, 0x0)
/usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x22c0640, 0x1d6b2, 0x22e1748, 0x2030b500000000)
/usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x22c0640, 0x1d6b2, 0xffffffffffff0100, 0x7f16a3013c78)
/usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x22c0640, 0x1d6b2, 0x7f16a3010100, 0x418415)
/usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x3ad64000, 0x450001, 0x7f1732460000)
/usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
/usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0xc067a6fe00)
/usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1229
goroutine 1 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc004d1efd0 sp=0xc004d1efc8 pc=0x459700
runtime.mallocgc(0x3ad64000, 0x141dca0, 0x1, 0xffffffffffffffff)
/usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc004d1f070 sp=0xc004d1efd0 pc=0x40e2e6
runtime.growslice(0x141dca0, 0xc5d8d58000, 0xbc4700, 0xbc4700, 0xbc4701, 0x0, 0xc000170000, 0xc004d1f178)
/usr/local/go/src/runtime/slice.go:204 +0x145 fp=0xc004d1f0d8 sp=0xc004d1f070 pc=0x444445
github.com/boltdb/bolt.(*node).put(0xc0002483f0, 0xc2ad76f168, 0x8, 0x8, 0xc2ad76f168, 0x8, 0x8, 0xc462a3f120, 0x1b, 0x1b, ...)
/go/pkg/mod/github.com/boltdb/bolt@v1.3.1/node.go:131 +0x3dd fp=0xc004d1f188 sp=0xc004d1f0d8 pc=0x908bed
github.com/boltdb/bolt.(*Bucket).Put(0xc3cddf2a00, 0xc2ad76f160, 0x8, 0x8, 0xc462a3f120, 0x1b, 0x1b, 0x30, 0x12e4ae0)
/go/pkg/mod/github.com/boltdb/bolt@v1.3.1/bucket.go:309 +0x26d fp=0xc004d1f238 sp=0xc004d1f188 pc=0x8ff9ad
github.com/hidal-go/hidalgo/kv/bolt.(*Tx).Put(0xc03e41b620, 0xc2d68d9950, 0x2, 0x2, 0xc462a3f120, 0x1b, 0x1b, 0x1, 0x10f14ea)
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/bolt/bolt.go:161 +0x13f fp=0xc004d1f2a0 sp=0xc004d1f238 pc=0x9125ff
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).addToLog(0xc000281900, 0x174aa20, 0xc03e41b620, 0xc004d1f428, 0x2, 0x2)
/cayley/graph/kv/indexing.go:886 +0x245 fp=0xc004d1f358 sp=0xc004d1f2a0 pc=0xad8075
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).indexLink(0xc000281900, 0x174aa20, 0xc03e41b620, 0xc004d1f428, 0x0, 0x0)
/cayley/graph/kv/indexing.go:636 +0x1a1 fp=0xc004d1f3d8 sp=0xc004d1f358 pc=0xad5701
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).indexLinks(0xc000281900, 0x1744b20, 0xc0000b2028, 0x174aa20, 0xc03e41b620, 0xc4ec708000, 0x7a114, 0x7a120, 0x0, 0x0)
/cayley/graph/kv/indexing.go:614 +0x113 fp=0xc004d1f4e8 sp=0xc004d1f3d8 pc=0xad5483
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).applyAddDeltas(0xc000281900, 0x174aa20, 0xc03e41b620, 0x0, 0x0, 0x0, 0xc13b3d0ea0, 0x22d0001, 0x173a2a0, 0xc32781cbe0, ...)
/cayley/graph/kv/indexing.go:484 +0x6e4 fp=0xc004d1f8e8 sp=0xc004d1f4e8 pc=0xad38f4
github.com/cayleygraph/cayley/graph/kv.(*quadWriter).WriteQuads(0xc004d30060, 0xc004d5a000, 0x7a120, 0x7a120, 0xc0000c00a0, 0xc004d5a000, 0x7a120)
/cayley/graph/kv/indexing.go:388 +0xa1 fp=0xc004d1f960 sp=0xc004d1f8e8 pc=0xad2dc1
github.com/cayleygraph/cayley/internal.(*batchLogger).WriteQuads(0xc0000c00c0, 0xc004d5a000, 0x7a120, 0x7a120, 0x7a120, 0x0, 0x0)
/cayley/internal/load.go:146 +0x59 fp=0xc004d1f9c8 sp=0xc004d1f960 pc=0x75fe09
github.com/cayleygraph/cayley/quad.CopyBatch(0x1731b40, 0xc0000c00c0, 0x7f172d9f8ae0, 0xc0000c00a0, 0x7a120, 0xc0000c00a0, 0x0, 0x0)
/cayley/quad/rw.go:145 +0x1dc fp=0xc004d1fa70 sp=0xc004d1f9c8 pc=0x746d1c
github.com/cayleygraph/cayley/internal.DecompressAndLoad(0x173bfa0, 0xc004d30060, 0x7a120, 0x7ffdb3958f78, 0x14, 0x0, 0x0, 0x0, 0x0)
/cayley/internal/load.go:133 +0x17c fp=0xc004d1fae8 sp=0xc004d1fa70 pc=0x75fc7c
github.com/cayleygraph/cayley/internal.Load(0x173bfa0, 0xc004d30060, 0x7a120, 0x7ffdb3958f78, 0x14, 0x0, 0x0, 0x14fc8ff, 0xa)
/cayley/internal/load.go:21 +0x71 fp=0xc004d1fb40 sp=0xc004d1fae8 pc=0x75ec31
github.com/cayleygraph/cayley/cmd/cayley/command.openForQueries(0xc00027cc80, 0x0, 0x0, 0x0)
/cayley/cmd/cayley/command/database.go:259 +0x2bb fp=0xc004d1fc48 sp=0xc004d1fb40 pc=0x8d2ceb
github.com/cayleygraph/cayley/cmd/cayley/command.NewHttpCmd.func1(0xc00027cc80, 0xc000100fd0, 0x0, 0xb, 0x0, 0x0)
/cayley/cmd/cayley/command/http.go:24 +0x9f fp=0xc004d1fce8 sp=0xc004d1fc48 pc=0x8d860f
github.com/spf13/cobra.(*Command).execute(0xc00027cc80, 0xc000100f20, 0xb, 0xb, 0xc00027cc80, 0xc000100f20)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x473 fp=0xc004d1fdd8 sp=0xc004d1fce8 pc=0x5c0fa3
github.com/spf13/cobra.(*Command).ExecuteC(0x2124ae0, 0x2124d60, 0xc00023db80, 0xc000191f88)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2f8 fp=0xc004d1ff20 sp=0xc004d1fdd8 pc=0x5c1998
github.com/spf13/cobra.(*Command).Execute(0x2124ae0, 0xc00027ca00, 0xc00027cc80)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 +0x2b fp=0xc004d1ff50 sp=0xc004d1ff20 pc=0x5c167b
main.main()
/cayley/cmd/cayley/cayley.go:187 +0x31 fp=0xc004d1ff98 sp=0xc004d1ff50 pc=0x10c8061
runtime.main()
/usr/local/go/src/runtime/proc.go:201 +0x207 fp=0xc004d1ffe0 sp=0xc004d1ff98 pc=0x42f197
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc004d1ffe8 sp=0xc004d1ffe0 pc=0x45b7e1
goroutine 19 [syscall, 191 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 20 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x22b7240)
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x203
@eawer Thanks a lot for providing the stack trace and new test results! I can confirm that the new code path is used.
Can you please try setting --batch to lower values? The new code path should behave differently with the same parameters, so the table in the first post may show a different optimal batch size. Setting the batch too high will certainly lead to an OOM, since the whole transaction of that size is buffered in memory by BoltDB.
In general, BoltDB does not perform well with the random writes that happen frequently during an import of a dataset of this size. You may also try switching to Badger for this import. You may need to set --batch to even lower values because of the way transactions work in Badger, but the write performance should be much better. Note, however, that the Badger backend is not fully tested yet.
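For example, reusing the command from the original report (the exact batch value is illustrative and will need tuning):

cayley load -c bolt.yml --verbose=3 -i freebase.pq --batch=5000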
Badger failed with this error:
I0821 08:20:19.862643 1 cayley.go:63] Cayley version: v0.7.x-dev (300dcf1f8503)
I0821 08:20:19.862868 1 cayley.go:76] using config file: etc/cayley.json
I0821 08:20:19.862959 1 database.go:193] using backend "badger" (/data/badger)
2019/08/21 08:20:19 Replaying from value pointer: {Fid:0 Len:0 Offset:0}
2019/08/21 08:20:19 Iterating file id: 0
2019/08/21 08:20:19 Iteration took: 23.881µs
2019/08/21 08:20:20 Replaying from value pointer: {Fid:0 Len:42 Offset:225}
2019/08/21 08:20:20 Iterating file id: 0
2019/08/21 08:20:20 Iteration took: 17.805µs
I0821 08:20:20.536147 1 load.go:149] Wrote 5000 quads.
I0821 08:20:20.604970 1 load.go:149] Wrote 10000 quads.
I0821 08:20:20.680754 1 load.go:149] Wrote 15000 quads.
I0821 08:20:20.752241 1 load.go:149] Wrote 20000 quads.
I0821 08:20:20.809708 1 load.go:149] Wrote 25000 quads.
I0821 08:20:20.838822 1 load.go:149] Wrote 25000 quads.
Error: db: failed to load data: Txn is too big to fit into one request
...
E0821 08:20:21.287193 1 cayley.go:188] db: failed to load data: Txn is too big to fit into one request
After replacing badger 1.5.5 with 1.6.0, the Docker build fails with:
# github.com/hidal-go/hidalgo/kv/flat/badger
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:46:5: opt.Dir undefined (type func(string) badger.Options has no field or method Dir)
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:47:17: cannot use opt (type func(string) badger.Options) as type badger.Options in argument to Open
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:77:21: too many arguments in call to tx.tx.Commit
have (nil)
want ()
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:149:9: assignment mismatch: 2 variables but 1 values
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:149:30: not enough arguments in call to it.it.Item().Value
have ()
want (func([]byte) error)
Regarding smaller batches - I've started a test and will paste the results here as soon as I have them.
db: failed to load data: Txn is too big to fit into one request
This is expected, and it is exactly why I suggested lowering --batch even further for Badger. It may need to be way below the default of 10K.
Regarding 1.6, we can update Hidalgo later to support it.
Reducing the batch size to 10k for Bolt did not help much; the process ended with no trace because of an OOM, with something like this https://github.com/cayleygraph/cayley/issues/815#issuecomment-523063489 in the journal.
Badger only started importing with a 1k batch size, but eventually crashed like this:
Thanks a lot for testing it @eawer. This definitely looks like a memory issue in the new code path. I will investigate it.
After a few short tests, I think the OOM is related to how quad indexing works in all KV backends. A dataset of this size leads to extremely large index entries for supernodes, and modifying such an index entry requires loading the whole entry into memory.
A quick workaround may be to take the physical size of the transaction into account when auto-committing during batch import. However, this will only delay the issue.
The mid-term fix is to allow disabling those list indexes and switching to single-value indexes. This may lead to a larger DB size, but it will solve the issue completely. Query performance may also suffer to some degree.
The long-term solution is to allow both kinds of indexes. This will complicate the design, however, and must be tested carefully against the previously proposed solution.
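To make the difference concrete, here is a schematic sketch over a plain map standing in for the KV store (the key layout is illustrative only, not Cayley's actual encoding):

package indexsketch

import "encoding/binary"

// listIndexPut appends a quad ID to a list-style entry: the whole
// existing value must be read and rewritten, which is what makes
// supernode entries (millions of IDs) so expensive to update.
func listIndexPut(kv map[string][]byte, key string, quadID uint64) {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], quadID)
	val := append([]byte{}, kv[key]...) // load the entire entry into memory
	kv[key] = append(val, buf[:]...)    // rewrite it with one more ID
}

// singleValuePut makes the quad ID part of the key and leaves the value
// empty: inserts stay cheap no matter how many quads share the prefix,
// at the cost of more keys (larger DB) and prefix scans on lookup.
func singleValuePut(kv map[string][]byte, key string, quadID uint64) {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], quadID)
	kv[key+string(buf[:])] = []byte{}
}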
@eawer I made a few changes to indexing in https://github.com/cayleygraph/cayley/pull/816. It should help with the OOM issue, but at the same time it will increase the size of the index on disk, so write performance may suffer. It's hard to make any definite statements, though, because the new indexing strategy should also improve quad lookup performance, which may remove the need for multiple reads during the import process.
I will continue looking into it and will run a few tests locally as well. But help with the testing is highly appreciated.
Cayley version: v0.7.x-dev (88cbd1564ad0)
badger died after 20min and 14M quads (batch size 1k) with Error: db: failed to load data: Txn is too big to fit into one request
Bolt failed after 13.5h and 207M quads with this error:
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x1534adf, 0x16)
/usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc6c4000000, 0x4000000, 0x23407b8)
/usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x231f6c0, 0x4000000, 0x231f8c8, 0x7fcff738c4d8)
/usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x231f6c0, 0x30, 0x0)
/usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x231f6c0, 0x30, 0x23407c8, 0x9800)
/usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x231f6c0, 0x30, 0x101, 0x7fcfeaf6ffff)
/usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x231f6c0, 0x30, 0x7fcfea000101, 0x7fd0284c9130)
/usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x60000, 0x450100, 0x7fd0284c9130)
/usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
/usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0xc000001380)
/usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1229
goroutine 1 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc0000c91f8 sp=0xc0000c91f0 pc=0x459700
runtime.mallocgc(0x60000, 0x0, 0x3c000, 0xc6c3f74000)
/usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc0000c9298 sp=0xc0000c91f8 pc=0x40e2e6
runtime.growslice(0x1318ee0, 0xc6c3f74000, 0x9800, 0x9800, 0x9801, 0xc6c3f74000, 0x7800, 0x9800)
/usr/local/go/src/runtime/slice.go:197 +0x219 fp=0xc0000c9300 sp=0xc0000c9298 pc=0x444519
github.com/cayleygraph/cayley/graph/kv.decodeIndex(0x7fcb35042040, 0x43f1f8, 0x43f1f8, 0xc1b8a96f30, 0x2, 0x2, 0xc1b8a96e70, 0x2)
/cayley/graph/kv/indexing.go:753 +0x131 fp=0xc0000c9378 sp=0xc0000c9300 pc=0xaece71
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).getBucketIndexes(0xc000273900, 0x176f900, 0xc000038048, 0x1777160, 0xc3c1e8e008, 0xc1b8a96c30, 0x2, 0x2, 0x410029, 0x132a520, ...)
/cayley/graph/kv/indexing.go:720 +0x176 fp=0xc0000c9418 sp=0xc0000c9378 pc=0xaecab6
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).hasPrimitive(0xc000273900, 0x176f900, 0xc000038048, 0x1777160, 0xc3c1e8e008, 0xc0000c9880, 0x968c42f20953f800, 0x837bace2, 0x0, 0x0)
/cayley/graph/kv/indexing.go:860 +0x216 fp=0xc0000c94e8 sp=0xc0000c9418 pc=0xaedd36
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).applyAddDeltas(0xc000273900, 0x1777160, 0xc3c1e8e008, 0x0, 0x0, 0x0, 0xc1c0dc01e0, 0x2330001, 0x1765240, 0xc4aa122a80, ...)
/cayley/graph/kv/indexing.go:507 +0x94c fp=0xc0000c98e8 sp=0xc0000c94e8 pc=0xaea8ac
github.com/cayleygraph/cayley/graph/kv.(*quadWriter).WriteQuads(0xc0002753e0, 0xc004d26000, 0x186a0, 0x186a0, 0xc00000c0c0, 0xc004d26000, 0x186a0)
/cayley/graph/kv/indexing.go:439 +0xa1 fp=0xc0000c9960 sp=0xc0000c98e8 pc=0xae9b11
github.com/cayleygraph/cayley/internal.(*batchLogger).WriteQuads(0xc00000c0e0, 0xc004d26000, 0x186a0, 0x186a0, 0x186a0, 0x0, 0x0)
/cayley/internal/load.go:146 +0x59 fp=0xc0000c99c8 sp=0xc0000c9960 pc=0x75fe09
github.com/cayleygraph/cayley/quad.CopyBatch(0x175caa0, 0xc00000c0e0, 0x7fd080409068, 0xc00000c0c0, 0x186a0, 0xc00000c0c0, 0x0, 0x0)
/cayley/quad/rw.go:145 +0x1dc fp=0xc0000c9a70 sp=0xc0000c99c8 pc=0x746d1c
github.com/cayleygraph/cayley/internal.DecompressAndLoad(0x1766f40, 0xc0002753e0, 0x186a0, 0x7ffff4cc0f78, 0x14, 0x0, 0x0, 0x0, 0x0)
/cayley/internal/load.go:133 +0x17c fp=0xc0000c9ae8 sp=0xc0000c9a70 pc=0x75fc7c
github.com/cayleygraph/cayley/internal.Load(0x1766f40, 0xc0002753e0, 0x186a0, 0x7ffff4cc0f78, 0x14, 0x0, 0x0, 0x15266d9, 0xa)
/cayley/internal/load.go:21 +0x71 fp=0xc0000c9b40 sp=0xc0000c9ae8 pc=0x75ec31
github.com/cayleygraph/cayley/cmd/cayley/command.openForQueries(0xc000270c80, 0x0, 0x0, 0x0)
/cayley/cmd/cayley/command/database.go:259 +0x2bb fp=0xc0000c9c48 sp=0xc0000c9b40 pc=0x8e908b
github.com/cayleygraph/cayley/cmd/cayley/command.NewHttpCmd.func1(0xc000270c80, 0xc0000eefd0, 0x0, 0xb, 0x0, 0x0)
/cayley/cmd/cayley/command/http.go:24 +0x9f fp=0xc0000c9ce8 sp=0xc0000c9c48 pc=0x8ee9af
github.com/spf13/cobra.(*Command).execute(0xc000270c80, 0xc0000eef20, 0xb, 0xb, 0xc000270c80, 0xc0000eef20)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x473 fp=0xc0000c9dd8 sp=0xc0000c9ce8 pc=0x5c0fa3
github.com/spf13/cobra.(*Command).ExecuteC(0x2183b40, 0x2183dc0, 0xc000231b80, 0xc000187f88)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2f8 fp=0xc0000c9f20 sp=0xc0000c9dd8 pc=0x5c1998
github.com/spf13/cobra.(*Command).Execute(0x2183b40, 0xc000270a00, 0xc000270c80)
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 +0x2b fp=0xc0000c9f50 sp=0xc0000c9f20 pc=0x5c167b
main.main()
/cayley/cmd/cayley/cayley.go:187 +0x31 fp=0xc0000c9f98 sp=0xc0000c9f50 pc=0x10e9381
runtime.main()
/usr/local/go/src/runtime/proc.go:201 +0x207 fp=0xc0000c9fe0 sp=0xc0000c9f98 pc=0x42f197
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x45b7e1
goroutine 5 [syscall, 834 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x41
goroutine 6 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x23162c0)
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
/go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x203
Thanks for taking the time to test the new version!
badger died after 20min and 14M quads (batch size 1k) with
Error: db: failed to load data: Txn is too big to fit into one request
I think we may give up on Badger for now. I'll need to change the way we import data into it to avoid those large transactions.
Bolt failed after 13.5h and 207M quads
We are getting somewhere. But it seems like the SP index may also build up over time. I wonder what in the Freebase schema may be causing it? I guess we will find out after a successful import :)
In any case, I will get back to you after making a few more changes.
First, I want to add instrumentation so we can get more insight into the import process. I will add a "time per batch" metric, so we can get the same graph you've built directly from Prometheus/Grafana. I'm also particularly interested in seeing the sizes of those index entries in each index, and things like lookup durations may be useful for speeding up the import further.
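For illustration, a per-batch timing metric with the standard Prometheus Go client could look roughly like this (the metric name is made up, not what Cayley will actually export):

package metricsketch

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// batchDuration records how long each quad batch takes to write, so any
// degradation over time shows up directly in Prometheus/Grafana.
var batchDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "import_batch_duration_seconds",
	Help: "Time spent writing one batch of quads.",
})

func init() { prometheus.MustRegister(batchDuration) }

// timedWrite wraps a batch write and observes its duration.
func timedWrite(write func() error) error {
	start := time.Now()
	err := write()
	batchDuration.Observe(time.Since(start).Seconds())
	return err
}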
And second, I will add an option to set custom indexes for KV, so we can try an "SPO" index instead of "SP". Since the import was able to proceed further than before, I think the "O" index was indeed the cause of the previous OOM.
@eawer The patch with metrics and improved write performance was pushed upstream. It now optimizes the first batch write (e.g. cayley load) to minimize reads from the disk.
However, as I mentioned before, this improves performance only to some extent. I'm now working on a two-stage import tool that should improve write speed significantly. It will first use Badger to build a node index, and then export it to the target backend (e.g. Bolt).
In the first prototype, metrics show that the import speed is constant and doesn't degrade over time, thanks to Badger's write optimizations. Building the node index for the whole dataset takes < 10h. I will post precise test results later.
Just happened upon this issue. Have you guys considered using WriteBatch for Badger? It can deal with txn sizes and automatically commit them as needed.
@manishrjain Thanks for the suggestion, we will definitely consider that API :)
The problem is not with Badger specifically, as I mentioned; it's with Cayley's legacy write API, which tries to be suitable both for batch uploads and for regular transactions. The new importer can take advantage of WriteBatch in Badger directly, and it works really well so far!
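For reference, a minimal use of that API (Badger v1.6+; the surrounding importer wiring is hypothetical):

package batchsketch

import badger "github.com/dgraph-io/badger"

// putAll writes entries through Badger's WriteBatch, which splits the
// writes into transactions of a permitted size and commits them
// automatically, avoiding the "Txn is too big to fit into one request"
// error seen above.
func putAll(db *badger.DB, entries map[string][]byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for k, v := range entries {
		if err := wb.Set([]byte(k), v); err != nil {
			return err
		}
	}
	return wb.Flush() // commit whatever is still pending
}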
For info: on my system, 5× the size of the original data seems a little big.
@hubyhuby Note that Bolt pre-allocates some space, so not all of it is actually used by the database.
Just a follow-up: does importing Freebase into Cayley work now?
Description
I'm trying to load the full Freebase dump into Cayley, but load performance is degrading, and it looks like the load process will never finish.

At first I ran a series of experiments to determine the fastest way to load the data. Here are the results (cells contain the minutes needed to load the number of quads in the column header; table omitted). As we can see, Bolt + pq + nosync + 2.5M had the best performance (not sure, though, that nosync contributed at all).

Steps to reproduce the issue:
cayley load -c bolt.yml --verbose=3 -i freebase.pq

Received results:
At first everything was OK, but then the load started to slow down. The graph demonstrates it better: (graph omitted)

Then I decided to use smaller batches and add nosync:
cayley load -c bolt.yml --verbose=3 -i freebase.pq
Things became a bit better, but not for long. htop says the process consumes more and more memory (58 GB after 2 days).

Expected results:
Freebase loaded in less than infinity.

Output of cayley version or commit hash:

Environment details:
CPU: 8 x Intel(R) Xeon(R) CPU @ 2.30GHz
Memory: 29 GB
OS: Ubuntu 16.04.6 LTS
Disk: SSD
Backend database: Bolt (not sure about the version; Cayley handled that for me)

So the question is: am I doing it wrong, or is there some bug?