cayleygraph / cayley

An open-source graph database
https://cayley.io
Apache License 2.0

bolt: Performance degradation while loading Freebase dump #815

Open eawer opened 5 years ago

eawer commented 5 years ago

Description: I'm trying to load the full Freebase dump into Cayley, but load performance keeps degrading, and it looks like the load process will never finish.

First I ran a series of experiments to determine the fastest way to load the data. Here are the results (each cell contains the minutes needed to load the number of quads in the column header). As we can see, Bolt + pq + nosync + 2.5M had the best performance (though I'm not sure nosync contributed at all):

| Configuration | 5M | 10M | 15M | 20M | 25M |
| --- | --- | --- | --- | --- | --- |
| Bolt + nq + 10k | 2 | 7 | 16 | 20 | 29 |
| Bolt + nq + 50k | 2 | 5 | 13 | 15 | 23 |
| Bolt + nq + 100k | 2 | 5 | 12 | 14 | 20 |
| Bolt + nq + 200k | 2 | 5 | 10 | 12 | 17 |
| Bolt + nq + 500k | 2 | 5 | 8 | 10 | 13 |
| Bolt + nq + 500k | 2 | 5 | 8 | 9 | 12 |
| Bolt + nq + 1.25M | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 2.5M | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 5M | 3 | 5 | 8 | 9 | 12 |
| Bolt + nq + 1M + nosync | 2 | 5 | 7 | 9 | 12 |
| Bolt + nq + 2.5M + nosync | 2 | 5 | 7 | 9 | 11 |
| Bolt + pq.gz + 1.25M | 2 | 4 | 7 | 8 | 10 |
| Bolt + pq + nosync + 1.25M | 2 | 4 | 6 | 7 | 10 |
| Bolt + pq + nosync + 2.5M | 2 | 4 | 6 | 7 | 9 |
| Leveldb + nq + buffer 20 + 10k | 4 | 12 | 26 | | |
| Leveldb + pq.gz + buffer 20M + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + nq + buffer 20M + 5M | 2 | 18 | | | |
| Leveldb + pq.gz + buffer 200M + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 1G + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 1G + 500k | 1 | 5 | 11 | 13 | 19 |
| Leveldb + pq.gz + buffer 4G + 500k | 1 | 5 | 11 | 13 | 19 |
| Leveldb + pq.gz + buffer 4G + 1.25M | 1 | 8 | 16 | 18 | 27 |
| Leveldb + pq.gz + buffer 4G + cache 200M + 1.25M | 1 | 8 | 16 | 18 | 28 |

Steps to reproduce the issue:

  1. cayley load -c bolt.yml --verbose=3 -i freebase.pq
  2. bolt.yml:

     store:
       backend: bolt
       address: bolt
     load:
       batch: 2500000

Received results: at first everything was fine, but then the load started to slow down. The graph demonstrates it better: image

Then I decided to use smaller batches and add nosync:

  1. cayley load -c bolt.yml --verbose=3 -i freebase.pq
  2. bolt.yml:

     store:
       backend: bolt
       address: bolt
       options:
         nosync: true
     load:
       batch: 500000

Things became a bit better, but not for long: image

htop shows the process consuming more and more memory (58 GB after 2 days).

Expected results: Freebase loaded in less than infinity

Output of cayley version or commit hash:

Cayley version: 0.7.5
Git commit hash: cf576babb7db

Environment details:
CPU: 8 x Intel(R) Xeon(R) CPU @ 2.30GHz
Memory: 29 GB
OS: Ubuntu 16.04.6 LTS
Disk: SSD

Backend database: Bolt (not sure about the version - Cayley handled that for me)

So the question is: am I doing something wrong, or is there a bug?

dennwc commented 5 years ago

Yes, this can be considered a performance-related bug in the import implementation.

Can you please try 79d7d62393142cfd7035e942cc705f6947e7bcbb? It should improve import performance considerably.

But we may need other changes as well to make it faster for such large dumps specifically. For example, a two-stage import process may be faster: we can pre-compute the nodes table during the first pass and then resolve quads during the second pass, so the nodes table is mostly read-only during the second pass.
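To make the idea concrete, here is a rough sketch of such a two-pass import (hypothetical code, not Cayley's implementation; all names and the example data are made up):

```go
// Two-pass bulk import sketch (hypothetical, not Cayley's actual code).
package main

import "fmt"

type Quad struct{ Subject, Predicate, Object string }

// nodeTable maps node values to numeric IDs; it is built once in pass 1.
type nodeTable struct {
	ids  map[string]uint64
	next uint64
}

func (t *nodeTable) getOrAssign(v string) uint64 {
	if id, ok := t.ids[v]; ok {
		return id
	}
	t.next++
	t.ids[v] = t.next
	return t.next
}

func main() {
	quads := []Quad{
		{"<alice>", "<follows>", "<bob>"},
		{"<bob>", "<follows>", "<charlie>"},
	}
	nodes := &nodeTable{ids: map[string]uint64{}}

	// Pass 1: assign an ID to every distinct node value.
	for _, q := range quads {
		nodes.getOrAssign(q.Subject)
		nodes.getOrAssign(q.Predicate)
		nodes.getOrAssign(q.Object)
	}

	// Pass 2: the node table is now read-only; each quad becomes a tuple of
	// IDs that can be written to the backend in large, mostly-append batches.
	for _, q := range quads {
		fmt.Println(nodes.ids[q.Subject], nodes.ids[q.Predicate], nodes.ids[q.Object])
	}
}
```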

eawer commented 5 years ago

I tried to load data using the latest master

Cayley version: v0.7.x-dev
Git commit hash: 300dcf1f8503

(I believe the commit you referenced is already in there), and things do seem a bit better, but the pattern is the same: image

Also, the Docker container was killed with exit code 137 (OOM) after 1.5h (98M quads processed).

sudo journalctl -k | grep -i -e memory -e oom
Aug 20 14:55:14 kernel: cayley invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Aug 20 14:55:14 kernel:  oom_kill_process+0x22e/0x450
Aug 20 14:55:14 kernel:  out_of_memory+0x11d/0x4c0
Aug 20 14:55:14 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Aug 20 14:55:14 kernel: Out of memory: Kill process 4887 (cayley) score 965 or sacrifice child
Aug 20 14:55:15 kernel: oom_reaper: reaped process 4887 (cayley), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Is it possible to run the latest Cayley from master without Docker, or would that change nothing?

eawer commented 5 years ago

On another try I received this:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x150ab5f, 0x16)
    /usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc68c000000, 0x3c000000, 0x22e1738)
    /usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x22c0640, 0x3c000000, 0x7f16c9bdaf80, 0x7f16a3013bc0)
    /usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x22c0640, 0x1d6b2, 0x0)
    /usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x22c0640, 0x1d6b2, 0x22e1748, 0x2030b500000000)
    /usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x22c0640, 0x1d6b2, 0xffffffffffff0100, 0x7f16a3013c78)
    /usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
    /usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x22c0640, 0x1d6b2, 0x7f16a3010100, 0x418415)
    /usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x3ad64000, 0x450001, 0x7f1732460000)
    /usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
    /usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0xc067a6fe00)
    /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
    /usr/local/go/src/runtime/proc.go:1229

goroutine 1 [running]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc004d1efd0 sp=0xc004d1efc8 pc=0x459700
runtime.mallocgc(0x3ad64000, 0x141dca0, 0x1, 0xffffffffffffffff)
    /usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc004d1f070 sp=0xc004d1efd0 pc=0x40e2e6
runtime.growslice(0x141dca0, 0xc5d8d58000, 0xbc4700, 0xbc4700, 0xbc4701, 0x0, 0xc000170000, 0xc004d1f178)
    /usr/local/go/src/runtime/slice.go:204 +0x145 fp=0xc004d1f0d8 sp=0xc004d1f070 pc=0x444445
github.com/boltdb/bolt.(*node).put(0xc0002483f0, 0xc2ad76f168, 0x8, 0x8, 0xc2ad76f168, 0x8, 0x8, 0xc462a3f120, 0x1b, 0x1b, ...)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/node.go:131 +0x3dd fp=0xc004d1f188 sp=0xc004d1f0d8 pc=0x908bed
github.com/boltdb/bolt.(*Bucket).Put(0xc3cddf2a00, 0xc2ad76f160, 0x8, 0x8, 0xc462a3f120, 0x1b, 0x1b, 0x30, 0x12e4ae0)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/bucket.go:309 +0x26d fp=0xc004d1f238 sp=0xc004d1f188 pc=0x8ff9ad
github.com/hidal-go/hidalgo/kv/bolt.(*Tx).Put(0xc03e41b620, 0xc2d68d9950, 0x2, 0x2, 0xc462a3f120, 0x1b, 0x1b, 0x1, 0x10f14ea)
    /go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/bolt/bolt.go:161 +0x13f fp=0xc004d1f2a0 sp=0xc004d1f238 pc=0x9125ff
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).addToLog(0xc000281900, 0x174aa20, 0xc03e41b620, 0xc004d1f428, 0x2, 0x2)
    /cayley/graph/kv/indexing.go:886 +0x245 fp=0xc004d1f358 sp=0xc004d1f2a0 pc=0xad8075
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).indexLink(0xc000281900, 0x174aa20, 0xc03e41b620, 0xc004d1f428, 0x0, 0x0)
    /cayley/graph/kv/indexing.go:636 +0x1a1 fp=0xc004d1f3d8 sp=0xc004d1f358 pc=0xad5701
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).indexLinks(0xc000281900, 0x1744b20, 0xc0000b2028, 0x174aa20, 0xc03e41b620, 0xc4ec708000, 0x7a114, 0x7a120, 0x0, 0x0)
    /cayley/graph/kv/indexing.go:614 +0x113 fp=0xc004d1f4e8 sp=0xc004d1f3d8 pc=0xad5483
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).applyAddDeltas(0xc000281900, 0x174aa20, 0xc03e41b620, 0x0, 0x0, 0x0, 0xc13b3d0ea0, 0x22d0001, 0x173a2a0, 0xc32781cbe0, ...)
    /cayley/graph/kv/indexing.go:484 +0x6e4 fp=0xc004d1f8e8 sp=0xc004d1f4e8 pc=0xad38f4
github.com/cayleygraph/cayley/graph/kv.(*quadWriter).WriteQuads(0xc004d30060, 0xc004d5a000, 0x7a120, 0x7a120, 0xc0000c00a0, 0xc004d5a000, 0x7a120)
    /cayley/graph/kv/indexing.go:388 +0xa1 fp=0xc004d1f960 sp=0xc004d1f8e8 pc=0xad2dc1
github.com/cayleygraph/cayley/internal.(*batchLogger).WriteQuads(0xc0000c00c0, 0xc004d5a000, 0x7a120, 0x7a120, 0x7a120, 0x0, 0x0)
    /cayley/internal/load.go:146 +0x59 fp=0xc004d1f9c8 sp=0xc004d1f960 pc=0x75fe09
github.com/cayleygraph/cayley/quad.CopyBatch(0x1731b40, 0xc0000c00c0, 0x7f172d9f8ae0, 0xc0000c00a0, 0x7a120, 0xc0000c00a0, 0x0, 0x0)
    /cayley/quad/rw.go:145 +0x1dc fp=0xc004d1fa70 sp=0xc004d1f9c8 pc=0x746d1c
github.com/cayleygraph/cayley/internal.DecompressAndLoad(0x173bfa0, 0xc004d30060, 0x7a120, 0x7ffdb3958f78, 0x14, 0x0, 0x0, 0x0, 0x0)
    /cayley/internal/load.go:133 +0x17c fp=0xc004d1fae8 sp=0xc004d1fa70 pc=0x75fc7c
github.com/cayleygraph/cayley/internal.Load(0x173bfa0, 0xc004d30060, 0x7a120, 0x7ffdb3958f78, 0x14, 0x0, 0x0, 0x14fc8ff, 0xa)
    /cayley/internal/load.go:21 +0x71 fp=0xc004d1fb40 sp=0xc004d1fae8 pc=0x75ec31
github.com/cayleygraph/cayley/cmd/cayley/command.openForQueries(0xc00027cc80, 0x0, 0x0, 0x0)
    /cayley/cmd/cayley/command/database.go:259 +0x2bb fp=0xc004d1fc48 sp=0xc004d1fb40 pc=0x8d2ceb
github.com/cayleygraph/cayley/cmd/cayley/command.NewHttpCmd.func1(0xc00027cc80, 0xc000100fd0, 0x0, 0xb, 0x0, 0x0)
    /cayley/cmd/cayley/command/http.go:24 +0x9f fp=0xc004d1fce8 sp=0xc004d1fc48 pc=0x8d860f
github.com/spf13/cobra.(*Command).execute(0xc00027cc80, 0xc000100f20, 0xb, 0xb, 0xc00027cc80, 0xc000100f20)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x473 fp=0xc004d1fdd8 sp=0xc004d1fce8 pc=0x5c0fa3
github.com/spf13/cobra.(*Command).ExecuteC(0x2124ae0, 0x2124d60, 0xc00023db80, 0xc000191f88)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2f8 fp=0xc004d1ff20 sp=0xc004d1fdd8 pc=0x5c1998
github.com/spf13/cobra.(*Command).Execute(0x2124ae0, 0xc00027ca00, 0xc00027cc80)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 +0x2b fp=0xc004d1ff50 sp=0xc004d1ff20 pc=0x5c167b
main.main()
    /cayley/cmd/cayley/cayley.go:187 +0x31 fp=0xc004d1ff98 sp=0xc004d1ff50 pc=0x10c8061
runtime.main()
    /usr/local/go/src/runtime/proc.go:201 +0x207 fp=0xc004d1ffe0 sp=0xc004d1ff98 pc=0x42f197
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc004d1ffe8 sp=0xc004d1ffe0 pc=0x45b7e1

goroutine 19 [syscall, 191 minutes]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 20 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x22b7240)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x203
dennwc commented 5 years ago

@eawer Thanks a lot for providing the stack trace and new test results! I can confirm that the new code path is used.

Can you please try setting --batch to lower values? The new code path should behave differently with the same parameters, so the table in the first post may show a different optimal batch size. Setting the batch too high is guaranteed to lead to an OOM, since BoltDB buffers the whole transaction of that size in memory.

In general, BoltDB does not perform well with random writes, which happen frequently during the import of a dataset of this size. You may also try switching to Badger for this import. You may need to set --batch to even lower values because of the way transactions work in Badger, but the write performance should be much better. Note, however, that the Badger backend is not fully tested yet.
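For illustration, a lower batch size in bolt.yml would look something like this (the value 10000 is just a starting point to experiment with, not a recommendation):

```yaml
store:
  backend: bolt
  address: bolt
load:
  batch: 10000   # much lower than 500k-2.5M; tune until memory stays bounded
```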

eawer commented 5 years ago

Badger failed with this error:

I0821 08:20:19.862643       1 cayley.go:63] Cayley version: v0.7.x-dev (300dcf1f8503)
I0821 08:20:19.862868       1 cayley.go:76] using config file: etc/cayley.json
I0821 08:20:19.862959       1 database.go:193] using backend "badger" (/data/badger)
2019/08/21 08:20:19 Replaying from value pointer: {Fid:0 Len:0 Offset:0}
2019/08/21 08:20:19 Iterating file id: 0
2019/08/21 08:20:19 Iteration took: 23.881µs
2019/08/21 08:20:20 Replaying from value pointer: {Fid:0 Len:42 Offset:225}
2019/08/21 08:20:20 Iterating file id: 0
2019/08/21 08:20:20 Iteration took: 17.805µs
I0821 08:20:20.536147       1 load.go:149] Wrote 5000 quads.
I0821 08:20:20.604970       1 load.go:149] Wrote 10000 quads.
I0821 08:20:20.680754       1 load.go:149] Wrote 15000 quads.
I0821 08:20:20.752241       1 load.go:149] Wrote 20000 quads.
I0821 08:20:20.809708       1 load.go:149] Wrote 25000 quads.
I0821 08:20:20.838822       1 load.go:149] Wrote 25000 quads.
Error: db: failed to load data: Txn is too big to fit into one request
...
E0821 08:20:21.287193       1 cayley.go:188] db: failed to load data: Txn is too big to fit into one request

After replacing Badger 1.5.5 with 1.6.0, the Docker build fails with:

# github.com/hidal-go/hidalgo/kv/flat/badger
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:46:5: opt.Dir undefined (type func(string) badger.Options has no field or method Dir)
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:47:17: cannot use opt (type func(string) badger.Options) as type badger.Options in argument to Open
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:77:21: too many arguments in call to tx.tx.Commit
    have (nil)
    want ()
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:149:9: assignment mismatch: 2 variables but 1 values
/go/pkg/mod/github.com/hidal-go/hidalgo@v0.0.0-20190814174001-42e03f3b5eaa/kv/flat/badger/badger.go:149:30: not enough arguments in call to it.it.Item().Value
    have ()
    want (func([]byte) error)

Regarding smaller batches - I've started a test and will paste the results here as soon as I have them.

dennwc commented 5 years ago

> db: failed to load data: Txn is too big to fit into one request

This is expected, and this is exactly the reason why I suggested lowering --batch even more for Badger. It may need to go way below the default of 10K.

Regarding 1.6, we can update Hidalgo later to support it.

eawer commented 5 years ago

Reducing the batch size to 10k for Bolt did not help much; the process ended without a stack trace because of an OOM, with something like https://github.com/cayleygraph/cayley/issues/815#issuecomment-523063489 in the journal: image

eawer commented 5 years ago

Badger only started importing with a 1k batch size, but eventually crashed like this:

Long trace ``` I0821 13:46:05.629259 1 load.go:149] Wrote 91399000 quads. fatal error: runtime: out of memory runtime stack: runtime.throw(0x150ab5f, 0x16) /usr/local/go/src/runtime/panic.go:608 +0x72 runtime.sysMap(0xc6e4000000, 0x8000000, 0x22e1738) /usr/local/go/src/runtime/mem_linux.go:156 +0xc7 runtime.(*mheap).sysAlloc(0x22c0640, 0x8000000, 0x0, 0x7fb6928a7b98) /usr/local/go/src/runtime/malloc.go:619 +0x1c7 runtime.(*mheap).grow(0x22c0640, 0x20fa, 0x0) /usr/local/go/src/runtime/mheap.go:920 +0x42 runtime.(*mheap).allocSpanLocked(0x22c0640, 0x20fa, 0x22e1748, 0x0) /usr/local/go/src/runtime/mheap.go:848 +0x337 runtime.(*mheap).alloc_m(0x22c0640, 0x20fa, 0x101, 0x1871dd8) /usr/local/go/src/runtime/mheap.go:692 +0x119 runtime.(*mheap).alloc.func1() /usr/local/go/src/runtime/mheap.go:759 +0x4c runtime.(*mheap).alloc(0x22c0640, 0x20fa, 0x7fb692010101, 0x43694f) /usr/local/go/src/runtime/mheap.go:758 +0x8a runtime.largeAlloc(0x41f3ce6, 0xc000040101, 0xc03cec4180) /usr/local/go/src/runtime/malloc.go:1019 +0x97 runtime.mallocgc.func1() /usr/local/go/src/runtime/malloc.go:914 +0x46 runtime.systemstack(0xc000000f00) /usr/local/go/src/runtime/asm_amd64.s:351 +0x66 runtime.mstart() /usr/local/go/src/runtime/proc.go:1229 goroutine 131304 [running]: runtime.systemstack_switch() /usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc102bc8c90 sp=0xc102bc8c88 pc=0x459700 runtime.mallocgc(0x41f3ce6, 0x12f6c20, 0xc094429801, 0xa8) /usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc102bc8d30 sp=0xc102bc8c90 pc=0x40e2e6 runtime.makeslice(0x12f6c20, 0x41f3ce6, 0x41f3ce6, 0x7fb6952da000, 0x0, 0xc102bc8dd8) /usr/local/go/src/runtime/slice.go:70 +0x77 fp=0xc102bc8d60 sp=0xc102bc8d30 pc=0x444277 github.com/dgraph-io/badger/table.(*Table).loadToRAM(0xc01fb1ee00, 0x41f3ce6, 0x7a6) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/table/table.go:351 +0x4a fp=0xc102bc8de8 sp=0xc102bc8d60 pc=0x9757fa github.com/dgraph-io/badger/table.OpenTable(0xc012a122a8, 0x1, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/table/table.go:139 +0x374 fp=0xc102bc8eb0 sp=0xc102bc8de8 pc=0x9743e4 github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x7a6, 0xc0001320e0, 0xc3f4f19080, 0xc114e604e0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:461 +0x22f fp=0xc102bc8fc0 sp=0xc102bc8eb0 pc=0x99e9cf runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc102bc8fc8 sp=0xc102bc8fc0 pc=0x45b7e1 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 1 [running]: goroutine running on other thread; stack unavailable goroutine 19 [syscall, 79 minutes]: os/signal.signal_recv(0x0) /usr/local/go/src/runtime/sigqueue.go:139 +0x9c os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:23 +0x22 created by os/signal.init.0 /usr/local/go/src/os/signal/signal_unix.go:29 +0x41 goroutine 20 [chan receive]: github.com/golang/glog.(*loggingT).flushDaemon(0x22b7240) /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b created by github.com/golang/glog.init.0 /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x203 goroutine 21 [chan receive, 79 minutes]: github.com/dgraph-io/badger/y.(*WaterMark).process(0xc000274c98) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/y/watermark.go:125 +0x15b created by github.com/dgraph-io/badger/y.(*WaterMark).Init /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/y/watermark.go:54 +0xc1 goroutine 214 [chan 
receive]: github.com/dgraph-io/badger/y.(*WaterMark).process(0xc000274b18) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/y/watermark.go:125 +0x15b created by github.com/dgraph-io/badger/y.(*WaterMark).Init /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/y/watermark.go:54 +0xc1 goroutine 128689 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x77c, 0xc0001320e0, 0xc2b7dcc000, 0xc251d95da0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128945 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x788, 0xc0001320e0, 0xc2b7dcc000, 0xc0784fd9e0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 215 [select]: github.com/dgraph-io/badger.(*DB).updateSize(0xc000170000, 0xc005632540) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:928 +0x125 created by github.com/dgraph-io/badger.Open /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:267 +0x7d7 goroutine 28 [select]: github.com/dgraph-io/badger.(*levelsController).runWorker(0xc0001320e0, 0xc00024c040) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:248 +0x151 created by github.com/dgraph-io/badger.(*levelsController).startCompact /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:233 +0x82 goroutine 27 [chan receive]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables(0xc0001320e0, 0x2, 0x174ad20, 0xc000182d00, 0xc00564b680, 0xc00564b6e0, 0xc08c20de68, 0x1, 0x1, 0xc08c20de70, ...) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:472 +0x1103 github.com/dgraph-io/badger.(*levelsController).runCompactDef(0xc0001320e0, 0x2, 0x174ad20, 0xc000182d00, 0xc00564b680, 0xc00564b6e0, 0xc08c20de68, 0x1, 0x1, 0xc08c20de70, ...) 
/go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:638 +0xd3 github.com/dgraph-io/badger.(*levelsController).doCompact(0xc0001320e0, 0x2, 0x3ff00cadb9666666, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:705 +0x498 github.com/dgraph-io/badger.(*levelsController).runWorker(0xc0001320e0, 0xc00024c040) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:254 +0x1c9 created by github.com/dgraph-io/badger.(*levelsController).startCompact /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:233 +0x82 goroutine 29 [running]: goroutine running on other thread; stack unavailable created by github.com/dgraph-io/badger.(*levelsController).startCompact /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:233 +0x82 goroutine 33 [chan receive, 79 minutes]: github.com/dgraph-io/badger.(*valueLog).waitOnGC(0xc000170148, 0xc00024c4a0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/value.go:1186 +0x61 created by github.com/dgraph-io/badger.Open /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:328 +0xcde goroutine 30 [chan receive, 1 minutes]: github.com/dgraph-io/badger.(*DB).flushMemtable(0xc000170000, 0xc00024c140, 0x0, 0x0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:806 +0xd3 created by github.com/dgraph-io/badger.Open /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:280 +0xf3e goroutine 32 [select]: github.com/dgraph-io/badger.(*DB).doWrites(0xc000170000, 0xc00024c480) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:670 +0x404 created by github.com/dgraph-io/badger.Open /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/db.go:325 +0xc81 goroutine 130260 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x798, 0xc0001320e0, 0xc2b7dcc000, 0xc296d15020) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 130056 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x78a, 0xc0001320e0, 0xc2b7dcc000, 0xc01c1aaf60) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131302 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x7a3, 0xc0001320e0, 0xc2b7dcc000, 0xc113d06060) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 130249 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x790, 0xc0001320e0, 0xc2b7dcc000, 0xc020507aa0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 127571 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x775, 0xc0001320e0, 0xc2b7dcc000, 0xc296d14120) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131303 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x7a5, 
0xc0001320e0, 0xc2b7dcc000, 0xc094429800) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 130251 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x79c, 0xc0001320e0, 0xc2b7dcc000, 0xc02c45c5a0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128293 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x771, 0xc0001320e0, 0xc2b7dcc000, 0xc296d156e0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 130338 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x78d, 0xc0001320e0, 0xc2b7dcc000, 0xc05b4bc060) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 129072 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x77e, 0xc0001320e0, 0xc2b7dcc000, 0xc114e61740) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131176 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x7a0, 0xc0001320e0, 0xc2b7dcc000, 0xc1b16205a0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128151 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x773, 0xc0001320e0, 0xc2b7dcc000, 0xc118c22fc0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128624 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x780, 0xc0001320e0, 0xc2b7dcc000, 0xc010d74000) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131055 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x79a, 0xc0001320e0, 0xc2b7dcc000, 0xc118c22000) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131057 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x79e, 0xc0001320e0, 0xc2b7dcc000, 0xc010585320) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 
goroutine 130259 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x792, 0xc0001320e0, 0xc2b7dcc000, 0xc113d076e0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128222 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x76f, 0xc0001320e0, 0xc2b7dcc000, 0xc0c16bc2a0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128495 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x77a, 0xc0001320e0, 0xc2b7dcc000, 0xc113d07440) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 131238 [running]: goroutine running on other thread; stack unavailable created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 129196 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x785, 0xc0001320e0, 0xc2b7dcc000, 0xc06f42d0e0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 130596 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x795, 0xc0001320e0, 0xc2b7dcc000, 0xc251d95080) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128941 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x783, 0xc0001320e0, 0xc2b7dcc000, 0xc00cd0f8c0) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 goroutine 128493 [chan send]: github.com/dgraph-io/badger.(*levelsController).compactBuildTables.func2(0x777, 0xc0001320e0, 0xc2b7dcc000, 0xc1b1621b60) /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:463 +0x353 created by github.com/dgraph-io/badger.(*levelsController).compactBuildTables /go/pkg/mod/github.com/dgraph-io/badger@v1.5.5/levels.go:447 +0xec3 ```
dennwc commented 5 years ago

Thanks a lot for testing it @eawer. This definitely looks like a memory issue in the new code path. I will investigate it.

dennwc commented 5 years ago

After a few short tests, I think the OOM is related to how quad indexing works in all KV backends. A dataset of this size leads to extremely large index entries for supernodes, and modifying such an index entry requires loading it into memory.

A quick workaround may be to take the physical size of the transaction into account when auto-committing during batch import. However, this will only delay the issue.

The mid-term fix is to allow disabling those list indexes and switching to single-value indexes. This may lead to a larger DB size, but it will solve the issue completely. Query performance may also suffer to some degree.

The long-term solution is to allow both kinds of indexes. This will complicate the design, however, and must be tested carefully against the previously proposed solution.
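For illustration only (this is not Cayley's actual key layout; the prefix and encodings are made up), the difference between the two indexing strategies looks roughly like this:

```go
// Sketch of "list" vs "single value" index entries in a KV store.
package main

import (
	"encoding/binary"
	"fmt"
)

// u64be encodes a uint64 as 8 big-endian bytes.
func u64be(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

// listIndexEntry models a "list" index: one key per node, whose value is the
// ever-growing list of quad IDs referencing it. For a supernode this value
// becomes huge, and every update must read-modify-write the whole blob.
func listIndexEntry(node uint64, quadIDs []uint64) (key, value []byte) {
	key = append([]byte("sp/"), u64be(node)...)
	for _, id := range quadIDs {
		value = append(value, u64be(id)...)
	}
	return key, value
}

// singleValueIndexEntry models a "single value" index: one small key per
// (node, quad) pair with an empty value. Writes stay O(1) per quad, at the
// cost of more keys on disk and different lookup behaviour.
func singleValueIndexEntry(node, quadID uint64) []byte {
	return append(append([]byte("sp/"), u64be(node)...), u64be(quadID)...)
}

func main() {
	key, val := listIndexEntry(42, []uint64{1, 2, 3})
	fmt.Printf("list index:         key=%x value=%x\n", key, val)
	fmt.Printf("single-value index: key=%x (empty value)\n", singleValueIndexEntry(42, 1))
}
```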

dennwc commented 5 years ago

@eawer I made a few changes to indexing in https://github.com/cayleygraph/cayley/pull/816. It should help with the OOM issue, but it will also increase the size of the index on disk, so write performance may suffer. It's hard to make any firm statements, though, because the new indexing strategy should also improve quad lookup performance, which may remove the need for multiple reads during the import process.

I will continue looking into it and will run a few tests locally as well, but help with testing is highly appreciated.

eawer commented 5 years ago

Cayley version: v0.7.x-dev (88cbd1564ad0)

Badger died after 20 min and 14M quads (batch size 1k) with: Error: db: failed to load data: Txn is too big to fit into one request

Bolt failed after 13.5h and 207M quads with this error:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x1534adf, 0x16)
    /usr/local/go/src/runtime/panic.go:608 +0x72
runtime.sysMap(0xc6c4000000, 0x4000000, 0x23407b8)
    /usr/local/go/src/runtime/mem_linux.go:156 +0xc7
runtime.(*mheap).sysAlloc(0x231f6c0, 0x4000000, 0x231f8c8, 0x7fcff738c4d8)
    /usr/local/go/src/runtime/malloc.go:619 +0x1c7
runtime.(*mheap).grow(0x231f6c0, 0x30, 0x0)
    /usr/local/go/src/runtime/mheap.go:920 +0x42
runtime.(*mheap).allocSpanLocked(0x231f6c0, 0x30, 0x23407c8, 0x9800)
    /usr/local/go/src/runtime/mheap.go:848 +0x337
runtime.(*mheap).alloc_m(0x231f6c0, 0x30, 0x101, 0x7fcfeaf6ffff)
    /usr/local/go/src/runtime/mheap.go:692 +0x119
runtime.(*mheap).alloc.func1()
    /usr/local/go/src/runtime/mheap.go:759 +0x4c
runtime.(*mheap).alloc(0x231f6c0, 0x30, 0x7fcfea000101, 0x7fd0284c9130)
    /usr/local/go/src/runtime/mheap.go:758 +0x8a
runtime.largeAlloc(0x60000, 0x450100, 0x7fd0284c9130)
    /usr/local/go/src/runtime/malloc.go:1019 +0x97
runtime.mallocgc.func1()
    /usr/local/go/src/runtime/malloc.go:914 +0x46
runtime.systemstack(0xc000001380)
    /usr/local/go/src/runtime/asm_amd64.s:351 +0x66
runtime.mstart()
    /usr/local/go/src/runtime/proc.go:1229

goroutine 1 [running]:
runtime.systemstack_switch()
    /usr/local/go/src/runtime/asm_amd64.s:311 fp=0xc0000c91f8 sp=0xc0000c91f0 pc=0x459700
runtime.mallocgc(0x60000, 0x0, 0x3c000, 0xc6c3f74000)
    /usr/local/go/src/runtime/malloc.go:913 +0x896 fp=0xc0000c9298 sp=0xc0000c91f8 pc=0x40e2e6
runtime.growslice(0x1318ee0, 0xc6c3f74000, 0x9800, 0x9800, 0x9801, 0xc6c3f74000, 0x7800, 0x9800)
    /usr/local/go/src/runtime/slice.go:197 +0x219 fp=0xc0000c9300 sp=0xc0000c9298 pc=0x444519
github.com/cayleygraph/cayley/graph/kv.decodeIndex(0x7fcb35042040, 0x43f1f8, 0x43f1f8, 0xc1b8a96f30, 0x2, 0x2, 0xc1b8a96e70, 0x2)
    /cayley/graph/kv/indexing.go:753 +0x131 fp=0xc0000c9378 sp=0xc0000c9300 pc=0xaece71
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).getBucketIndexes(0xc000273900, 0x176f900, 0xc000038048, 0x1777160, 0xc3c1e8e008, 0xc1b8a96c30, 0x2, 0x2, 0x410029, 0x132a520, ...)
    /cayley/graph/kv/indexing.go:720 +0x176 fp=0xc0000c9418 sp=0xc0000c9378 pc=0xaecab6
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).hasPrimitive(0xc000273900, 0x176f900, 0xc000038048, 0x1777160, 0xc3c1e8e008, 0xc0000c9880, 0x968c42f20953f800, 0x837bace2, 0x0, 0x0)
    /cayley/graph/kv/indexing.go:860 +0x216 fp=0xc0000c94e8 sp=0xc0000c9418 pc=0xaedd36
github.com/cayleygraph/cayley/graph/kv.(*QuadStore).applyAddDeltas(0xc000273900, 0x1777160, 0xc3c1e8e008, 0x0, 0x0, 0x0, 0xc1c0dc01e0, 0x2330001, 0x1765240, 0xc4aa122a80, ...)
    /cayley/graph/kv/indexing.go:507 +0x94c fp=0xc0000c98e8 sp=0xc0000c94e8 pc=0xaea8ac
github.com/cayleygraph/cayley/graph/kv.(*quadWriter).WriteQuads(0xc0002753e0, 0xc004d26000, 0x186a0, 0x186a0, 0xc00000c0c0, 0xc004d26000, 0x186a0)
    /cayley/graph/kv/indexing.go:439 +0xa1 fp=0xc0000c9960 sp=0xc0000c98e8 pc=0xae9b11
github.com/cayleygraph/cayley/internal.(*batchLogger).WriteQuads(0xc00000c0e0, 0xc004d26000, 0x186a0, 0x186a0, 0x186a0, 0x0, 0x0)
    /cayley/internal/load.go:146 +0x59 fp=0xc0000c99c8 sp=0xc0000c9960 pc=0x75fe09
github.com/cayleygraph/cayley/quad.CopyBatch(0x175caa0, 0xc00000c0e0, 0x7fd080409068, 0xc00000c0c0, 0x186a0, 0xc00000c0c0, 0x0, 0x0)
    /cayley/quad/rw.go:145 +0x1dc fp=0xc0000c9a70 sp=0xc0000c99c8 pc=0x746d1c
github.com/cayleygraph/cayley/internal.DecompressAndLoad(0x1766f40, 0xc0002753e0, 0x186a0, 0x7ffff4cc0f78, 0x14, 0x0, 0x0, 0x0, 0x0)
    /cayley/internal/load.go:133 +0x17c fp=0xc0000c9ae8 sp=0xc0000c9a70 pc=0x75fc7c
github.com/cayleygraph/cayley/internal.Load(0x1766f40, 0xc0002753e0, 0x186a0, 0x7ffff4cc0f78, 0x14, 0x0, 0x0, 0x15266d9, 0xa)
    /cayley/internal/load.go:21 +0x71 fp=0xc0000c9b40 sp=0xc0000c9ae8 pc=0x75ec31
github.com/cayleygraph/cayley/cmd/cayley/command.openForQueries(0xc000270c80, 0x0, 0x0, 0x0)
    /cayley/cmd/cayley/command/database.go:259 +0x2bb fp=0xc0000c9c48 sp=0xc0000c9b40 pc=0x8e908b
github.com/cayleygraph/cayley/cmd/cayley/command.NewHttpCmd.func1(0xc000270c80, 0xc0000eefd0, 0x0, 0xb, 0x0, 0x0)
    /cayley/cmd/cayley/command/http.go:24 +0x9f fp=0xc0000c9ce8 sp=0xc0000c9c48 pc=0x8ee9af
github.com/spf13/cobra.(*Command).execute(0xc000270c80, 0xc0000eef20, 0xb, 0xb, 0xc000270c80, 0xc0000eef20)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x473 fp=0xc0000c9dd8 sp=0xc0000c9ce8 pc=0x5c0fa3
github.com/spf13/cobra.(*Command).ExecuteC(0x2183b40, 0x2183dc0, 0xc000231b80, 0xc000187f88)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2f8 fp=0xc0000c9f20 sp=0xc0000c9dd8 pc=0x5c1998
github.com/spf13/cobra.(*Command).Execute(0x2183b40, 0xc000270a00, 0xc000270c80)
    /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864 +0x2b fp=0xc0000c9f50 sp=0xc0000c9f20 pc=0x5c167b
main.main()
    /cayley/cmd/cayley/cayley.go:187 +0x31 fp=0xc0000c9f98 sp=0xc0000c9f50 pc=0x10e9381
runtime.main()
    /usr/local/go/src/runtime/proc.go:201 +0x207 fp=0xc0000c9fe0 sp=0xc0000c9f98 pc=0x42f197
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x45b7e1

goroutine 5 [syscall, 834 minutes]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 6 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x23162c0)
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
    /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x203

image

dennwc commented 5 years ago

Thanks for taking the time to test the new version!

> badger died after 20min and 14M quads (batch size 1k) with Error: db: failed to load data: Txn is too big to fit into one request

I think we can give up on Badger for now. I'll need to change the way we import data into it to avoid those large transactions.

> Bolt failed after 13.5h and 207M quads

We are getting somewhere. But it seems like the SP index may also build up over time. I wonder what in the Freebase schema causes it? I guess we will find out after a successful import :)

In any case, I will get back to you after making a few more changes.

First, I want to add instrumentation so we can get more insight into the import process. I will add a "time per batch" metric, so we can get the same graph you've built directly from Prometheus/Grafana. I'm also particularly interested in the sizes of the entries in each index, and things like lookup durations may be useful for speeding up the import further.

And second, I will add an option to set custom indexes for KV, so we can try an "SPO" index instead of "SP". Since the import was able to proceed further than before, I think the "O" index was indeed the cause of the previous OOM.
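As a rough illustration of the kind of instrumentation meant by the "time per batch" metric above (the metric name and port are hypothetical, not necessarily what was added upstream), a per-batch duration histogram with the Prometheus Go client could look like:

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// batchDuration records how long each batch write takes, so degradation over
// time shows up directly in Grafana instead of hand-built graphs.
var batchDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "load_batch_duration_seconds",
	Help:    "Time spent writing one batch of quads to the backend.",
	Buckets: prometheus.ExponentialBuckets(0.1, 2, 12), // 0.1s .. ~200s
})

func main() {
	prometheus.MustRegister(batchDuration)

	// Expose metrics for Prometheus to scrape.
	go http.ListenAndServe(":2112", promhttp.Handler())

	for i := 0; i < 3; i++ { // stand-in for the real batch-write loop
		start := time.Now()
		time.Sleep(50 * time.Millisecond) // pretend to write a batch
		batchDuration.Observe(time.Since(start).Seconds())
	}
}
```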

dennwc commented 5 years ago

@eawer The patch with metrics and improved write performance was pushed upstream. It now optimizes the first batch write (e.g. cayley load) to minimize reads from disk.

However, as I mentioned before, this improves performance only to some extent. I'm now working on a two-stage import tool that should improve write speed significantly. It will first use Badger to build a node index and then export it to the target backend (e.g. Bolt).

From the first prototype, metrics show that the import speed is constant and doesn't degrade over time, thanks to Badger's write optimizations. Building the node index for the whole dataset takes < 10h. I will post precise test results later.

manishrjain commented 5 years ago

Just happened upon this issue. Have you guys considered using WriteBatch for Badger? It can deal with txn sizes and automatically commit them as needed.

dennwc commented 5 years ago

@manishrjain Thanks for the suggestion, we will definitely consider that API :)

As I mentioned, the problem is not with Badger specifically. It's a problem with Cayley's legacy write API, which tries to be suitable both for batch uploads and regular transactions. The new importer can take advantage of Badger's WriteBatch directly, and it works really well so far!
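For reference, a minimal sketch of what using Badger's WriteBatch could look like (assuming the v2-style API from github.com/dgraph-io/badger/v2; this is not Cayley's actual importer code, and the keys/values are placeholders):

```go
package main

import (
	"encoding/binary"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-import"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// WriteBatch buffers writes and commits internal transactions as they
	// fill up, so the caller never hits "Txn is too big to fit into one request".
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	for i := uint64(0); i < 1_000_000; i++ {
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, i)
		if err := wb.Set(key, []byte("quad payload")); err != nil {
			log.Fatal(err)
		}
	}
	if err := wb.Flush(); err != nil { // commit whatever is still buffered
		log.Fatal(err)
	}
}
```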

hubyhuby commented 4 years ago

For info, on my system:

5× the size of the original data seems a little big.

dennwc commented 4 years ago

@hubyhuby Note that Bolt pre-allocates some space, so not all of it is actually used by the database.

mazzespazze commented 4 years ago

Just a follow-up: does importing Freebase into Cayley work now?