Closed jarifibrahim closed 4 years ago
When running flock for several days with v20.07.0-beta.Jun15
, I ran into this on alpha1:
2020/06/16 00:54:49 Assert failed
github.com/dgraph-io/badger/v2/y.AssertTrue
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/y/error.go:55
github.com/dgraph-io/badger/v2/table.(*block).decrRef
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/table/table.go:244
github.com/dgraph-io/badger/v2/table.BlockEvictHandler
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/table/table.go:190
github.com/dgraph-io/badger/v2.Open.func4
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200528205344-e7b6e76f96e8/db.go:303
github.com/dgraph-io/ristretto.(*Cache).processItems
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.2/cache.go:315
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373
@darkn3rd do you know how long flock was running for?
@jarifibrahim I'm unsure about the last run, I think for a few hours. Recently I tried to run with the existing data directories, but upgraded server, and I ran into the problem. It took 20 minutes to reproduce.
[Decoder]: Using assembly version of decoder
[Sentry] 2020/06/25 22:05:07 Integration installed: ContextifyFrames
[Sentry] 2020/06/25 22:05:07 Integration installed: Environment
[Sentry] 2020/06/25 22:05:07 Integration installed: Modules
[Sentry] 2020/06/25 22:05:07 Integration installed: IgnoreErrors
[Decoder]: Using assembly version of decoder
[Sentry] 2020/06/25 22:05:08 Integration installed: ContextifyFrames
[Sentry] 2020/06/25 22:05:08 Integration installed: Environment
[Sentry] 2020/06/25 22:05:08 Integration installed: Modules
[Sentry] 2020/06/25 22:05:08 Integration installed: IgnoreErrors
I0625 22:05:08.465587 15 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --enable_sentry=false flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0625 22:05:08.466843 15 util_ee.go:126] KeyReader instantiated of type <nil>
I0625 22:05:08.643733 15 init.go:99]
Dgraph version : v20.07.0-beta.Jun22-7-g73a9ddd80
Dgraph SHA-256 : e685d19897fd1b9784a14f14dc0d805f8011d2dbc92ca444ad4a2b7adca6523e
Commit SHA-1 : 73a9ddd80
Commit timestamp : 2020-06-25 13:24:40 -0700
Branch : release/v20.07
Go version : go1.14.2
For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph , visit https://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.
I0625 22:05:08.643771 15 run.go:641] x.Config: {PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000 PollInterval:1s}
I0625 22:05:08.643797 15 run.go:642] x.WorkerConfig: {ExportPath:export NumPendingProposals:256 Tracing:0.01 MyAddr:alpha1:7080 ZeroAddr:[zero1:5080] RaftId:0 WhiteListedIPRanges:[] MaxRetries:-1 StrictMutations:false AclEnabled:false AbortOlderThan:5m0s SnapshotAfter:10000 ProposedGroupId:0 StartTime:2020-06-25 22:05:08.104203835 +0000 UTC m=+0.014817380 LudicrousMode:false EncryptionKey:**** LogRequest:0}
I0625 22:05:08.643831 15 run.go:643] worker.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap BadgerCompressionLevel:3 WALDir:w MutationsMode:0 AuthToken: AllottedMemory:1024 HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s AclRefreshInterval:0s}
I0625 22:05:08.643898 15 server_state.go:80] Setting Badger Compression Level: 3
I0625 22:05:08.643903 15 server_state.go:89] Setting Badger table load option: mmap
I0625 22:05:08.643907 15 server_state.go:101] Setting Badger value log load option: mmap
I0625 22:05:08.643917 15 server_state.go:146] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x2b78fd0 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:false MaxCacheSize:10485760 MaxBfCacheSize:0 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false KeepBlockIndicesInCache:false KeepBlocksInCache:false managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0625 22:05:08.655763 15 log.go:34] All 2 tables opened in 5ms
I0625 22:05:08.658771 15 log.go:34] Replaying file id: 4501 at offset: 4587032
I0625 22:05:08.658851 15 log.go:34] Replay took: 56.757µs
I0625 22:05:08.658888 15 log.go:34] Replaying file id: 4502 at offset: 0
I0625 22:05:08.682758 15 log.go:34] Replay took: 23.860349ms
I0625 22:05:08.682852 15 log.go:34] Replaying file id: 4503 at offset: 0
I0625 22:05:08.685928 15 log.go:34] Replay took: 3.065584ms
I0625 22:05:08.686517 15 server_state.go:80] Setting Badger Compression Level: 3
I0625 22:05:08.686536 15 server_state.go:89] Setting Badger table load option: mmap
I0625 22:05:08.686542 15 server_state.go:101] Setting Badger value log load option: mmap
I0625 22:05:08.686550 15 server_state.go:170] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x2b78fd0 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:false MaxCacheSize:1073741824 MaxBfCacheSize:524288000 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false KeepBlockIndicesInCache:true KeepBlocksInCache:true managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0625 22:05:10.408685 15 log.go:34] All 297 tables opened in 1.664s
I0625 22:05:10.462026 15 log.go:34] Replaying file id: 442 at offset: 50684148
I0625 22:05:10.462060 15 log.go:34] Replay took: 4.023µs
I0625 22:05:10.463123 15 groups.go:107] Current Raft Id: 0x1
I0625 22:05:10.463419 15 worker.go:97] Worker listening at address: [::]:7080
I0625 22:05:10.464362 15 run.go:513] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0625 22:05:10.464388 15 run.go:514] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0625 22:05:10.464428 15 run.go:546] gRPC server started. Listening on port 9080
I0625 22:05:10.464437 15 run.go:547] HTTP server started. Listening on port 8080
I0625 22:05:10.563744 15 pool.go:160] CONNECTING to zero1:5080
I0625 22:05:14.568693 15 groups.go:135] Connected to group zero. Assigned group: 0
I0625 22:05:14.568736 15 groups.go:137] Raft Id after connection to Zero: 0x1
I0625 22:05:14.568953 15 draft.go:210] Node ID: 0x1 with GroupID: 1
I0625 22:05:14.569006 15 node.go:148] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0825f1c80 Applied:16426094 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2b78fd0 DisableProposalForwarding:false}
I0625 22:05:14.569018 15 pool.go:160] CONNECTING to alpha1:7080
W0625 22:05:14.571027 15 pool.go:254] Connection lost with alpha1:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0625 22:05:14.571758 15 node.go:306] Found Snapshot.Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:16426094 Term:3 XXX_unrecognized:[]}
I0625 22:05:14.571788 15 node.go:317] Found hardstate: {Term:4 Vote:1 Commit:16427462 XXX_unrecognized:[]}
I0625 22:05:14.580827 15 node.go:326] Group 1 found 1368 entries
I0625 22:05:14.580854 15 draft.go:1595] Restarting node for group: 1
I0625 22:05:14.580880 15 node.go:185] Setting conf state to nodes:1
I0625 22:05:14.581430 15 log.go:34] 1 became follower at term 4
I0625 22:05:14.581949 15 log.go:34] newRaft 1 [peers: [1], term: 4, commit: 16427462, applied: 16426094, lastindex: 16427462, lastterm: 4]
I0625 22:05:14.581983 15 draft.go:160] Operation started with id: opRollup
I0625 22:05:14.581995 15 groups.go:155] Server is ready
I0625 22:05:14.582092 15 groups.go:790] Got address of a Zero leader: zero1:5080
I0625 22:05:14.582178 15 draft.go:1001] Found Raft progress: 16427462
I0625 22:05:14.582318 15 groups.go:804] Starting a new membership stream receive from zero1:5080.
I0625 22:05:14.582906 15 groups.go:821] Received first state update from Zero: counter:10916397 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"alpha1:7080" leader:true last_update:1592864399 > > tablets:<key:"author" value:<group_id:1 predicate:"author" space:16866422277 > > tablets:<key:"created_at" value:<group_id:1 predicate:"created_at" space:324077075750 > > tablets:<key:"description" value:<group_id:1 predicate:"description" space:102979608756 > > tablets:<key:"dgraph.graphql.schema" value:<group_id:1 predicate:"dgraph.graphql.schema" > > tablets:<key:"dgraph.graphql.xid" value:<group_id:1 predicate:"dgraph.graphql.xid" > > tablets:<key:"dgraph.type" value:<group_id:1 predicate:"dgraph.type" space:1827482254 > > tablets:<key:"followers_count" value:<group_id:1 predicate:"followers_count" space:411606143 > > tablets:<key:"friends_count" value:<group_id:1 predicate:"friends_count" space:411614913 > > tablets:<key:"hashtags" value:<group_id:1 predicate:"hashtags" space:2479493467 > > tablets:<key:"id_str" value:<group_id:1 predicate:"id_str" space:4374722908 > > tablets:<key:"mention" value:<group_id:1 predicate:"mention" space:16407742505 > > tablets:<key:"message" value:<group_id:1 predicate:"message" space:11805860031 > > tablets:<key:"profile_banner_url" value:<group_id:1 predicate:"profile_banner_url" space:135114300150 > > tablets:<key:"profile_image_url" value:<group_id:1 predicate:"profile_image_url" space:1102085209 > > tablets:<key:"retweet" value:<group_id:1 predicate:"retweet" space:3307481423 > > tablets:<key:"screen_name" value:<group_id:1 predicate:"screen_name" space:1391134659 > > tablets:<key:"urls" value:<group_id:1 predicate:"urls" > > tablets:<key:"user_id" value:<group_id:1 predicate:"user_id" space:2286473313 > > tablets:<key:"user_name" value:<group_id:1 predicate:"user_name" space:5515738130 > > tablets:<key:"verified" value:<group_id:1 predicate:"verified" > > snapshot_ts:21810505 checksum:17475820780574559673 > > zeros:<key:1 value:<id:1 addr:"zero1:5080" leader:true > > maxLeaseId:19540000 maxTxnTs:21820000 maxRaftId:1 cid:"fabd9a37-66ab-408b-88f3-ec1c52ea61d2" license:<maxNodes:18446744073709551615 expiryTs:1595185409 enabled:true >
I0625 22:05:17.682737 15 log.go:34] 1 is starting a new election at term 4
I0625 22:05:17.682782 15 log.go:34] 1 became pre-candidate at term 4
I0625 22:05:17.682788 15 log.go:34] 1 received MsgPreVoteResp from 1 at term 4
I0625 22:05:17.683104 15 log.go:34] 1 became candidate at term 5
I0625 22:05:17.683119 15 log.go:34] 1 received MsgVoteResp from 1 at term 5
I0625 22:05:17.683745 15 log.go:34] 1 became leader at term 5
I0625 22:05:17.683767 15 log.go:34] raft.node: 1 elected leader 1 at term 5
I0625 22:05:18.582372 15 groups.go:873] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0625 22:05:18.582425 15 groups.go:882] Got Zero leader: zero1:5080
I0625 22:05:18.731910 15 admin.go:661] No GraphQL schema in Dgraph; serving empty GraphQL API
I0625 22:11:47.818459 15 server.go:137] Received ALTER op: schema:"\n\t\ttype Tweet {\n\t\t\tid_str: string\n\t\t\tcreated_at: dateTime\n\t\t\tmessage: string\n\t\t\turls: [string]\n\t\t\thashtags: [string]\n\t\t\tauthor: User\n\t\t\tmention: [User]\n\t\t\tretweet: bool\n\t\t}\n\t\t\n\t\ttype User {\n\t\t\tuser_id\n\t\t\tuser_name\n\t\t\tscreen_name\n\t\t\tdescription\n\t\t\tfriends_count\n\t\t\tfollowers_count\n\t\t\tverified\n\t\t\tprofile_banner_url\n\t\t\tprofile_image_url\n\t\t}\n\n\t\tuser_id: string @index(exact) @upsert .\n\t\tuser_name: string @index(hash) .\n\t\tscreen_name: string @index(term) .\n\t\tdescription: string .\n\t\tfriends_count: int .\n\t\tfollowers_count: int .\n\t\tverified: bool .\n\t\tprofile_banner_url: string .\n\t\tprofile_image_url: string .\n\t\tid_str: string @index(exact) @upsert .\n\t\tcreated_at: dateTime @index(hour) .\n\t\tmessage: string .\n\t\turls: [string] .\n\t\thashtags: [string] @index(exact) .\n\t\tauthor: uid @count @reverse .\n\t\tmention: [uid] @reverse .\n\t\tretweet: bool .\n\t"
I0625 22:11:47.818505 15 server.go:1161] Got Alter request from "172.18.0.1:56406"
W0625 22:11:47.819350 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field id_str which will be ignored.
W0625 22:11:47.819376 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field created_at which will be ignored.
W0625 22:11:47.819390 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field message which will be ignored.
W0625 22:11:47.819398 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field urls which will be ignored.
W0625 22:11:47.819406 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field hashtags which will be ignored.
W0625 22:11:47.819413 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field author which will be ignored.
W0625 22:11:47.819420 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field mention which will be ignored.
W0625 22:11:47.819428 15 parse.go:405] Type declaration for type Tweet includes deprecated information about field type for field retweet which will be ignored.
I0625 22:11:47.819472 15 server.go:299] Got schema: &{Preds:[predicate:"user_id" value_type:STRING directive:INDEX tokenizer:"exact" upsert:true predicate:"user_name" value_type:STRING directive:INDEX tokenizer:"hash" predicate:"screen_name" value_type:STRING directive:INDEX tokenizer:"term" predicate:"description" value_type:STRING predicate:"friends_count" value_type:INT predicate:"followers_count" value_type:INT predicate:"verified" value_type:BOOL predicate:"profile_banner_url" value_type:STRING predicate:"profile_image_url" value_type:STRING predicate:"id_str" value_type:STRING directive:INDEX tokenizer:"exact" upsert:true predicate:"created_at" value_type:DATETIME directive:INDEX tokenizer:"hour" predicate:"message" value_type:STRING predicate:"urls" value_type:STRING list:true predicate:"hashtags" value_type:STRING directive:INDEX tokenizer:"exact" list:true predicate:"author" value_type:UID directive:REVERSE count:true predicate:"mention" value_type:UID directive:REVERSE list:true predicate:"retweet" value_type:BOOL ] Types:[type_name:"Tweet" fields:<predicate:"id_str" > fields:<predicate:"created_at" > fields:<predicate:"message" > fields:<predicate:"urls" > fields:<predicate:"hashtags" > fields:<predicate:"author" > fields:<predicate:"mention" > fields:<predicate:"retweet" > type_name:"User" fields:<predicate:"user_id" > fields:<predicate:"user_name" > fields:<predicate:"screen_name" > fields:<predicate:"description" > fields:<predicate:"friends_count" > fields:<predicate:"followers_count" > fields:<predicate:"verified" > fields:<predicate:"profile_banner_url" > fields:<predicate:"profile_image_url" > ]}
I0625 22:11:47.824165 15 draft.go:160] Operation started with id: opIndexing
I0625 22:11:47.824375 15 draft.go:115] Operation completed with id: opRollup
I0625 22:11:47.824842 15 draft.go:115] Operation completed with id: opIndexing
I0625 22:11:47.824936 15 server.go:318] ALTER op: schema:"\n\t\ttype Tweet {\n\t\t\tid_str: string\n\t\t\tcreated_at: dateTime\n\t\t\tmessage: string\n\t\t\turls: [string]\n\t\t\thashtags: [string]\n\t\t\tauthor: User\n\t\t\tmention: [User]\n\t\t\tretweet: bool\n\t\t}\n\t\t\n\t\ttype User {\n\t\t\tuser_id\n\t\t\tuser_name\n\t\t\tscreen_name\n\t\t\tdescription\n\t\t\tfriends_count\n\t\t\tfollowers_count\n\t\t\tverified\n\t\t\tprofile_banner_url\n\t\t\tprofile_image_url\n\t\t}\n\n\t\tuser_id: string @index(exact) @upsert .\n\t\tuser_name: string @index(hash) .\n\t\tscreen_name: string @index(term) .\n\t\tdescription: string .\n\t\tfriends_count: int .\n\t\tfollowers_count: int .\n\t\tverified: bool .\n\t\tprofile_banner_url: string .\n\t\tprofile_image_url: string .\n\t\tid_str: string @index(exact) @upsert .\n\t\tcreated_at: dateTime @index(hour) .\n\t\tmessage: string .\n\t\turls: [string] .\n\t\thashtags: [string] @index(exact) .\n\t\tauthor: uid @count @reverse .\n\t\tmention: [uid] @reverse .\n\t\tretweet: bool .\n\t" done
I0625 22:11:57.825132 15 draft.go:160] Operation started with id: opRollup
W0625 22:13:48.497932 15 draft.go:1222] Raft.Ready took too long to process: Timer Total: 216ms. Breakdown: [{sync 216ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0625 22:13:50.144252 15 draft.go:1222] Raft.Ready took too long to process: Timer Total: 230ms. Breakdown: [{sync 230ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I0625 22:14:15.184530 15 draft.go:567] Creating snapshot at index: 16438229. ReadTs: 21834600.
2020/06/25 22:16:03 Assert failed
github.com/dgraph-io/badger/v2/y.AssertTrue
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/y/error.go:55
github.com/dgraph-io/badger/v2/table.(*block).decrRef
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/table/table.go:244
github.com/dgraph-io/badger/v2/table.BlockEvictHandler
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/table/table.go:190
github.com/dgraph-io/badger/v2.Open.func5
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/badger/v2@v2.0.1-rc1.0.20200615081930-c45d966681d4/db.go:310
github.com/dgraph-io/ristretto.(*Cache).processItems
/home/ubuntu/go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.2/cache.go:315
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373
The crash originates from cache eviction handler.
Maybe the iterator is closing the same block twice somehow. Once in the setBlock
https://github.com/dgraph-io/badger/blob/158d9271e1052082d6377ffa56483650129b726c/table/iterator.go#L43-L46
and the second time in the itr.Close
https://github.com/dgraph-io/badger/blob/158d9271e1052082d6377ffa56483650129b726c/table/iterator.go#L111-L113
I don't know how that could happen but that's the only way b.decrRef()
can be called more than b.incrRef()
and so the crash.
Looking at the code, I think there are at least two issues:
Issue #1: It seems to me that ristretto doesn't currently guarantee that the block has been added to the cache if and only if Set()
returns true.
More specifically, in the case of updates, the item is first updated in the cache, then it is put on the setBuf
channel. If that last operation failed (because under load the channel might get is full), Set()
will return false, but the item has been put on the cache.
From there, Table.block()
will decrease the reference count of the block, and return it with a refcount of 1, despite the block actually being in the cache and up for grab by other callers:
The scenario where it can happen is a race between two goroutines loading the same block that is not yet in the cache:
setBuf
channel, returns false, the block has been put in the cache, but it is returned with a refcount of 1Issue #2: It seems like the current code assumes that the evict handler is always called when an item is removed from the cache. But that does not seem to be true when items are updated. The evict handler will not be called on the removed version. As a consequence, blocks will leak when two goroutines race, like above, to load an item that is not in the cache. Luckily, the only effect of this is more memory garbage.
because under load the channel might get is full
I wondered if that was plausible or not, and it is:
(*Table).DecrRef()
issues a cache deletion for all the block cache keys of the table in a tight loop:(*levelHandler).replaceTables()
issues a (*Table).DecrRef()
in a tight loop for all the deleted tables after a compaction:At the default table (64 MB) and block size (4 kB), there are 16384 blocks per table. At the default level multiplier, you expect a typical compaction to delete 11 tables (10 at the bottom layer, one at the top layer), so each compaction generates 180224 cache deletion requests in a relatively tight loop (but only relatively, there are a bunch of syscalls in (*Table).DecrRef()
). In addition, there can be multiple compactions cleaning up tables at the same time.
The setBuf channel in ristretto has 32768 entries, so it sounds highly likely to me a real-life application could end up filling the set buffer, triggering the issue.
This is fixed in ristretto via https://github.com/dgraph-io/ristretto/issues/167 and ristretto was updated in badger via https://github.com/dgraph-io/badger/pull/1391 .
Thank you for the help @damz !
What version of Go are you using (
go version
)?go 1.13
What version of Badger are you using?
6eaa5009af27a483f7be3538fee77d2e894213a2
Steps to Reproduce the issue
// opt.ZSTDCompressionLevel = Config.BadgerCompressionLevel }
glog.Infof("Setting Badger table load option: %s", Config.BadgerTables)
What did you do?
Run queries and mutation on dgraph.
What did you expect to see?
I expected no crash.
What did you see instead?