Closed: drax68 closed this issue 8 years ago.
What do the index logs show? There should be logs that look like:
2016-05-31T08:16:00.259 INFO [MemoryDatabaseRebuildIndex] com.netflix.atlas.core.db.MemoryDatabase: rebuilding metadata index
2016-05-31T08:16:00.260 INFO [MemoryDatabaseRebuildIndex] com.netflix.atlas.core.db.MemoryDatabase: done rebuilding metadata index, 2 metrics
These will tell how long it is taking to index the content and the number of distinct time series in the index.
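(To make that concrete, the rebuild duration is simply the difference between the two timestamps; below is a minimal Scala sketch of that arithmetic, with the values copied from the sample lines above and the object name invented for the example.)

import java.time.{Duration, LocalTime}

object RebuildTiming {
  def main(args: Array[String]): Unit = {
    // Timestamps copied from the two sample log lines above.
    val start = LocalTime.parse("08:16:00.259")
    val end   = LocalTime.parse("08:16:00.260")
    // Prints: rebuilt index for 2 metrics in 1 ms
    println(s"rebuilt index for 2 metrics in ${Duration.between(start, end).toMillis} ms")
  }
}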
A couple of other things to note:
-XX:StringTableSize=1000003
jmap -histo:live <pid> | head -n50
would help.

Checked logs: the rebuild takes up to 60 seconds for 1340674 metrics. Tried c1; it generally looks better, but after 25 hours it ends up with hours of delay and most of the heap utilized.
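(For scale, simply dividing those two figures gives the effective indexing rate; a throwaway Scala calculation, nothing Atlas-specific, object name made up for the example:)

object RebuildRate {
  def main(args: Array[String]): Unit = {
    // 1340674 metrics indexed in up to 60 seconds, as reported above.
    val rate = 1340674.0 / 60.0
    // Prints: ~22345 series indexed per second
    println(f"~$rate%.0f series indexed per second")
  }
}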
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 115813122048 (110448.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 69474451456 (66256.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 16777216 (16.0MB)
Heap Usage:
G1 Heap:
regions = 6903
capacity = 115813122048 (110448.0MB)
used = 81499614872 (77724.08950042725MB)
free = 34313507176 (32723.910499572754MB)
70.37165860896282% used
G1 Young Generation:
Eden Space:
regions = 82
capacity = 4177526784 (3984.0MB)
used = 1375731712 (1312.0MB)
free = 2801795072 (2672.0MB)
32.93172690763052% used
Survivor Space:
regions = 21
capacity = 352321536 (336.0MB)
used = 352321536 (336.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 4762
capacity = 82023809024 (78224.0MB)
used = 79771561624 (76076.08950042725MB)
free = 2252247400 (2147.910499572754MB)
97.25415409647582% used
154753 interned Strings occupying 35269128 bytes.
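(A rough per-series figure from the numbers above, assuming the 1340674 metrics reported earlier are all resident in this heap; a small Scala sketch, names made up for the example:)

object OldGenPerSeries {
  def main(args: Array[String]): Unit = {
    // Old-generation usage and the series count are taken from the output above.
    val oldGenUsedBytes = 79771561624L
    val series          = 1340674L
    val kbPerSeries = oldGenUsedBytes.toDouble / series / 1024
    // Prints: ~58 KB of old-gen heap per time series
    println(f"~$kbPerSeries%.0f KB of old-gen heap per time series")
  }
}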
jmap -histo:live was not able to connect to the service, so trying jmap -histo -F instead.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 13735175 18898908672 double[]
2: 21180760 677784320 java.util.concurrent.LinkedBlockingQueue$Node
3: 1735448 611425608 java.lang.Object[]
4: 14783706 591348240 com.netflix.atlas.core.model.ConstantBlock
5: 5146349 591195064 char[]
6: 7649875 367194000 com.netflix.atlas.core.model.SparseBlock
7: 1340683 300312992 com.netflix.atlas.core.model.Block[]
8: 6085267 243410680 com.netflix.atlas.core.model.ArrayBlock
9: 141281 218798360 short[]
10: 454 189877912 long[]
11: 3950949 189645552 java.util.HashMap$Node
12: 4728340 151306880 java.lang.String
13: 3100611 148829328 com.netflix.spectator.api.TagList
14: 2999382 143970336 java.lang.ref.WeakReference
15: 2999184 143960832 com.netflix.spectator.api.ObjectGauge
16: 2783140 133590720 java.math.BigInteger
17: 2072099 132614336 java.nio.StringCharBuffer
18: 3220653 103060896 java.util.concurrent.ConcurrentLinkedQueue$Node
19: 3099824 99194368 com.netflix.spectator.api.DefaultId
20: 2478879 99155160 spray.routing.RequestContext
21: 1365951 87420864 java.util.LinkedHashMap$Entry
22: 1340683 85803712 com.netflix.atlas.core.db.MemoryBlockStore
23: 69 80330528 com.netflix.atlas.core.db.BlockStoreItem[]
24: 1577770 75732960 java.util.concurrent.ConcurrentHashMap$Node
25: 1272530 71261680 com.netflix.atlas.core.norm.NormalizeValueFunction
26: 1441759 69204432 com.netflix.atlas.core.db.BlockStoreItem
27: 71 67998608 scala.collection.immutable.Map[]
28: 752050 63781168 java.lang.String[]
29: 1976546 63249472 scala.Tuple2
30: 1381050 55242000 com.netflix.atlas.core.model.BasicTaggedItem
31: 1115383 53538384 spray.routing.RequestContext$$anon$3
32: 1114800 53510400 spray.routing.RequestContext$$anon$1
33: 1329860 53194400 com.netflix.atlas.core.norm.NormalizationCache$CacheValue
34: 2048757 49170168 scala.runtime.ObjectRef
35: 1382814 44250048 com.netflix.atlas.core.util.SmallHashMap
36: 1329860 42555520 com.netflix.atlas.core.norm.UpdateValueFunction
37: 99 41781072 java.math.BigInteger[]
38: 416883 30015576 gnu.trove.set.hash.TIntHashSet
39: 8711 28137480 java.util.HashMap$Node[]
40: 123921 27758304 java.text.DecimalFormat
41: 691954 27678160 com.netflix.atlas.core.index.SortedArraySet
42: 688248 27529920 com.netflix.atlas.core.model.Tag
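(A few more back-of-the-envelope numbers from the histogram, again only arithmetic on the figures shown above; the class names come from the histogram itself, everything else is made up for the sketch:)

object HistogramNotes {
  def main(args: Array[String]): Unit = {
    // All figures are copied from the jmap histogram and the heap usage output above.
    val doubleArrayBytes = 18898908672L
    val doubleArrayCount = 13735175L
    val heapUsedBytes    = 81499614872L
    val blockInstances   = 14783706L + 7649875L + 6085267L // ConstantBlock + SparseBlock + ArrayBlock
    val blockArrays      = 1340683L                        // Block[] count, very close to the reported metric count

    // double[] is the single biggest consumer, roughly 23% of the used heap.
    println(f"double[] share of used heap: ${100.0 * doubleArrayBytes / heapUsedBytes}%.1f%%")
    // The average double[] is about 1376 bytes, on the order of 170 doubles per array ignoring headers.
    println(f"average double[] size: ${doubleArrayBytes.toDouble / doubleArrayCount}%.0f bytes")
    // Roughly 21 block objects retained per Block[] holder.
    println(f"blocks per Block[]: ${blockInstances.toDouble / blockArrays}%.1f")
  }
}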
@brharrington, anything else to check?
Somehow I missed the previous update. I'll take a look this weekend.
@brharrington, any update on this issue? We still suffer from it on a daily basis.
Anything new regarding that issue?
Issue fixed after multiple memory-related commits. Thank you, @brharrington.
Hi, we noticed some interesting Atlas behaviour. After some period of time we observe an increasing delay for new metrics; see the attached screenshot.
Heap consumption also never stops growing. Current configuration (Atlas 1.5.0-rc5):
JVM options:
What can we check to debug this issue?