datastax / pulsar-helm-chart

Apache Pulsar Helm chart
Apache License 2.0
46 stars 38 forks source link

Rename "GC Pauses" to "GC Time" in Pulsar JVM dashboards #252

Closed lhotari closed 1 year ago

lhotari commented 1 year ago
dave2wave commented 1 year ago

"GC Collection Time" is redundant. It should be "GC Time".

lhotari commented 1 year ago

"GC Collection Time" is redundant. It should be "GC Time".

@dave2wave thanks for pointing that out. fixed.

lhotari commented 1 year ago

There's also a metric for GC Pauses in Pulsar. The metric name is jvm_full_gc_pause. Here's the implementation: https://github.com/apache/pulsar/blob/52d8fe03160648f14dc196edaa87921064a1b2b2/pulsar-common/src/main/java/org/apache/pulsar/common/stats/JvmDefaultGCMetricsLogger.java#L93-L94

The implementation does raise some questions since it's not exposed directly from the source metric. Instead, the logic is attempting to calculate the duration of the most recent GC pause. There might be issues with such conversion. The source metric HotspotRuntimeMBean.getTotalSafepointTime is described in https://stackoverflow.com/a/42515743

michaeljmarshall commented 1 year ago

Thanks @lhotari. Is there a way to get an accurate GC pause metric?

lhotari commented 1 year ago

Thanks @lhotari. Is there a way to get an accurate GC pause metric?

Brokers already have a separate metric for this, jvm_full_gc_pause . https://github.com/apache/pulsar/blob/4dcb166e0bfcce7fc85fd8d59a25b881f6f9c6fa/pulsar-common/src/main/java/org/apache/pulsar/common/stats/JvmDefaultGCMetricsLogger.java#L93

It uses total safepoint time metric under the covers, https://github.com/apache/pulsar/blob/4dcb166e0bfcce7fc85fd8d59a25b881f6f9c6fa/pulsar-common/src/main/java/org/apache/pulsar/common/stats/JvmDefaultGCMetricsLogger.java#L124

It's a bit misleading to call it "full gc" since it's more than full gc. More details in https://blanco.io/blog/jvm-safepoint-pauses/