apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

BookKeeper Crashes unexpectedly #258

estebangarcia closed this issue 7 years ago

estebangarcia commented 7 years ago

Hi. We had a couple of bookkeepers that crashed unexpectedly at different times. We gathered the logs from before the crashes and I'm attaching them. Any help will be much appreciated.

Thanks

Attachment: bklogs.txt

merlimat commented 7 years ago

It looks like there are failures related to the DataSketches stats library:

java.lang.ArrayIndexOutOfBoundsException: 256
        at com.yahoo.sketches.quantiles.DoublesUpdateImpl.zipSize2KBuffer(DoublesUpdateImpl.java:127)
        at com.yahoo.sketches.quantiles.DoublesUpdateImpl.inPlacePropagateCarry(DoublesUpdateImpl.java:92)
        at com.yahoo.sketches.quantiles.DoublesUpdateImpl.processFullBaseBuffer(DoublesUpdateImpl.java:46)
        at com.yahoo.sketches.quantiles.HeapDoublesSketch.update(HeapDoublesSketch.java:176)
        at org.apache.bokkeeper.stats.datasketches.DataSketchesOpStatsLogger.registerSuccessfulEvent(DataSketchesOpStatsLogger.java:59)
        at org.apache.bookkeeper.bookie.Journal.run(Journal.java:895)

This exception happens in the Journal thread and causes the bookie process to restart.

Another exception occurs during stats collection:

2017-02-26 17:59:56,096 - WARN  - [metrics-1-1:DataSketchesMetricsProvider@76] - Failed to report stats: 128
java.lang.ArrayIndexOutOfBoundsException: 128
        at com.yahoo.sketches.quantiles.DoublesAuxiliary.populateFromQuantilesSketch(DoublesAuxiliary.java:99)
        at com.yahoo.sketches.quantiles.DoublesAuxiliary.<init>(DoublesAuxiliary.java:38)
        at com.yahoo.sketches.quantiles.DoublesSketch.constructAuxiliary(DoublesSketch.java:607)
        at com.yahoo.sketches.quantiles.DoublesSketch.getQuantile(DoublesSketch.java:195)
        at org.apache.bokkeeper.stats.datasketches.DataSketchesOpStatsLogger.getMedian(DataSketchesOpStatsLogger.java:121)
        at org.apache.bokkeeper.stats.datasketches.JsonFileReporter.lambda$report$9(JsonFileReporter.java:67)
        at java.util.concurrent.ConcurrentSkipListMap.forEach(ConcurrentSkipListMap.java:3252)
        at org.apache.bokkeeper.stats.datasketches.JsonFileReporter.report(JsonFileReporter.java:61)
        at org.apache.bokkeeper.stats.datasketches.DataSketchesMetricsProvider.lambda$null$5(DataSketchesMetricsProvider.java:74)

As a workaround, you can fall back to a different stats implementation for the bookies, e.g.:

statsProviderClass=org.apache.bookkeeper.stats.CodahaleMetricsProvider
codahaleStatsJmxEndpoint=metrics

and collect the stats through JMX. Alternatively, comment out statsProviderClass to disable stats.

merlimat commented 7 years ago

I don't understand the specific exceptions yet. I need to dig a bit into that code or ask the DataSketches people for help.

estebangarcia commented 7 years ago

Thanks for your response. We'll disable the stats for now.

merlimat commented 7 years ago

There seems to be a concurrency issue in the stats provider: two threads (the Journal thread updating a sketch and the metrics thread reading quantiles from it) can see inconsistent internal state and throw the exceptions above. I'll send a fix to the bookkeeper repo branch.
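
For readers hitting the same class of problem, here is a minimal sketch of one way to guard a non-thread-safe stats structure that is updated by one thread and read by another. It is not the actual fix that went into the bookkeeper branch, and it does not use the DataSketches API: `GuardedLatencyRecorder` and its backing list are hypothetical stand-ins, and only the method names `registerSuccessfulEvent` and `getMedian` are borrowed from the stack traces for readability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical latency recorder, NOT the BookKeeper or DataSketches code.
 * The ArrayList stands in for any structure that is not thread-safe by itself.
 */
public class GuardedLatencyRecorder {
    private final List<Double> samples = new ArrayList<>();

    /** Writer side (the Journal thread in the traces): mutate only under the lock. */
    public synchronized void registerSuccessfulEvent(double latencyMillis) {
        samples.add(latencyMillis);
    }

    /** Reader side (the metrics thread): snapshot under the lock, compute outside it. */
    public double getMedian() {
        List<Double> copy;
        synchronized (this) {
            copy = new ArrayList<>(samples);
        }
        if (copy.isEmpty()) {
            return Double.NaN;
        }
        Collections.sort(copy);
        return copy.get(copy.size() / 2); // upper median for even-sized input
    }

    public synchronized int count() {
        return samples.size();
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedLatencyRecorder recorder = new GuardedLatencyRecorder();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 10_000; i++) {
                recorder.registerSuccessfulEvent(i);
            }
        });
        Thread reporter = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                recorder.getMedian(); // concurrent reads while the writer is active
            }
        });
        writer.start();
        reporter.start();
        writer.join();
        reporter.join();
        System.out.println(recorder.count() + " samples recorded");
    }
}
```

The point of the sketch is only that every access to the shared structure, reads included, goes through the same lock. Whether a plain lock is acceptable on a hot path like the journal is a separate trade-off.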