apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

The BookKeeper ledger is not honoring the retention period, causing excessive disk usage. #6639

Open prasad-reddyk opened 4 years ago

prasad-reddyk commented 4 years ago

Describe the bug: The BookKeeper ledger is not honoring the retention period, causing excessive disk usage. During my test execution I set defaultRetentionTimeInMinutes to 6 and noticed that the bookie ledgers do not honor the retention period: they eat up disk space and eventually cause failures with "Not enough non-faulty bookie available". Drilling down into the ledger disk, most messages are still retained even after the retention period has elapsed. Below are the broker parameter settings:

broker:
  configData:
    managedLedgerMinLedgerRolloverTimeMinutes: "1"
    managedLedgerCursorRolloverTimeInSeconds: "60"
    managedLedgerMaxLedgerRolloverTimeMinutes: "2"
    managedLedgerMaxEntriesPerLedger: "50000"
    defaultRetentionTimeInMinutes: "6"
    defaultRetentionSizeInMB: "1000"
    ttlDurationDefaultInSeconds: "3600"

To Reproduce Steps to reproduce the behavior:

  1. Use Pulsar 2.5.0.
  2. Create 100 topics with three partitions each, plus 100 producers and consumers.
  3. Using the pulsar-perf client, send data continuously for an hour (message size 50 KB) and observe whether disk space is freed after the retention period completes (see the sketch after this list).
  4. The following ERROR appears in the broker logs: "Not enough non-faulty bookie available".
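
A pulsar-perf invocation along these lines exercises step 3 (the topic name and service URL are placeholders, and the exact flag spellings may vary by version):

# Illustrative: 50 KB messages at ~844 msg/sec against one test topic
bin/pulsar-perf produce persistent://public/default/test-topic-1 \
  -u pulsar://localhost:6650 \
  -r 844 \
  -s 51200
# Leave it running for an hour, then watch bookie disk usage after the
# 6-minute retention window elapses.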

Expected behavior: Once the retention period elapses, the data should be flushed out and the disk space freed.

Screenshots: n/a


Additional context: Currently we have 100 GB for each bookie. Below is the disk calculation based on my message rate:

a = throughput = 844.023 msg/sec × 50 KB = 42,201.2 KB/sec
b = a × retention period = 42,201.2 KB/sec × 360 sec (6 minutes) ≈ 15,192,432 KB ≈ 14.4 GB required across the 3 bookies for the 6-minute retention window
c = 100 GB per bookie (300 GB for 3 bookies)

So b < c: as per the above calculation I have enough disk space, but I am still facing issues.

sijie commented 4 years ago

Currently, BookKeeper uses a lazy garbage collection mechanism to reclaim disk space after ledgers/segments are deleted, so there is a gap between a ledger being deleted and its disk space being reclaimed. Please use a larger disk capacity if you are sending a fair amount of traffic to the cluster; this gives the bookie garbage collector enough time to reclaim the disk space.
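
For reference, the frequency of that background collection is controlled by gcWaitTime in bookkeeper.conf; a minimal sketch (the value shown is illustrative, not the shipped default):

# bookkeeper.conf
# Interval between garbage collection runs, in milliseconds.
# A shorter interval reclaims space sooner, at the cost of more background I/O.
gcWaitTime=60000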

orbang commented 4 years ago

We are experiencing a similar issue: huge disk space usage even when we explicitly delete all the topics. Normally we limit both the backlog quota and the retention period, but that does not seem to have much impact. Is there any way to trigger garbage collection, or any other way to reclaim disk space or put a bound on it? Thanks

sijie commented 4 years ago

@orbang Can you describe more about what you are experiencing?

jiazhai commented 4 years ago

@prasad-reddyk @orbang As sijie mentioned, how about trying to make GC run more frequently via bookkeeper.conf? The default compaction intervals are 1 hour (minor) and 1 day (major):

# Threshold of minor compaction
# For those entry log files whose remaining size percentage reaches below
# this threshold will be compacted in a minor compaction.
# If it is set to less than zero, the minor compaction is disabled.
minorCompactionThreshold=0.2

# Interval to run minor compaction, in seconds
# If it is set to less than zero, the minor compaction is disabled.
minorCompactionInterval=3600

# Threshold of major compaction
# For those entry log files whose remaining size percentage reaches below
# this threshold will be compacted in a major compaction.
# Those entry log files whose remaining size percentage is still
# higher than the threshold will never be compacted.
# If it is set to less than zero, the major compaction is disabled.
majorCompactionThreshold=0.5

# Interval to run major compaction, in seconds
# If it is set to less than zero, the major compaction is disabled.
majorCompactionInterval=86400
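
For example, a more frequent schedule might look like the following (illustrative values, not tested recommendations; both intervals are in seconds):

# bookkeeper.conf: run compaction more often than the defaults above
minorCompactionInterval=600    # every 10 minutes instead of every hour
majorCompactionInterval=7200   # every 2 hours instead of every day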

prasad-reddyk commented 4 years ago

@jiazhai, I currently set gcWaitTime to one minute, as recommended by Sijie. Please let us know if anything around these parameter values needs to change. Thanks!

# Interval to trigger the next garbage collection, in milliseconds.
# Since garbage collection runs in the background, too-frequent GC
# will hurt performance. It is better to use a longer GC interval
# if there is enough disk capacity.
gcWaitTime=60000

# Threshold of minor compaction
# If it is set to less than zero, the minor compaction is disabled.
minorCompactionThreshold=0.2

# Interval to run minor compaction, in seconds
# If it is set to less than zero, the minor compaction is disabled.
# Note: should be greater than gcWaitTime.
minorCompactionInterval=3600   # 1 hour

frankjkelly commented 4 years ago

I am affected by this also. We're trying to reclaim disk for two reasons:

1) Security / customer requirements to NOT retain data
2) Cost

What I have done so far, inspired by @addisonj:

i) Lowered majorCompactionInterval from 86400 (1 day) to 7200 (2 hours)
ii) Lowered managedLedgerMaxLedgerRolloverTimeMinutes from 240 (4 hours) to 90 (1.5 hours)
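
In config-file terms the changes above look like this; note that the two settings live in different files (values as described earlier, adjust for your own deployment):

# bookkeeper.conf
majorCompactionInterval=7200                   # was 86400 (1 day)

# broker.conf
managedLedgerMaxLedgerRolloverTimeMinutes=90   # was 240 (4 hours)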

An open question is whether I should change majorCompactionThreshold from the (Pulsar) default of 0.5, as I don't fully understand whether lower values mean more or less disk space usage.

Thanks in advance