apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

pulsar performance help #9484

Closed xiaotongwang1 closed 1 year ago

xiaotongwang1 commented 3 years ago

VM Server config

1. 9 brokers with 16C 128G
2. 9 bookies with 16C 128G and 9 SSD disks of 500G each: data1, data2, data3 for journals; data4, data5, data6, data7, data8, data9 for ledgers
3. 5 ZooKeeper nodes with 8C 32G

TPS Now

1. 1 topic with 500 partitions
2. 1500 producer threads
3. 1 subscription name, initialized with 50 Pulsar consumers
4. Message size is 100
5. Producer TPS is 457K, avg time 9 ms, max time 500+ ms
6. Consumer TPS is 16916, with some error logs:

2021-02-04 21:55:09,611 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50689 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50689 lookup request timedout after ms 30000
    at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:09,612 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50698 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50698 lookup request timedout after ms 30000
    at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:09,712 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
2021-02-04 21:55:09,713 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
2021-02-04 21:55:30,242 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50707 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50707 lookup request timedout after ms 30000
    at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:30,432 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 188 ms
2021-02-04 21:55:59,778 [pulsar-client-io-2-1] WARN (org.apache.pulsar.client.impl.BinaryProtoLookupService:189) - [persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] failed to get Partitioned metadata : 50716 lookup request timedout after ms 30000
org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 50716 lookup request timedout after ms 30000
    at org.apache.pulsar.client.impl.ClientCnx.lambda$addPendingLookupRequests$8(ClientCnx.java:559) ~[pulsar-client-original-2.7.0.jar:2.7.0]
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.47.Final.jar:4.1.47.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2021-02-04 21:55:59,879 [pulsar-external-listener-3-1] WARN (org.apache.pulsar.client.impl.PulsarClientImpl:720) - [topic: persistent://MusicUserService/MusicUserTopicService/dmq.zqq.test.500p] Could not get connection while getPartitionedTopicMetadata -- Will try again in 100 ms
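
To put the reported rates side by side, here is a minimal arithmetic sketch. It uses only the figures quoted above (457K produced per second, 16916 consumed per second, 50 consumers); the variable names are illustrative, and 457K is treated as exactly 457,000 for the purpose of the calculation.

```python
# Figures taken from the report above; 457K is rounded to 457,000.
producer_rate = 457_000   # messages per second published
consumer_rate = 16_916    # messages per second consumed
consumers = 50            # consumers on the single subscription

# Rate at which the subscription backlog grows while both rates hold.
backlog_growth = producer_rate - consumer_rate

# Average throughput each consumer is achieving.
per_consumer = consumer_rate / consumers

print(f"backlog grows by {backlog_growth} msg/s")
print(f"each consumer averages {per_consumer:.2f} msg/s")
```

If these numbers are representative, the subscription falls behind by roughly 440K messages every second, so consumers are reading from a growing backlog rather than from freshly published data.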


Broker JVM config

some jvm properties

-Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true -Dpulsar.allocator.exit_on_oom=true -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 -Xms10g -Xmx10g -XX:MaxDirectMemorySize=20g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB

broker.conf

`clusterName=dmq2-performance-test superUserRoles=101052529,pulsarAdmin brokerClientAuthenticationParameters={"credential":"pulsarAdmin", "secret":*****","appid":"101052529","appsecret":"****"} bookkeeperNumberOfChannelsPerBookie=64 limitPrometheusClientIps=127.0.0.1,10.31.4.61 maxMessageSize=20971520 dispatcherMaxReadSizeBytes=20971520 systemTopicEnabled=false topicLevelPoliciesEnabled=false zookeeperServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281 globalZookeeperServers= configurationStoreServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281 brokerServicePort=6650 brokerServicePortTls= webServicePort=8080 webServicePortTls=8443 bindAddress=0.0.0.0 advertisedAddress=ip keepAliveIntervalSeconds=30 brokerDeduplicationEnabled=false managedLedgerDefaultEnsembleSize=3 managedLedgerDefaultWriteQuorum=3 managedLedgerDefaultAckQuorum=2 managedLedgerNumWorkerThreads=8 managedLedgerNumSchedulerThreads=8 defaultRetentionTimeInMinutes=10080 defaultRetentionSizeInMB=0 failureDomainsEnabled=false bookkeeperClientTimeoutInSeconds=30 zooKeeperSessionTimeoutMillis=30000 zooKeeperOperationTimeoutSeconds=30 zooKeeperCacheExpirySeconds=300 bookkeeperClientRackawarePolicyEnabled=true bookkeeperClientRegionawarePolicyEnabled=false exposeTopicLevelMetricsInPrometheus=true exposeConsumerLevelMetricsInPrometheus=false exposePublisherStats=true statsUpdateFrequencyInSecs=60 statsUpdateInitialDelayInSecs=60 exposePreciseBacklogInPrometheus=false brokerShutdownTimeoutMs=60000 skipBrokerShutdownOnOOM=false backlogQuotaCheckEnabled=true backlogQuotaCheckIntervalInSeconds=60 backlogQuotaDefaultLimitGB=-1 backlogQuotaDefaultRetentionPolicy=producer_exception ttlDurationDefaultInSeconds=604800 allowAutoTopicCreation=false allowAutoTopicCreationType=partitioned allowAutoSubscriptionCreation=false defaultNumPartitions=1 brokerDeleteInactiveTopicsEnabled=false 
brokerDeleteInactiveTopicsFrequencySeconds=60 brokerDeleteInactiveTopicsMode=delete_when_no_subscriptions messageExpiryCheckIntervalInMinutes=5 activeConsumerFailoverDelayTimeMillis=1000 subscriptionExpirationTimeMinutes=0 subscriptionRedeliveryTrackerEnabled=true subscriptionExpiryCheckIntervalInMinutes=5 subscriptionKeySharedEnable=true subscriptionKeySharedUseConsistentHashing=false subscriptionKeySharedConsistentHashingReplicaPoints=100 brokerDeduplicationMaxNumberOfProducers=10000 brokerDeduplicationEntriesInterval=1000 brokerDeduplicationProducerInactivityTimeoutMinutes=360 defaultNumberOfNamespaceBundles=4 clientLibraryVersionCheckEnabled=false preferLaterVersions=false maxUnackedMessagesPerConsumer=50000 maxUnackedMessagesPerSubscription=200000 maxUnackedMessagesPerBroker=0 maxUnackedMessagesPerSubscriptionOnBrokerBlocked=0.16 topicPublisherThrottlingTickTimeMillis=10 brokerPublisherThrottlingTickTimeMillis=50 brokerPublisherThrottlingMaxMessageRate=0 brokerPublisherThrottlingMaxByteRate=0 subscribeThrottlingRatePerConsumer=0 subscribeRatePeriodPerConsumerInSecond=30 dispatchThrottlingRatePerTopicInMsg=0 dispatchThrottlingRatePerTopicInByte=0 dispatchThrottlingRatePerSubscriptionInMsg=0 dispatchThrottlingRatePerSubscriptionInByte=0 dispatchThrottlingRatePerReplicatorInMsg=0 dispatchThrottlingRatePerReplicatorInByte=0 dispatchThrottlingRateRelativeToPublishRate=false dispatchThrottlingOnNonBacklogConsumerEnabled=true dispatcherMaxReadBatchSize=100 dispatcherMinReadBatchSize=1 dispatcherMaxRoundRobinBatchSize=20 preciseDispatcherFlowControl=false maxConcurrentLookupRequest=50000 maxConcurrentTopicLoadRequest=5000 maxConcurrentNonPersistentMessagePerConnection=1000 numWorkerThreadsForNonPersistentTopic=8 enablePersistentTopics=true enableNonPersistentTopics=true enableRunBookieTogether=false enableRunBookieAutoRecoveryTogether=false maxProducersPerTopic=0 maxConsumersPerTopic=0 maxConsumersPerSubscription=0 brokerServiceCompactionMonitorIntervalInSeconds=60 
delayedDeliveryEnabled=true delayedDeliveryTickTimeMillis=1000 acknowledgmentAtBatchIndexLevelEnabled=false enableReplicatedSubscriptions=true replicatedSubscriptionsSnapshotFrequencyMillis=1000 replicatedSubscriptionsSnapshotTimeoutSeconds=30 replicatedSubscriptionsSnapshotMaxCachedPerSubscription=10 messagePublishBufferCheckIntervalInMillis=100 retentionCheckIntervalInSeconds=120 maxNumPartitionsPerPartitionedTopic=0 zookeeperSessionExpiredPolicy=shutdown authenticateOriginalAuthData=false tlsEnabled=false tlsCertRefreshCheckDurationSec=300

authenticationEnabled=true authenticationProviders=com.huawei.dmq2.security.dmq.broker.server.AuthenticationProviderSCRAM authenticationRefreshCheckSeconds=60 authorizationEnabled=true authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider authorizationAllowWildcardsMatching=false brokerClientTlsEnabled=false brokerClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.broker.client.AuthenticationSCRAM saslJaasClientAllowedIds=.* saslJaasBrokerSectionName=PulsarBroker httpMaxRequestSize=-1 bookkeeperMetadataServiceUri= bookkeeperClientAuthenticationPlugin=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory bookkeeperClientSpeculativeReadTimeoutInMillis=0 bookkeeperUseV2WireProtocol=true bookkeeperClientHealthCheckEnabled=true bookkeeperClientHealthCheckIntervalSeconds=60 bookkeeperClientHealthCheckErrorThresholdPerInterval=5 bookkeeperClientHealthCheckQuarantineTimeInSeconds=1800 bookkeeperGetBookieInfoIntervalSeconds=86400 bookkeeperGetBookieInfoRetryIntervalSeconds=60 bookkeeperClientReorderReadSequenceEnabled=false

bookkeeperEnableStickyReads=false bookkeeperDiskWeightBasedPlacementEnabled=false bookkeeperExplicitLacIntervalInMills=0 managedLedgerDigestType=CRC32C managedLedgerCacheCopyEntries=false managedLedgerCacheEvictionWatermark=0.9 managedLedgerCacheEvictionFrequency=100.0 managedLedgerCacheEvictionTimeThresholdMillis=1000 managedLedgerCursorBackloggedThreshold=1000 managedLedgerDefaultMarkDeleteRateLimit=1.0 managedLedgerMaxEntriesPerLedger=50000 managedLedgerMinLedgerRolloverTimeMinutes=10 managedLedgerMaxLedgerRolloverTimeMinutes=240 managedLedgerMaxSizePerLedgerMbytes=2048 managedLedgerOffloadDeletionLagMs=14400000 managedLedgerOffloadAutoTriggerSizeThresholdBytes=-1 managedLedgerCursorMaxEntriesPerLedger=50000 managedLedgerCursorRolloverTimeInSeconds=14400 managedLedgerMaxUnackedRangesToPersist=10000 managedLedgerMaxUnackedRangesToPersistInZooKeeper=1000 autoSkipNonRecoverableData=false managedLedgerMetadataOperationsTimeoutSeconds=60 managedLedgerReadEntryTimeoutSeconds=0 managedLedgerAddEntryTimeoutSeconds=0 managedLedgerPrometheusStatsLatencyRolloverSeconds=60 managedLedgerTraceTaskExecution=true managedLedgerNewEntriesCheckDelayInMillis=10 loadBalancerEnabled=true loadBalancerReportUpdateThresholdPercentage=10 loadBalancerReportUpdateMaxIntervalMinutes=15 loadBalancerHostUsageCheckIntervalMinutes=1 loadBalancerSheddingEnabled=true loadBalancerSheddingIntervalMinutes=1 loadBalancerSheddingGracePeriodMinutes=30 loadBalancerBrokerMaxTopics=50000 loadBalancerBrokerOverloadedThresholdPercentage=85 loadBalancerResourceQuotaUpdateIntervalMinutes=15 loadBalancerAutoBundleSplitEnabled=true loadBalancerAutoUnloadSplitBundlesEnabled=true loadBalancerNamespaceBundleMaxTopics=1000 loadBalancerNamespaceBundleMaxSessions=1000 loadBalancerNamespaceBundleMaxMsgRate=30000 loadBalancerNamespaceBundleMaxBandwidthMbytes=100 loadBalancerNamespaceMaximumBundles=128 loadBalancerOverrideBrokerNicSpeedGbps= 
loadManagerClassName=org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl supportedNamespaceBundleSplitAlgorithms=range_equally_divide,topic_count_equally_divide defaultNamespaceBundleSplitAlgorithm=range_equally_divide loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.OverloadShedder loadBalancerBrokerThresholdShedderPercentage=10 loadBalancerHistoryResourcePercentage=0.9 loadBalancerBandwithInResourceWeight=1.0 loadBalancerBandwithOutResourceWeight=1.0 loadBalancerCPUResourceWeight=1.0 loadBalancerMemoryResourceWeight=1.0 loadBalancerDirectMemoryResourceWeight=1.0 loadBalancerBundleUnloadMinThroughputThreshold=10 replicationMetricsEnabled=true replicationConnectionsPerBroker=16 replicationProducerQueueSize=1000 replicatorPrefix=pulsar.repl replicatioPolicyCheckDurationSeconds=600 bootstrapNamespaces= webSocketServiceEnabled=false webSocketNumIoThreads=8 webSocketConnectionsPerBroker=8 webSocketSessionIdleTimeoutMillis=300000 webSocketMaxTextFrameSize=1048576 functionsWorkerEnabled=false schemaRegistryStorageClassName=org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory isSchemaValidationEnforced=false managedLedgerOffloadDriver= managedLedgerOffloadMaxThreads=2 managedLedgerOffloadPrefetchRounds=1 managedLedgerUnackedRangesOpenCacheSetEnabled=true `

broker ERROR OutOfDirectMemoryError

ce MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x70000000_0x80000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x00000000_0x10000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.563 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x18000000_0x20000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xa0000000_0xc0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x90000000_0xa0000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xc0000000_0xdfffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0xdfffffff_0xefffffff because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
2021-02-04 21:46:18.564 [pulsar-modular-load-manager-37-1] WARN o.a.pulsar.broker.loadbalance.BundleSplitStrategy - Could not split namespace bundle MusicUserService/MusicUserTopicService/0x20000000_0x40000000 because namespace MusicUserService/MusicUserTopicService has too many bundles: 128
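
The warnings above say the namespace has hit its bundle cap of 128, which matches `loadBalancerNamespaceMaximumBundles=128` in the posted broker.conf. As a rough sketch, the per-bundle limits from the same file can be compared with the average load per bundle. This assumes a bundle is considered for splitting when any one of these limits is exceeded and that the message-rate limit counts published plus consumed messages; neither assumption is confirmed in this thread, and `exceeded_limits` is a hypothetical helper, not a Pulsar API.

```python
# Limits copied from the posted broker.conf (loadBalancerNamespaceBundleMax*).
LIMITS = {
    "topics": 1000,       # loadBalancerNamespaceBundleMaxTopics
    "sessions": 1000,     # loadBalancerNamespaceBundleMaxSessions
    "msg_rate": 30000,    # loadBalancerNamespaceBundleMaxMsgRate
    "bandwidth_mb": 100,  # loadBalancerNamespaceBundleMaxBandwidthMbytes
}
MAX_BUNDLES = 128         # loadBalancerNamespaceMaximumBundles


def exceeded_limits(stats, limits=LIMITS):
    """Return the names of the limits that the given bundle stats exceed."""
    return sorted(name for name, cap in limits.items() if stats.get(name, 0) > cap)


# Average load per bundle if the 500 partitions were spread evenly
# (assumption: in + out message rate is what counts against the limit).
average_bundle = {
    "topics": 500 / MAX_BUNDLES,
    "msg_rate": (457_000 + 16_916) / MAX_BUNDLES,
}

# A made-up hot bundle, only to show what an exceeded limit looks like.
hypothetical_hot_bundle = {"topics": 12, "msg_rate": 45_000}

print(exceeded_limits(average_bundle))           # no limit exceeded on average
print(exceeded_limits(hypothetical_hot_bundle))  # msg_rate exceeded
```

On these averages no topic or message-rate limit is exceeded, so the split attempts are more likely driven by uneven load across bundles or by a limit the thread gives no figures for, such as sessions.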

Bookie JVM config

-Xms20g -Xmx20g -XX:MaxDirectMemorySize=40g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024

Bookie config

bookiePort=3181 journalDirectory=/opt/huawei/data1/journal,/opt/huawei/data2/journal,/opt/huawei/data3/journal ledgerDirectories=/opt/huawei/data4/ledgers,/opt/huawei/data5/ledgers,/opt/huawei/data6/ledgers,/opt/huawei/data7/ledgers,/opt/huawei/data8/ledgers,/opt/huawei/data9/ledgers zkServers=10.33.141.111:2281,10.33.141.45:2281,10.33.141.138:2281,10.33.141.149:2281,10.33.141.240:2281 zkTimeout=60000 zkEnableSecurity=true journalSyncData=true statsProviderClass=com.huawei.dmq2.security.dmq.bookie.metrics.PrometheusMetricsProvider prometheusStatsHttpHost=0.0.0.0 prometheusStatsHttpPort=8000 dbStorage_writeCacheMaxSizeMb=30000 dbStorage_readAheadCacheMaxSizeMb= dbStorage_readAheadCacheBatchSize=1000 dbStorage_rocksDB_blockCacheSize= dbStorage_rocksDB_writeBufferSizeMB=64 dbStorage_rocksDB_sstSizeInMB=64 dbStorage_rocksDB_blockSize=65536 dbStorage_rocksDB_bloomFilterBitsPerKey=10 dbStorage_rocksDB_numLevels=-1 dbStorage_rocksDB_numFilesInLevel0=4 dbStorage_rocksDB_maxSizeInLevel1MB=256 ledgerStorageClass=org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage minUsableSizeForIndexFileCreation=1073741824 advertisedAddress= allowLoopback=false bookieDeathWatchInterval=1000 flushInterval=60000 useHostNameAsBookieID=false bookieAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.server.SASLBookieAuthProviderFactory clientAuthProviderFactoryClass=com.huawei.dmq2.security.dmq.bookie.client.SASLClientProviderFactory gcWaitTime=900000 gcOverreplicatedLedgerWaitTime=86400000 numAddWorkerThreads=0 numReadWorkerThreads=8 numHighPriorityWorkerThreads=8 maxPendingReadRequestsPerThread=2500 maxPendingAddRequestsPerThread=10000 auditorPeriodicBookieCheckInterval=86400 rereplicationEntryBatchSize=100 openLedgerRereplicationGracePeriod=30000 autoRecoveryDaemonEnabled=true lostBookieRecoveryDelay=0 serverTcpNoDelay=true nettyMaxFrameSizeBytes=5253120 journalMaxSizeMB=2048 journalMaxBackups=5 journalPreAllocSizeMB=16 journalWriteBufferSizeKB=64 journalRemoveFromPageCache=true 
journalAdaptiveGroupWrites=true journalMaxGroupWaitMSec=1 journalBufferedWritesThreshold=524288 numJournalCallbackThreads=8 journalAlignmentSize=4096 journalFlushWhenQueueEmpty=false auditorPeriodicCheckInterval=604800 openFileLimit=0 pageLimit=0 zkLedgersRootPath=/ledgers logSizeLimit=1073741824 entryLogFilePreallocationEnabled=true flushEntrylogBytes=268435456 readBufferSizeBytes=4096 writeBufferSizeBytes=65536 compactionRate=1000 minorCompactionThreshold=0.2 minorCompactionInterval=3600 compactionMaxOutstandingRequests=100000 majorCompactionThreshold=0.5 majorCompactionInterval=86400 isThrottleByBytes=false compactionRateByEntries=1000 compactionRateByBytes=1000000 readOnlyModeEnabled=true diskUsageThreshold=0.95 diskCheckInterval=10000 httpServerEnabled=false httpServerPort=8000 httpServerClass=org.apache.bookkeeper.http.vertx.VertxHttpServer

Bookie ERROR OutOfDirectMemoryError

2021-02-04 21:47:38.066 [bookie-io-1-28] ERROR org.apache.bookkeeper.proto.BookieRequestHandler - Unhandled exception occurred in I/O thread or handler on [id: 0x6a9cf1ad, L:/10.33.141.145:3181 - R:/10.33.141.26:55134]
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 42949672956, max: 42949672960)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:394)
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
    at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:306)
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104)
    at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
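
The numbers in this error can be checked against the bookie settings posted above. The sketch below is only a back-of-the-envelope budget, and it rests on two assumptions that the thread does not confirm: that the DbLedgerStorage write cache and read-ahead cache are counted against the same direct-memory limit as Netty's I/O buffers, and that an empty `dbStorage_readAheadCacheMaxSizeMb` falls back to 25% of direct memory.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Values copied from the error message above.
max_direct = 42_949_672_960    # "max": equals -XX:MaxDirectMemorySize=40g
used_direct = 42_949_672_956   # "used" at the time of the failure
request = 16_777_216           # size of the allocation that failed

# Value copied from the bookie config above.
write_cache_mb = 30_000        # dbStorage_writeCacheMaxSizeMb

headroom = max_direct - used_direct       # bytes still free when the error fired
write_cache = write_cache_mb * MIB

# ASSUMPTION: an unset read-ahead cache defaults to a quarter of direct memory.
read_ahead_cache = max_direct // 4

# What would remain for Netty buffers if both caches were completely full.
left_for_io = max_direct - write_cache - read_ahead_cache

print(f"limit: {max_direct // GIB} GiB, failed request: {request // MIB} MiB")
print(f"free at failure: {headroom} bytes")
print(f"left for I/O buffers with full caches: {left_for_io // MIB} MiB")
```

Under those assumptions the two caches alone could claim all but about 720 MiB of the 40 GiB limit, which would leave little room for the 16 MiB chunk allocations that Netty makes on the request path.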

Zookeeper JVM config

-Xmx1524M -Xms1524M -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15

Zookeeper config

dataDir=/opt/huawei/data1/zookeeperdata
clientPort=2181
secureClientPort=2281
maxClientCnxns=100
tickTime=2000
initLimit=10
syncLimit=5
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
jaasLoginRenew=3600000

admin.enableServer=false

quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.loginContext=QuorumLearner
quorum.auth.server.loginContext=QuorumServer
quorum.cnxn.threads.size=20

4lw.commands.whitelist==stat,ruok,mntr,stat

forceSync=yes

clientPortAddress=127.0.0.1
secureClientPortAddress=10.33.141.138
server.1=10.33.141.111:2888:3888
server.2=10.33.141.45:2888:3888
server.3=10.33.141.138:2888:3888
server.4=10.33.141.149:2888:3888
server.5=10.33.141.240:2888:3888

xiaotongwang1 commented 3 years ago

@codelipenghui can you help check the config and error log? Thanks.

xiaotongwang1 commented 3 years ago

[dmq@host-10-33-141-93 arthas]$ jmap -histo:live 20135|head -n 100

num #instances #bytes class name

1: 2728 2908059496 [J
2: 10303 35648096 [B
3: 12900 13121224 [Ljava.lang.Object;
4: 55838 5813320 [C
5: 5226 3428256 io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
6: 1525 3106944 [D
7: 967 2387056 [Lio.netty.util.Recycler$DefaultHandle;
8: 5336 1508352 [I
9: 55382 1329168 java.lang.String
10: 5337 854992 [Ljava.util.HashMap$Node;
11: 7141 803024 java.lang.Class
12: 20049 641568 java.util.HashMap$Node
13: 12251 392032 java.util.concurrent.ConcurrentHashMap$Node
14: 5515 308840 org.apache.bookkeeper.client.LedgerFragment
15: 7630 305200 java.util.LinkedHashMap$Entry
16: 4117 296424 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
17: 9211 294752 io.netty.util.Recycler$DefaultHandle
18: 162 290336 [Lio.netty.buffer.PoolSubpage;
19: 5516 264768 java.util.HashMap
20: 8274 264768 java.util.Hashtable$Entry
21: 2137 188056 java.lang.reflect.Method
22: 2786 178304 io.netty.buffer.PoolSubpage
23: 5131 164192 io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
24: 2885 161560 io.netty.channel.DefaultChannelHandlerContext
25: 9819 157104 java.lang.Object
26: 4824 154368 io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
27: 3304 132160 org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorageDataFormats$LedgerData
28: 5433 130392 java.util.jar.Attributes$Name
29: 301 106320 [Ljava.util.concurrent.ConcurrentHashMap$Node;
30: 1300 104000 org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion
31: 1518 97152 com.yahoo.sketches.quantiles.HeapDoublesSketch
32: 1519 85064 java.security.Provider$Service
33: 3364 80736 com.google.protobuf.ByteString$LiteralByteString
34: 136 77248 io.netty.util.internal.shaded.org.jctools.queues.MpscUnboundedArrayQueue
35: 3133 73544 [Ljava.lang.Class;
36: 570 72136 [Ljava.lang.String;
37: 1000 72000 java.lang.reflect.Field
38: 4457 71312 java.util.jar.Attributes
39: 2971 71304 java.security.Provider$ServiceKey
40: 1669 66760 java.util.WeakHashMap$Entry
41: 60 64688 [Ljava.util.Hashtable$Entry;
42: 27 64440 [Ljava.util.concurrent.RunnableScheduledFuture;
43: 29 62008 [Ljava.nio.ByteBuffer;
44: 859 54976 java.util.concurrent.ConcurrentHashMap
45: 680 54400 java.lang.reflect.Constructor
46: 2048 49152 io.netty.util.HashedWheelTimer$HashedWheelBucket
47: 768 49152 org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap$Section
48: 501 48032 [Ljava.util.WeakHashMap$Entry;
49: 122 46848 io.netty.util.concurrent.FastThreadLocalThread
50: 363 46464 io.netty.channel.epoll.EpollSocketChannel
51: 722 46208 sun.security.provider.SHA2$SHA256
52: 1366 43712 org.apache.bookkeeper.proto.PerChannelBookieClient$V3CompletionKey
53: 1300 41600 org.apache.bookkeeper.client.LedgerFragmentReplicator$2
54: 1277 40864 sun.security.util.ObjectIdentifier
55: 2458 39328 java.util.concurrent.atomic.AtomicBoolean
56: 694 38864 java.lang.invoke.MemberName
57: 1204 38528 java.util.concurrent.atomic.LongAdder
58: 931 37240 java.lang.ref.SoftReference
59: 1530 36720 java.lang.Long
60: 553 35392 java.net.URL
61: 2210 35360 java.util.concurrent.atomic.AtomicInteger
62: 873 34920 java.lang.ref.Finalizer
63: 122 34160 java.util.concurrent.atomic.Striped64$Cell
64: 518 33152 io.netty.util.Recycler$Stack
65: 690 33120 java.util.concurrent.locks.StampedLock
66: 1326 31824 java.util.ArrayList
67: 361 31768 io.netty.handler.codec.LengthFieldBasedFrameDecoder
68: 773 30920 java.math.BigInteger
69: 519 29064 java.lang.Class$ReflectionData
70: 436 27904 io.netty.channel.ChannelOutboundBuffer$Entry
71: 402 27872 [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache;
72: 659 26360 java.lang.invoke.MethodType
73: 821 26272 sun.security.util.DerInputBuffer
74: 821 26272 sun.security.util.DerValue
75: 364 26208 io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl
76: 396 25344 org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap$Section
77: 450 25200 io.netty.util.Recycler$WeakOrderQueue
78: 768 24576 io.netty.handler.codec.CodecOutputList
79: 336 24192 org.apache.bookkeeper.util.collections.ConcurrentLongHashMap$Section
80: 501 24048 java.util.WeakHashMap
81: 742 23744 java.net.InetAddress$InetAddressHolder
82: 367 23488 java.security.SecureRandom
83: 244 23424 java.util.jar.JarFile$JarFileEntry
84: 364 23296 io.netty.channel.ChannelOutboundBuffer
85: 364 23296 io.netty.channel.DefaultChannelPipeline$HeadContext
86: 363 23232 io.netty.channel.epoll.EpollSocketChannelConfig
87: 725 23200 java.security.MessageDigest$Delegate
88: 552 22080 java.util.TreeMap$Entry
89: 663 21216 java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry
90: 260 20992 [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
91: 863 20712 java.util.LinkedList$Node
92: 369 20664 sun.nio.cs.UTF_8$Encoder
93: 364 20384 io.netty.channel.DefaultChannelPipeline$TailContext
94: 363 20328 io.netty.channel.epoll.EpollSocketChannel$EpollSocketChannelUnsafe
95: 499 19960 java.util.concurrent.locks.StampedLock$WNode
96: 311 19904 org.apache.bookkeeper.bookie.Journal$QueueEntry
97: 821 19704 sun.security.util.DerInputStream

frankjkelly commented 3 years ago

I could be misreading this, but if you have 9 brokers with 16C 128G, then that's 128G/9 = 14.2G per broker, but you have configured

-Xmx10g
-XX:MaxDirectMemorySize=20g

which requires 30g (at least) per broker. Am I misunderstanding?
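
The question turns on how "9 broker with 16C 128G" is read. As a minimal sketch, the check below compares heap plus direct memory against the RAM available to one broker under both readings: 128G per broker VM, or 128G shared across all 9 brokers. The thread does not confirm which reading is correct, and `fits_in_ram` is a hypothetical helper that ignores other JVM and OS overhead.

```python
def fits_in_ram(host_gib: float, heap_gib: float, direct_gib: float) -> bool:
    """Return True if heap plus direct memory fits within the given RAM."""
    return heap_gib + direct_gib <= host_gib


# Broker JVM settings from the issue: -Xmx10g and -XX:MaxDirectMemorySize=20g.
broker_heap, broker_direct = 10, 20

per_vm = 128.0        # reading 1: every broker VM has 128G of its own
shared = 128.0 / 9    # reading 2: 128G divided across 9 brokers

print(f"needed per broker: {broker_heap + broker_direct}G")
print(f"128G per VM: fits = {fits_in_ram(per_vm, broker_heap, broker_direct)}")
print(f"{shared:.1f}G per broker: fits = {fits_in_ram(shared, broker_heap, broker_direct)}")
```

Only the second reading makes the 30G requirement a problem; under the first, the broker JVM uses well under a quarter of the VM's memory.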

codelipenghui commented 2 years ago

The issue has had no activity for 30 days; marking it with the Stale label.

tisonkun commented 1 year ago

Closed as stale. Please open a new issue if it's still relevant in maintained versions.

Please ask questions on https://github.com/apache/pulsar/discussions/categories/q-a. Also, please upload logs as a file and inline only the significant lines. Such large inline content discourages potential volunteers from helping with debugging.