apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0

Not enough non-faulty bookies available when sending a message that exceeds defaultMaxMessageSize #2357

Open wolfstudy opened 4 years ago

wolfstudy commented 4 years ago

BUG REPORT

Related Pulsar issue: https://github.com/apache/pulsar/issues/4525

Describe the bug

All components run with the default configuration.

After startup, sending a single message that exceeds defaultMaxMessageSize makes the Pulsar broker unavailable.

To Reproduce

Test code (Go):

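package main

import (
    "context"
    "fmt"
    "log"

    // Assumed import path of the CGo-based Go client that provides this API
    "github.com/apache/pulsar/pulsar-client-go/pulsar"
)

func main() {
    client, err := pulsar.NewClient(pulsar.ClientOptions{
        URL:       "pulsar://localhost:6650",
        IOThreads: 5,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    producer, err := client.CreateProducer(pulsar.ProducerOptions{
        Topic:           "my-topic",
        CompressionType: pulsar.NoCompression,
        Batching:        false,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer producer.Close()

    ctx := context.Background()

    // 5 MiB payload, intended to exceed defaultMaxMessageSize
    buf := make([]byte, 5*1024*1024)
    fmt.Printf("the buf length is:[%d]\n", len(buf))

    // Block until the send callback fires; otherwise main may return
    // (and close the producer) before the result is reported.
    done := make(chan struct{})
    producer.SendAsync(ctx, pulsar.ProducerMessage{
        Payload: buf,
    }, func(producerMessage pulsar.ProducerMessage, e error) {
        fmt.Printf("send complete. err=%v\n", e)
        close(done)
    })
    <-done
}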

The broker logs the following errors:

16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (7213, 0): Bookie handle is not available
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:127.0.0.1:3181>], fallback to choose bookie randomly from the cluster.
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:127.0.0.1:3181>], fallback to choose bookie randomly from the cluster.
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:127.0.0.1:3181>], allBookies [<Bookie:127.0.0.1:3181>].
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=7213,loopId=3f741063) Exception updating
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandomInternal(RackawareEnsemblePlacementPolicyImpl.java:989) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectRandom(RackawareEnsemblePlacementPolicyImpl.java:907) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:797) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:200) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.selectFromNetworkLocation(RackawareEnsemblePlacementPolicyImpl.java:757) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.selectFromNetworkLocation(RackawareEnsemblePlacementPolicy.java:221) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:659) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:114) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:295) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1908) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:122) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:111) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1927) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1876) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:360) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2028) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.lambda$errorOut$0(PerChannelBookieClient.java:2051) ~[bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue$1.safeRun(PerChannelBookieClient.java:1606) [bookkeeper-server-4.9.2.jar:4.9.2]
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.2.jar:4.9.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (7213, 1): Bookie handle is not available
16:37:42.874 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:7213, change-id:0000000001)][attempt:1] Exception changing ensemble
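The WARN lines show the sequence: the write of entry (7213, 0) fails with "Bookie handle is not available", the client excludes 127.0.0.1:3181 and asks the placement policy for a replacement, and because allBookies contains only that same bookie, no candidate is left and BKNotEnoughBookiesException is thrown. The sketch below reproduces only that set arithmetic in Go. It is not BookKeeper's RackawareEnsemblePlacementPolicyImpl; `replaceBookie` and `errNotEnoughBookies` are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotEnoughBookies only mimics the message of BookKeeper's
// BKNotEnoughBookiesException; it is not the real error type.
var errNotEnoughBookies = errors.New("not enough non-faulty bookies available")

// replaceBookie is a hypothetical, simplified replacement step: return the
// first known bookie that is not excluded, or fail if none is left.
func replaceBookie(allBookies, excluded []string) (string, error) {
	skip := make(map[string]bool, len(excluded))
	for _, b := range excluded {
		skip[b] = true
	}
	for _, b := range allBookies {
		if !skip[b] {
			return b, nil
		}
	}
	return "", errNotEnoughBookies
}

func main() {
	// Single-bookie cluster as in the log: the only bookie is excluded after
	// the failed write, so no replacement candidate remains.
	if _, err := replaceBookie([]string{"127.0.0.1:3181"}, []string{"127.0.0.1:3181"}); err != nil {
		fmt.Println("ensemble change failed:", err)
	}

	// With a second (hypothetical) bookie, a replacement candidate exists.
	b, err := replaceBookie([]string{"127.0.0.1:3181", "127.0.0.1:3182"}, []string{"127.0.0.1:3181"})
	fmt.Println(b, err)
}
```

In a single-bookie setup the ensemble change therefore cannot succeed, which matches the ERROR from MetadataUpdateLoop above.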

Expected behavior

When I send a message that exceeds the maximum message size allowed by Pulsar, I expect the broker to reject that message instead of becoming entirely unavailable.
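The stack trace shows the failed add going from PendingAddOp.writeComplete into LedgerHandle.handleBookieFailure, i.e. the bookie itself is treated as faulty. The expected behavior amounts to classifying the failure first: a size violation should fail only that request, while a genuine connection failure may still trigger an ensemble change. A minimal Go sketch of that decision follows; the error values, the `action` type, and `onAddFailure` are hypothetical and do not correspond to a BookKeeper API.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; real BookKeeper reports failures as BKException codes.
var (
	errEntryTooLarge     = errors.New("entry exceeds maximum size")
	errBookieUnavailable = errors.New("bookie handle is not available")
)

// action is what the client does after a failed add.
type action int

const (
	failRequest   action = iota // fail only this add and report it to the caller
	replaceBookie               // mark the bookie faulty and start an ensemble change
)

func (a action) String() string {
	if a == failRequest {
		return "fail request"
	}
	return "replace bookie"
}

// onAddFailure classifies an add error: a size violation is the request's
// fault, so it should not count against the bookie.
func onAddFailure(err error) action {
	if errors.Is(err, errEntryTooLarge) {
		return failRequest
	}
	return replaceBookie
}

func main() {
	tooLarge := fmt.Errorf("write entry (7213, 0): %w", errEntryTooLarge)
	fmt.Println(onAddFailure(tooLarge))             // fail request
	fmt.Println(onAddFailure(errBookieUnavailable)) // replace bookie
}
```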

Screenshots

[image]

eolivelli commented 4 years ago

The problem is that, at the low level, the oversized entry causes a protocol error and the client treats the bookie as broken. We should report an error earlier, before trying to reach the bookie, but the client cannot know the bookie's current configuration. We could add a max entry size setting on the client, probably left unconfigured by default.
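The client-side limit suggested above, disabled unless configured, could look roughly like the following Go sketch. `ClientConfig`, `MaxEntrySizeBytes`, `EntryTooLargeError`, and `checkEntrySize` are hypothetical names, not BookKeeper client options; the 1 MiB limit in `main` is arbitrary and only for illustration.

```go
package main

import "fmt"

// ClientConfig holds a hypothetical client-side setting. Zero means
// "not configured", so no check is applied by default.
type ClientConfig struct {
	MaxEntrySizeBytes int
}

// EntryTooLargeError reports an entry rejected before any bookie is contacted.
type EntryTooLargeError struct {
	Size, Limit int
}

func (e *EntryTooLargeError) Error() string {
	return fmt.Sprintf("entry of %d bytes exceeds client limit of %d bytes", e.Size, e.Limit)
}

// checkEntrySize fails fast for an oversized entry, so the error goes back to
// the caller instead of surfacing as a protocol error that marks the bookie faulty.
func checkEntrySize(cfg ClientConfig, payload []byte) error {
	if cfg.MaxEntrySizeBytes > 0 && len(payload) > cfg.MaxEntrySizeBytes {
		return &EntryTooLargeError{Size: len(payload), Limit: cfg.MaxEntrySizeBytes}
	}
	return nil
}

func main() {
	payload := make([]byte, 5*1024*1024)
	fmt.Println(checkEntrySize(ClientConfig{}, payload)) // <nil>: limit not configured
	fmt.Println(checkEntrySize(ClientConfig{MaxEntrySizeBytes: 1024 * 1024}, payload))
}
```

Leaving the limit at zero keeps today's behavior, which matches the suggestion that the setting be unconfigured by default.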

eolivelli commented 4 years ago

You should see other errors before the one you pasted.

devinbost commented 3 years ago

@eolivelli What about adding an endpoint that would allow configurations like that to be retrieved?
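One way to explore this suggestion: the client reads the bookie's frame-size limit over HTTP before writing, then applies a size check locally. The Go sketch below assumes an admin endpoint at `/api/v1/config/server_config` that returns a flat JSON object of string values, and treats the bookie setting `nettyMaxFrameSizeBytes` as the relevant limit; both are assumptions, and `parseMaxFrameSize`/`fetchMaxFrameSize` are hypothetical helpers. A fake server stands in for a bookie so the sketch runs without a cluster.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// parseMaxFrameSize extracts nettyMaxFrameSizeBytes from a config document,
// assuming a flat JSON object whose values are strings.
func parseMaxFrameSize(body []byte) (int, error) {
	var conf map[string]string
	if err := json.Unmarshal(body, &conf); err != nil {
		return 0, err
	}
	raw, ok := conf["nettyMaxFrameSizeBytes"]
	if !ok {
		return 0, errors.New("nettyMaxFrameSizeBytes not reported")
	}
	return strconv.Atoi(raw)
}

// fetchMaxFrameSize GETs configURL and parses the reply.
func fetchMaxFrameSize(client *http.Client, configURL string) (int, error) {
	resp, err := client.Get(configURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return parseMaxFrameSize(body)
}

func main() {
	// Fake bookie endpoint; the reported value is illustrative only.
	fake := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"nettyMaxFrameSizeBytes": "5242880"}`)
	}))
	defer fake.Close()

	limit, err := fetchMaxFrameSize(fake.Client(), fake.URL+"/api/v1/config/server_config")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bookie max frame size:", limit)
}
```

A fetched limit would only be a snapshot: bookies in one cluster could be configured differently or restarted with new settings, which is part of why a static client-side setting was proposed instead.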