apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.27k stars 1.23k forks source link

Fix the issue with "pinot-pulsar" module (potentially library conflicts) #7270

Closed snleee closed 2 years ago

snleee commented 2 years ago

Apache Pulsar connector has been added from https://github.com/apache/pinot/pull/7026

However, it currently is facing some issues on runtime (potentially dependency conflicts). We need to fix the conflicts to make the connector work correctly.

2021/08/06 10:53:35.451 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: myTable_REALTIME, partition: myTable__0__0__20210806T1753Z
java.lang.RuntimeException: org.apache.pulsar.shaded.com.google.protobuf.v241.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
        at org.apache.pulsar.client.internal.ReflectionUtils.catchExceptions(ReflectionUtils.java:43) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pulsar.client.internal.DefaultImplementation.newMessageIdFromByteArray(DefaultImplementation.java:103) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pulsar.client.api.MessageId.fromByteArray(MessageId.java:58) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.plugin.stream.pulsar.MessageIdStreamOffset.<init>(MessageIdStreamOffset.java:47) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.plugin.stream.pulsar.MessageIdStreamOffsetFactory.create(MessageIdStreamOffsetFactory.java:39) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1183) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:349) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:162) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:89) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.apache.pulsar.shaded.com.google.protobuf.v241.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
        at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.skipField(ByteBufCodedInputStream.java:192) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pulsar.common.api.proto.PulsarApi$MessageIdData$Builder.mergeFrom(PulsarApi.java:1602) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pulsar.client.impl.MessageIdImpl.fromByteArray(MessageIdImpl.java:106) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.apache.pulsar.client.internal.DefaultImplementation.lambda$newMessageIdFromByteArray$3(DefaultImplementation.java:103) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        at org.apache.pulsar.client.internal.ReflectionUtils.catchExceptions(ReflectionUtils.java:35) ~[pinot-pulsar-0.8.0-shaded.jar:0.8.0-a206db39710e2495f1d72adb90387617b03566d4]
        ... 21 more
pinot-server_1      | 2021/08/05 14:25:59.009 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: datasource_610bf4bf19200003007e55b3_REALTIME, partition: datasource_610bf4bf19200003007e55b3__0__0__20210805T1425Z
pinot-server_1      | java.lang.IndexOutOfBoundsException: readerIndex(1) + length(8) exceeds writerIndex(6): UnpooledHeapByteBuf(ridx: 1, widx: 6, cap: 6/6)
pinot-server_1      |   at shaded.io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1478) ~[pinot-azure-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at shaded.io.netty.buffer.AbstractByteBuf.readLongLE(AbstractByteBuf.java:845) ~[pinot-azure-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.readRawLittleEndian64(ByteBufCodedInputStream.java:309) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.skipField(ByteBufCodedInputStream.java:177) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.common.api.proto.PulsarApi$MessageIdData$Builder.mergeFrom(PulsarApi.java:1602) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.client.impl.MessageIdImpl.fromByteArray(MessageIdImpl.java:106) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
pinot-server_1      |   at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
pinot-server_1      |   at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
pinot-server_1      |   at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
pinot-server_1      |   at org.apache.pulsar.client.internal.DefaultImplementation.lambda$newMessageIdFromByteArray$3(DefaultImplementation.java:103) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.client.internal.ReflectionUtils.catchExceptions(ReflectionUtils.java:35) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
pinot-server_1      |   at org.apache.pulsar.client.internal.DefaultImplementation.newMessageIdFromByteArray(DefaultImplementation.java:103) ~[pinot-pulsar-0.8.0-SNAPSHOT-shaded.jar:0.8.0-SNAPSHOT-4da1dae06aef50f0a7c96b5a22019e541310fdd9]
Eywek commented 2 years ago

Hello, Any news on this?

kishoreg commented 2 years ago

We should probably shade some of the conflicting libraries

aleksdikanski commented 2 years ago

Hi, I had a look at this issue, as I'm currently investigating pulsar and pinot integration for my current company. I first encountered the first error with the InvalidProtocolBufferException (kubernetes deployment) and later with a different setup (local docker) the IndexOutOfBoundsException.

So far, it does not look like an issue with false shaded libraries. As one can see from the stacktraces, the errors occur in libraries that are already shaded by the pulsar client lib.

I looked into the second IndexOutOfBoundsException first and think I found the issue. When constructing a MessageIdStreamOffset from a string the current implementation tries to parse a pulsar MessageId from a string provided.

  /**
   * returns the class object from string message id in the format ledgerId:entryId:partitionId
   * throws {@link IOException} if message if format is invalid.
   * @param messageId
   */
  public MessageIdStreamOffset(String messageId) {
    try {
      _messageId = MessageId.fromByteArray(messageId.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      LOGGER.warn("Cannot parse message id " + messageId, e);
    }
  }

As you can see from the comment the assumption about the structure of a MessageId is made. Unfortunately the passed String is only the MessageId.toString() representation and not the actual wire transfer representation as a byte array. Hence the parsing fails. I have currently fixed this by splitting the incoming String and using the constructors of MessageIdImpl and BatchMessageIdImpl to create a MessageId from the string. With that I had no execptions any more, not even the second one, i.e., I could the airline stats example. I can create a MR for this, but I don't know if splitting and constructing the MessageId using impl classes is such a nice solution and would rather have the wire format of the MessageId passed as an input, but I have not found, were this is happening. Would be nice to get some pointers here. Thanks

kishoreg commented 2 years ago

@KKcorps can you help here?

KKcorps commented 2 years ago

Taking it up.

aleksdikanski commented 2 years ago

thanks, I linked the changes that I made for it to work for me as a draft PR, in case it helps you.

KKcorps commented 2 years ago

@aleksdikanski I remember implementing it this way because I don't want to make any assumptions regarding message-id format in the pinot code. The pulsar team can change the message-id format but our code should still work as long as pulsar lib is handling the new format correctly.

I remember testing this out as well but your concern is valid. I am looking for another solution if that doesn't work will go with your PR one.

KKcorps commented 2 years ago

@snleee @aleksdikanski can you tell me how to reproduce this issue? I feel like this can also be related to offset at the startup of the client. If might be throwing error in case of empty string.

aleksdikanski commented 2 years ago

Sure @KKcorps , I basically followed the guides on the pulsar and pinot websites:

funny enough I tested this today and actually got another error that is different from the two mentioned above:

2021/12/20 16:26:21.807 ERROR [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel [HelixTaskExecutor-message_handle_thread] Caught exception in state transition from OFFLINE -> ONLINE for resource: airlineStats_REALTIME, partition: airlineStats__0__0__20211220T1626Z
java.lang.RuntimeException: org.apache.pulsar.shaded.com.google.protobuf.v241.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field. This could mean either than the input has been truncated or that an embedded message misreported its own length.
at org.apache.pulsar.client.internal.ReflectionUtils.catchExceptions(ReflectionUtils.java:43) ~[pinot-pulsar-0.9.0-shaded.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pulsar.client.internal.DefaultImplementation.newMessageIdFromByteArray(DefaultImplementation.java:103) ~[pinot-pulsar-0.9.0-shaded.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pulsar.client.api.MessageId.fromByteArray(MessageId.java:58) ~[pinot-pulsar-0.9.0-shaded.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906] 
at org.apache.pinot.plugin.stream.pulsar.MessageIdStreamOffset.(MessageIdStreamOffset.java:47) ~[pinot-pulsar-0.9.0-shaded.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pinot.plugin.stream.pulsar.MessageIdStreamOffsetFactory.create(MessageIdStreamOffsetFactory.java:39) ~[pinot-pulsar-0.9.0-shaded.jar:0.9.0-cf8b84e8b0d6ab62374048de586ce7da21132906]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.(LLRealtimeSegmentDataManager.java:1209) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:344) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:162) ~[pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:86) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]

it also related to the MessageId parsing and was also fixed with my implementation

KKcorps commented 2 years ago

Can you also mentioned the Pulsar version you are testing this on? Pinot version seems to be 0.9.1 from the exception logs.

aleksdikanski commented 2 years ago

I did give the pulsar docker images, which uses the latest version 2.8.1,

aleksdikanski commented 2 years ago

I also dig a bit yesterday and saw, that the start offset for a segment is fetched from Zookeeper. So I looked at the entry (PinotCluster / PROPERTYSTORE / SEGMENTS / / ) and saw, that the offset is set to the toString() value of the MessageId as well, e.g., something like "segment.realtime.startOffset": "3432:0:1:0". IMHO it is a bit dangerous to use the toString() representation, as this is more likely to change. I guess it should just be the byte array characters of the MessageId.toByteArray() method. I'm afraid I don't know where the ZK value is written.

KKcorps commented 2 years ago

Yeah, I think the string offsets are borrowed from the Kafka implementation. We might need to make this more generic. That's also another reason I don't want to split the message id on : and create from there as it can change on pulsar side. IMO, what the solution should be here is to store in Hex string format and then parse hex string -> byte array -> message id.

KKcorps commented 2 years ago

@aleksdikanski In which file does this exception occur? Is it in pinot-all.log or pinot-quickstart.log or during creation of the table itself when running the pinot-admin command?

aleksdikanski commented 2 years ago

Hi, I used the log output of the docker containers, so actually no log files were created. the error occurred in the logs of the pinot-server container at the time the first message was read from the topic. so I guess it is fair to say, it occurred when the first segment was created. The AddTable command of the pinot-admin cli did not result in the exception.

KKcorps commented 2 years ago

@aleksdikanski https://github.com/apache/pinot/pull/7947 can you try this and see if it is working on your end. It is working on my end. There was also a corrupt \n in your airline data json. Please remove that before testing.

I am also noticing some issue with using the auto.offset.reset to largest. It works fine when using smallest Working on the fix for that as well.

aleksdikanski commented 2 years ago

@KKcorps thanks I tested your changes and I don't get any errors regarding the MessageId anymore. The messages are now correctly consumed by pinot.

There was also a corrupt \n in your airline data json. Please remove that before testing.

Must have been a copy paste error, did not have a problem with the original file.

aleksdikanski commented 2 years ago

Hi @KKcorps are there any updates on this? Can I support in any way?

KKcorps commented 2 years ago

Hi, The PR for the fix is already merged. Is there any other issue with code will be happy to help.

aleksdikanski commented 2 years ago

@KKcorps You mentioned something not working with the largest offset, does this persist?

I am also noticing some issue with using the auto.offset.reset to largest. It works fine when using smallest Working on the fix for that as well.

I'm not using the largestoffset for my use case, and have demoed the pulsar / pinot integration in my company. So I would be interested to make this part of the pulsar release again by reverting PR #7272. Maybe also a question for @snleee

KKcorps commented 2 years ago

My bad. I need to get on this issue again. I remember testing it out thoroughly and it seemed to be a problem with my test case rather than the plugin itself. I will update you in next 24 hrs. Update: Tested it out and the issue is definitely there. IMO, the source of the issue is the Pulsar Reader is always resetting to the latest when starting up. I am trying out a fix by using the Consumer interface instead of the Reader.

KKcorps commented 2 years ago

https://github.com/apache/pinot/pull/8017/files Raised a draft PR. Still testing out more scenarios with this fix.

polyzos commented 2 years ago

@KKcorps any updates on this? is there anything to help here?

KKcorps commented 2 years ago

Hi The PR was upgraded from draft status to ready for review. Still waiting to be merged

aleksdikanski commented 2 years ago

Hi @KKcorps,

could you also revert the change in https://github.com/apache/pinot/pull/7272/files in your PR? Otherwise the issue will be fixed but pulsar plugin will not be included in the next pinot release. Also there are some tests failing in your PR. Thx