Open rodolphogarrido opened 10 months ago
cc @apucher @xiangfu0 @snleee
This is also a good question to be posted in the slack troubleshooting channel
Hi @Jackie-Jiang, as suggested I have posted in the slack troubleshooting channel. The issue was reproduced by @HubertWo and has also been reported as affecting another user.
Would you consider tagging this as a BUG?
Thank you very much!
Hi Team,
We are facing the same issue as well.
Hi team, same issue as well. Could you share your workaround (if any)? @rodolphogarrido
Looking into this. The hypothesis is that server calls to the controller may not carry the credential.
@KKcorps assigning to you, please take a look.
@rodolphogarrido does this happen on the first segment commit or do you see some segments getting committed correctly and then all of a sudden it starts failing?
You can check the logs to see if there is any successful call to /segmentConsumed.
Also, have the ACL configs been enabled since the first deployment, or were they enabled later on? If later on, were all the Pinot components restarted, or only the controller or server?
@KKcorps - in my case the ACL config was enabled since the first deployment. It worked for a while and then stopped working. If I restart the components, it works for some time until this happens again.
We don't see any pattern in why this is happening, but it stops all of a sudden.
Thanks for the reply. Can you also check the server logs for /segmentConsumed calls and see whether all of them failed or some were successful?
@KKcorps - All the calls are failing.
Thanks, that means there is an auth token mismatch between server and controller. I tried reproducing it locally with 0.12.1, but it works fine.
Can you validate pinot.server.segment.uploader.auth.token and see if it matches the base64-encoded value in the controller?
Yes @KKcorps, I verified it and it was set correctly in the server config. The server is able to consume messages for some time and then stops after a point.
Yes, that I understand. What's happening is that consumption itself is not affected. Once you consume enough data, Pinot needs to commit that segment. It's during that call that you are getting the error.
I will see if something is missing. Also, would it be possible for you to reproduce it with Pinot 1.0?
@KKcorps - One more thing I noticed: in our config we are using something like pinot.server.segment.uploader.auth.token=${env:PINOT_AUTH}
and PINOT_AUTH was not getting updated inside /var/pinot/server/config/pinot-server.conf. Could this be related?
Yes, that is the exact property that will cause the issue. It needs to have the correct value.
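One way to check for this situation is to scan the server config for `${env:...}` placeholders and see whether the referenced variables are actually set in the environment of the server process. The sketch below is illustrative only: `find_env_placeholders` is a hypothetical helper, not part of Pinot, and whether Pinot expands such placeholders itself at load time is not established in this thread.

```python
import re

# Matches placeholders of the form ${env:NAME}, as used in the config above.
ENV_PLACEHOLDER = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def find_env_placeholders(properties_text, environ):
    """Return (property key, variable name, is_set) for each ${env:...} placeholder.

    Hypothetical helper for illustration; not part of Pinot.
    """
    findings = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines of the properties file.
        if not stripped or stripped.startswith(("#", "!")):
            continue
        key, sep, value = stripped.partition("=")
        if not sep:
            continue
        for name in ENV_PLACEHOLDER.findall(value):
            # A variable counts as set only if it has a non-empty value.
            findings.append((key.strip(), name, bool(environ.get(name))))
    return findings


conf = "pinot.server.segment.uploader.auth.token=${env:PINOT_AUTH}\n"
# An empty mapping simulates PINOT_AUTH being absent from the environment.
print(find_env_placeholders(conf, {}))
```

In practice the second argument would be `os.environ` from inside the server container, and the first the contents of the server config file.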
What should be the correct value? I have the env variable named PINOT_AUTH set from the secret in kubernetes.
Hi @KKcorps, how are you? Thanks for looking into this issue!
Answering your questions:
The first segment is getting committed, but subsequent ones are not, and this is when the error happens.
Since this is an ephemeral test cluster, I tried both cases with a fresh deployment each time (including Zookeeper): with the ACL enabled since the first deployment, and with it enabled later followed by a restart of every component. I faced the issue in both cases.
In my case I can confirm that every token has the correct base64 value (I hardcoded them in the config for this test).
Thanks a lot for clarifying! Yes, if the first commit succeeds and subsequent ones do not, that is quite odd.
One more question: were you facing this issue in older Pinot versions as well? Also, can you try the latest Pinot 1.0 release and see if it occurs?
I meant that ${env:PINOT_AUTH} should resolve to the base64 auth token, e.g. Basic YWRtaW46dmVyeXNlY3JldA==
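For reference, a Basic token is the word `Basic`, a space, and the base64 encoding of `username:password`. The sketch below shows how such a token can be built and decoded for comparison against the configured value. The helper names are hypothetical, and `admin` / `verysecret` are simply the credentials that the example token above decodes to.

```python
import base64


def basic_auth_token(username, password):
    """Build an HTTP Basic token: 'Basic ' + base64('username:password')."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_token(token):
    """Split a 'Basic <base64>' token back into (username, password)."""
    scheme, _, encoded = token.partition(" ")
    if scheme != "Basic" or not encoded:
        raise ValueError("not a Basic auth token")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Only the first colon separates the username from the password.
    username, _, password = decoded.partition(":")
    return username, password


token = basic_auth_token("admin", "verysecret")
print(token)                      # Basic YWRtaW46dmVyeXNlY3JldA==
print(decode_basic_token(token))  # ('admin', 'verysecret')
```

Decoding the value that the server actually ends up with, rather than the value in the Kubernetes secret, is the more useful comparison when an environment variable sits in between.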
@KKcorps after double checking, it seems that the first segment isn't actually being committed (sorry for the mistake). Pinot is able to consume some records (I can query a few ingested records), but when it tries to commit the first segment it fails, and then consumption also stops.
I started testing Pinot with version 0.12.1 (I didn't try older versions). I've tried the new version (1.0.0) and the same issue happens.
With version 1.0.0, the server logs show:
2023-10-13 07:17:26 2023/10/13 10:17:26.493 ERROR [LLRealtimeSegmentDataManager_events_upsert_full__1__0__20231013T1014Z] [events_upsert_full__1__0__20231013T1014Z] Holding after response from Controller: {"status":"NOT_SENT","streamPartitionMsgOffset":null,"isSplitCommitType":false,"buildTimeSec":-1,"offset":-1}
2023-10-13 07:17:29 2023/10/13 10:17:29.438 ERROR [ServerSegmentCompletionProtocolHandler] [events_upsert_partial__1__0__20231013T1014Z] Could not send request http://pinot-controller-0:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=2&instance=Server_pinot-server-0_8098&offset=-1&name=events_upsert_partial__1__0__20231013T1014Z&rowCount=2&memoryUsedBytes=1140
2023-10-13 07:17:29 org.apache.pinot.common.exception.HttpErrorStatusException: Got error status code: 403 (Forbidden) with reason: "Permission is denied for READ '/segmentConsumed'" while sending request: http://pinot-controller-0:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=2&instance=Server_pinot-server-0_8098&offset=-1&name=events_upsert_partial__1__0__20231013T1014Z&rowCount=2&memoryUsedBytes=1140 to controller: pinot-controller-0, version: Unknown
2023-10-13 07:17:29 at org.apache.pinot.common.utils.http.HttpClient.wrapAndThrowHttpException(HttpClient.java:448) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at org.apache.pinot.common.utils.FileUploadDownloadClient.sendSegmentCompletionProtocolRequest(FileUploadDownloadClient.java:1129) ~[pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.sendRequest(ServerSegmentCompletionProtocolHandler.java:221) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentConsumed(ServerSegmentCompletionProtocolHandler.java:188) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postSegmentConsumedMsg(LLRealtimeSegmentDataManager.java:1152) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:700) [pinot-all-1.0.0-jar-with-dependencies.jar:1.0.0-b6bdf6c9686b286a149d2d1aea4a385ee98f3e79]
2023-10-13 07:17:29 at java.lang.Thread.run(Thread.java:829) [?:?]
2023-10-13 07:17:29 2023/10/13 10:17:29.440 ERROR [LLRealtimeSegmentDataManager_events_upsert_partial__1__0__20231013T1014Z] [events_upsert_partial__1__0__20231013T1014Z] Holding after response from Controller: {"status":"NOT_SENT","streamPartitionMsgOffset":null,"isSplitCommitType":false,"buildTimeSec":-1,"offset":-1}
If I try the same request from the error, using the admin secret:
curl -i -X GET -H 'Content-Type: application/json' -u "admin:verysecret" "localhost:9000/segmentConsumed?reason=rowLimit&streamPartitionMsgOffset=2&instance=Server_pinot-server-0_8098&offset=-1&name=events_upsert_partial__1__0__20231013T1014Z&rowCount=2&memoryUsedBytes=1140"
The output is:
{"status":"COMMIT","isSplitCommitType":true,"buildTimeSec":126,"streamPartitionMsgOffset":"2","controllerVipUrl":"http://pinot-controller-0:9000","offset":2}
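The manual request succeeds because `-u` makes curl attach an `Authorization: Basic ...` header. The sketch below builds the equivalent request with Python's standard library so the header can be inspected. It only constructs the request and does not send it. `build_segment_consumed_request` is a hypothetical helper, the query parameters are copied from the log line above, and nothing here reproduces Pinot's internal client.

```python
import base64
import urllib.parse
import urllib.request


def build_segment_consumed_request(controller_url, params, username, password):
    """Build (but do not send) a GET /segmentConsumed request with Basic auth."""
    query = urllib.parse.urlencode(params)
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{controller_url}/segmentConsumed?{query}", method="GET")
    # This is the header the controller's access control checks.
    request.add_header("Authorization", f"Basic {credentials}")
    return request


params = {
    "reason": "rowLimit",
    "streamPartitionMsgOffset": "2",
    "instance": "Server_pinot-server-0_8098",
    "offset": "-1",
    "name": "events_upsert_partial__1__0__20231013T1014Z",
    "rowCount": "2",
    "memoryUsedBytes": "1140",
}
request = build_segment_consumed_request("http://localhost:9000", params, "admin", "verysecret")
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it. A 403 for the server's own call alongside a success here suggests the server's request either lacks this header or carries a different token.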
This issue still persists on 1.0.0: Pinot servers cannot authenticate themselves to the Pinot controller. This is odd because the documentation says Pinot supports basic auth. As a result, Pinot cannot function with access control enabled.
Is there any known workaround so far?
Hi, we are facing the same issue in version 1.0.0. Is there any ETA on when this will be fixed, or any workaround? This is a blocker, as new segments are not getting created after the threshold is met.
@zhtaoxiang Can you please take a look?
Any news on this?
@EnzoDechaene
Check your auth configurations. Add the configs below to the server if they are not there yet.
pinot.server.segment.uploader.auth.token="Basic XXXXXXX"
pinot.server.instance.auth.token="Basic XXXXXXX"
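A quick way to confirm that both server properties named above are present is to scan the config text for them. This is a minimal sketch under stated assumptions: `missing_auth_keys` is a hypothetical helper, it only checks that each key has a non-empty value, and it does not validate the token itself.

```python
# The two server properties named in the comment above.
REQUIRED_KEYS = (
    "pinot.server.segment.uploader.auth.token",
    "pinot.server.instance.auth.token",
)


def missing_auth_keys(properties_text):
    """Return the required auth keys that have no non-empty value in the text."""
    present = set()
    for line in properties_text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Only count a key when it is followed by '=' and a non-empty value.
        if sep and value.strip():
            present.add(key.strip())
    return [key for key in REQUIRED_KEYS if key not in present]


conf = "pinot.server.segment.uploader.auth.token=Basic XXXXXXX\n"
print(missing_auth_keys(conf))  # ['pinot.server.instance.auth.token']
```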
While using the ACL feature, Pinot Servers aren't able to consume messages from Kafka due to a permission denied error in the endpoint /segmentConsumed (more details in the log). PS: The Servers are able to consume a few events before the error starts, but after that no more events are consumed.
Cluster version
Apache Pinot version: 0.12.1
Cluster ACL configs:
Controller ACL conf:
Server ACL conf:
Broker ACL conf:
Minion ACL conf:
Table config
Schema config
Log error
Server Appconfig
Controller appconf
Thank you very much!