Closed lhotari closed 3 hours ago
@merlimat @mattisonchao Could you help address this issue with the Oxia support in the Apache Pulsar Helm chart? The chart currently works with 1 Oxia server replica, but fails with 3.
I found the issue and opened PR #553. The Oxia cluster now starts up without problems, but the Oxia Java client fails to connect.
There are errors like this:
2024-11-22T13:41:28,857+0000 [grpc-default-worker-ELG-2-4] WARN io.streamnative.oxia.client.notify.ShardNotificationReceiver - Error while receiving notifications for shard=8: UNKNOWN: node is not leader for shard 8 - Retrying in 0.11 seconds
@mattisonchao @merlimat What's the reason for this problem?
@mattisonchao @merlimat The problem was resolved with the latest changes in PR #553. However, I don't know what the real cause was. When would such an error occur?
Describe the bug
PR #544 adds support for using Oxia as the metadata store for Pulsar and BookKeeper. When specifying an Oxia cluster with 3 pods, it fails to become available.
To Reproduce
Expected behavior
Oxia cluster should become available.
Additional context
Error message in the other Oxia server pods:
{"level":"warn","time":"2024-11-22T10:49:19.617371211Z","component":"public-rpc-server","error":{"error":"rpc error: code = Code(100) desc = oxia: server not initialized yet","kind":"*status.Error","stack":null},"peer":"10.1.5.106:47738","time":"2024-11-22T10:49:19.617479253Z","message":"Failed to add client for shards assignments notifications"}
In the oxia coordinator, everything looks fine:
pulsar-oxia-coordinator-status doesn't look correct since the individual pod addresses aren't included. I would expect it to contain the pod addresses instead of referencing the service DNS name:
coordinator configmap
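
For illustration, this is roughly what I mean by individual pod addresses. It's only a sketch of the standard Kubernetes per-pod DNS naming for a StatefulSet behind a headless service, not what the chart or Oxia actually generates. The StatefulSet name, service name, namespace and port below are hypothetical placeholders.

```python
def pod_addresses(statefulset, headless_service, namespace, replicas, port,
                  cluster_domain="cluster.local"):
    """Build the stable per-pod DNS addresses of a StatefulSet.

    Kubernetes names StatefulSet pods <statefulset>-<ordinal> and, with a
    headless service, gives each pod the DNS name
    <pod>.<service>.<namespace>.svc.<cluster-domain>.
    """
    return [
        f"{statefulset}-{i}.{headless_service}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # All names and the port are assumptions made up for this example.
    for address in pod_addresses("pulsar-oxia-server", "pulsar-oxia-svc",
                                 "pulsar", replicas=3, port=6648):
        print(address)
```

With 3 replicas this gives one address per pod, so each server can be addressed individually instead of through the load-balanced service name.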