apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

unclear relation between namespace and cluster name #11356

Open Zhen-hao opened 3 years ago

Zhen-hao commented 3 years ago

See my console output:

[root@server2:~]# pulsar-admin namespaces list pairtime-test
"pairtime-test/projection_akka_journal_nt_cluster_run_7"

[root@server2:~]# pulsar-admin namespaces get-clusters pairtime-test/projection_akka_journal_nt_cluster_run_7
"pulsar-cluster-1"

pulsar-cluster-1 was the cluster name I used in the first deployment; I changed it to pairtime-test in later deployments. Somehow the first name sticks to the namespaces that were created under it.

The documentation page calls these "replication clusters", which suggests that the settings around the cluster name only make sense in a geo-replication setup.

However, the mismatch above causes runtime errors like this:

org.apache.pulsar.client.api.PulsarClientException$BrokerMetadataException: Namespace missing local cluster name in clusters list: local_cluster=nt-test-cluster ns=pairtime-test/projection_akka_journal_nt_cluster_run_7 clusters=[pulsar-cluster-1]
        at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1092)
        at org.apache.pulsar.client.impl.ClientCnx.handlePartitionResponse(ClientCnx.java:589)
        at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:129)

This happens in a single-cluster setup.
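
For reference, the exception message names both sides of the mismatch: local_cluster is the name the broker is currently running under, and clusters is the list stored on the namespace. Below is a minimal shell sketch that pulls the two values out of such a message and compares them. The function name check_cluster_mismatch is made up for illustration and is not part of Pulsar, and the parsing assumes the message layout shown in the trace above.

```shell
#!/bin/sh
# Sketch only: parse a "Namespace missing local cluster name" message
# and report whether the broker's cluster is in the namespace's list.
# check_cluster_mismatch is a hypothetical helper, not a Pulsar tool.
set -eu

check_cluster_mismatch() {
  msg="$1"
  # Value after "local_cluster=" up to the next space.
  local_cluster=$(printf '%s\n' "$msg" | sed -n 's/.*local_cluster=\([^ ]*\).*/\1/p')
  # Comma-separated content of "clusters=[...]", with spaces removed.
  ns_clusters=$(printf '%s\n' "$msg" | sed -n 's/.*clusters=\[\([^]]*\)\].*/\1/p' | tr -d ' ')
  # Wrap both sides in commas so only whole names match.
  case ",$ns_clusters," in
    *",$local_cluster,"*)
      echo "ok: $local_cluster is in [$ns_clusters]" ;;
    *)
      echo "mismatch: broker cluster '$local_cluster' not in namespace clusters [$ns_clusters]" ;;
  esac
}

# Message copied from the stack trace in this issue.
check_cluster_mismatch 'Namespace missing local cluster name in clusters list: local_cluster=nt-test-cluster ns=pairtime-test/projection_akka_journal_nt_cluster_run_7 clusters=[pulsar-cluster-1]'
```

For the message in this issue the sketch reports a mismatch, which matches what pulsar-admin namespaces get-clusters returned earlier.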

Possible fixes:

  1. Make it very clear what the cluster name means to a namespace, and what operational steps are needed when the user changes the cluster name in a single-cluster setup.
  2. Or, handle the above error internally and make it always work in a single-cluster setup.

Technoboy- commented 3 years ago

Try to use the below cmd to check if it works:

./pulsar-admin namespaces set-clusters pairtime-test/projection_akka_journal_nt_cluster_run_7 -c "nt-test-cluster"

Zhen-hao commented 3 years ago

Try to use the below cmd to check if it works:

./pulsar-admin namespaces set-clusters pairtime-test/projection_akka_journal_nt_cluster_run_7 -c "nt-test-cluster"

This doesn't work.

09:23:00.805 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://127.0.0.1:8081/admin/v2/namespaces/pairtime-test/projection_akka_journal_nt_cluster_run_7/replication] Failed to perform http post request: javax.ws.rs.ForbiddenException: HTTP 403 Forbidden
Cluster [nt-test-cluster] is not in the list of allowed clusters list for tenant [pairtime-test]

Reason: Cluster [nt-test-cluster] is not in the list of allowed clusters list for tenant [pairtime-test]

It works after running pulsar-admin tenants update pairtime-test -c nt-test-cluster.

So the cluster name is also recorded at the tenant level, in the tenant's list of allowed clusters.
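
The order that worked in this thread can be sketched as a small script: allow the new cluster on the tenant first, then set it on the namespace, then read it back. This is a sketch rather than an official procedure. The run wrapper, the repoint_namespace function and the DRY_RUN variable are made up for illustration. The script only prints the commands by default, so it can be read and tried without a broker. Note that -c sets the list rather than appending to it, so a tenant or namespace that should keep several clusters needs all of them listed.

```shell
#!/bin/sh
# Sketch only: re-point a tenant and one namespace at a renamed cluster.
# Names are taken from the console output in this issue.
set -eu

TENANT="pairtime-test"
NAMESPACE="pairtime-test/projection_akka_journal_nt_cluster_run_7"
NEW_CLUSTER="nt-test-cluster"

# DRY_RUN=1 (default, an assumption of this sketch) prints the commands;
# DRY_RUN=0 executes them and requires pulsar-admin on the PATH.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

repoint_namespace() {
  # 1. Allow the new cluster on the tenant; without this, step 2 fails
  #    with the 403 Forbidden shown above.
  run pulsar-admin tenants update "$TENANT" -c "$NEW_CLUSTER"
  # 2. Replace the namespace's cluster list with the new cluster name.
  run pulsar-admin namespaces set-clusters "$NAMESPACE" -c "$NEW_CLUSTER"
  # 3. Read the setting back to confirm it.
  run pulsar-admin namespaces get-clusters "$NAMESPACE"
}

repoint_namespace
```

It may be worth checking pulsar-admin tenants get pairtime-test before running step 1 for real, so that any existing tenant settings can be carried over in the update.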

Zhen-hao commented 3 years ago

Even after updating the cluster name at both the tenant and the namespace level, the producer client still fails:

[2021-07-19 11:27:03,632] [ERROR] [org.apache.pulsar.client.impl.ProducerImpl] [] [pulsar-client-io-1-1] - [pairtime-test/projection_akka_journal_nt_cluster_run_7/User_UserLoggedIn] [null] Failed to create producer: org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while reading ledger -  ledger=90 - operation=Failed to read entry - entry=0 caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while reading ledger -  ledger=90 - operation=Failed to read entry - entry=0 {}
[2021-07-19 11:27:03,632] [WARN] [org.apache.pulsar.client.impl.ConnectionHandler] [] [pulsar-client-io-1-1] - [pairtime-test/projection_akka_journal_nt_cluster_run_7/User_UserLoggedIn] [null] Could not get connection to broker: org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while reading ledger -  ledger=90 - operation=Failed to read entry - entry=0 caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while reading ledger -  ledger=90 - operation=Failed to read entry - entry=0 -- Will try again in 51.103 s {}

It is really a disaster to change the cluster name of a running cluster. The documentation should warn against that.

codelipenghui commented 2 years ago

The issue had no activity for 30 days; marking it with the Stale label.