TOSIT-IO / tdp-getting-started

Vagrant / Ansible environment to deploy a local TDP cluster
Apache License 2.0

Zookeeper Kafka fails to start, and an error occurs when adding Ranger policies #206

Closed EmmanuelVinet33 closed 1 year ago

EmmanuelVinet33 commented 1 year ago

TDP deploy is not working. An error occurs when installing Zookeeper-kafka, and the installation cannot finish. Trying to install the users generates other errors.

PLAY [Zookeeper Server for Kafka start] *****

TASK [Gathering Facts] **
ok: [master-01]
ok: [master-02]
ok: [master-03]

TASK [tosit.tdp.resolve] ****
ok: [master-02]
ok: [master-01]
ok: [master-03]

TASK [tosit.tdp.server : Start Zookeeper] ***
changed: [master-01]
fatal: [master-02]: FAILED! => changed=false
  ansible_facts:
    discovered_interpreter_python: /usr/bin/python
  msg: |-
    Unable to start service zookeeper-kafka: Job for zookeeper-kafka.service failed because the control process exited with error code. See "systemctl status zookeeper-kafka.service" and "journalctl -xe" for details.
changed: [master-03]

PLAY RECAP **
master-01 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
master-02 : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
master-03 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

2022-11-09 13:06:29,685 - ERROR - tdp.operation_runner._run_operations - Operation zookeeper-kafka_server_start failed !

Since these steps are optional, I tried to continue with the HDFS user homes and the Ranger policies installation, which generates:

[WARNING]: running playbook inside collection tosit.tdp

PLAY [Ranger Admin configure policies] **

TASK [Gathering Facts] **
ok: [edge-01]

TASK [tosit.tdp.resolve] ****
ok: [edge-01]

TASK [tosit.tdp.ranger_policies : Configure Ranger Admin policies] **
failed: [edge-01] (item={'name': 'tdp_user - database', 'service': 'hive-tdp', 'state': 'present'}) => changed=false
  ansible_loop_var: item
  item:
    description: tdp_user access to tdp_user database
    isAuditEnabled: true
    isEnabled: true
    name: tdp_user - database
    policyItems:

PLAY RECAP **
edge-01 : ok=2 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

rpignolet commented 1 year ago

Hi, the error in your log is:

Unable to start service zookeeper-kafka: Job for zookeeper-kafka.service failed because the control process exited with error code. See "systemctl status zookeeper-kafka.service" and "journalctl -xe" for details.

This is not related to tdp deploy. Zookeeper Kafka failed to start; please investigate why the service does not start.

The second error occurs when adding policies to Ranger Admin. You should check whether Ranger Admin is working, whether all plugin services have been created, whether tdp_user is correctly synced into the Ranger Admin users, etc.
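As a first check on whether Ranger Admin is working, it can help to confirm that something is listening on its address at all. The following is a minimal Python sketch of such a check, not part of TDP. The function name `can_connect` is hypothetical, and the host and port in the usage comment are placeholders to replace with the values from your own deployment.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage (placeholder values, adjust to your cluster):
#   can_connect("<ranger-admin-host>", <ranger-admin-port>)
```

A result of False only says that the TCP connection failed. It does not distinguish a stopped service from a firewall rule or a wrong host name, so the service logs remain the main source of information.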

leopaul36 commented 1 year ago

Depending on when you ran ./scripts/setup.sh -e extras -e prerequisites -e vagrant, you might be missing this fix: https://github.com/TOSIT-IO/tdp-collection-extras/commit/41a341ae251204c9d37517d0787071c5ba75293e. Without it, zookeeper-kafka fails at start because the port is already in use.

If the logs at /var/log/zookeeper-kafka/ mention a port issue, you should git pull and run setup.sh again with the -c option to "clean" your inventory/tdp_vars directory.

Then you can try redeploying Zookeeper-kafka and Kafka with tdp deploy --targets kafka_init
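To check the logs for a port issue, a small script can scan the log directory for the usual JVM bind error. This is a minimal Python sketch under assumptions: the function name `find_port_errors` is hypothetical, and the marker strings assume the conflict surfaces as the standard Java "Address already in use" / BindException message, which this thread does not confirm.

```python
from pathlib import Path

# Assumption: a port conflict shows up as the standard JVM bind error text.
PORT_ERROR_MARKERS = ("Address already in use", "BindException")


def find_port_errors(log_dir):
    """Return 'file:line_number: text' entries for lines that look like port conflicts."""
    hits = []
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        # errors="replace" avoids crashing on bytes that are not valid text.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in PORT_ERROR_MARKERS):
                    hits.append(f"{path.name}:{number}: {line.strip()}")
    return hits


# Usage on the affected host (run with permission to read the logs):
#   for hit in find_port_errors("/var/log/zookeeper-kafka/"):
#       print(hit)
```

An empty result suggests the start failure has another cause, in which case the systemctl status and journalctl output mentioned in the Ansible error are the next place to look.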

rpignolet commented 1 year ago

@leopaul36 if this is a problem with the stable release of getting started, can you update the submodules?

leopaul36 commented 1 year ago

> @leopaul36 if this is a problem with the stable release of getting started, can you update the submodules?

They were updated recently (https://github.com/TOSIT-IO/tdp-getting-started/commit/2d00b776cb1b9b782682bb767f38ca7ed3fb8c5f) and tested with a fresh install, as always.

I said "Depending on when you ran", but "Depending on when you git cloned tdp-getting-started" would have been better phrasing.

EmmanuelVinet33 commented 1 year ago

Thanks for this information. I ran setup.sh this morning, but I'm not sure it updated anything. I'll retry from an empty configuration to be sure I download all the latest versions.

rpignolet commented 1 year ago

Use setup.sh with the -c option to get a clean working directory and the correct submodule versions.

EmmanuelVinet33 commented 1 year ago

I'm now facing this error with Ranger installation: connection refused

ERROR PolicyRefresher - PolicyRefresher(serviceName=kms-tdp): failed to refresh policies. Will continue to use last known version of policies (2)
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509)
    at org.apache.ranger.admin.client.RangerAdminRESTClient$3.run(RangerAdminRESTClient.java:134)
    at org.apache.ranger.admin.client.RangerAdminRESTClient$3.run(RangerAdminRESTClient.java:126)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:137)
    at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:251)
    at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicy(PolicyRefresher.java:191)
    at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:161)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 13 more

leopaul36 commented 1 year ago

I just ran a fresh install without any issues.

Can you find any relevant logs at /var/log/ranger-kms/ that might help us understand what's wrong with Ranger KMS?

rpignolet commented 1 year ago

Closing as inactive.