Closed cjac closed 1 year ago
/gcbrun
/gcbrun
passing: 1.5-rocky8 2.0-rocky8 2.1-rocky8
failing: 1.5-debian10
/gcbrun
passing: 1.5-rocky8 2.0-rocky8 2.1-rocky8 2.1-ubuntu20 2.1-debian11 1.5-ubuntu18 2.0-ubuntu18 1.5-debian10
failing: 2.0-debian10
2023-07-11T18:37:35.361698945Z + echo -e '\nStarting validation on test-oozie-ha-2-0-20230711-182856-8iin-m-2:'
2023-07-11T18:37:35.361706478Z + oozie admin -sharelibupdate
2023-07-11T18:37:35.361731919Z Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://test-oozie-ha-2-0-20230711-182856-8iin-m-2.c.cloud-dataproc-ci.internal:11000/oozie/versions ]. Trying after 1 sec. Retry count = 1
2023-07-11T18:37:35.361744021Z Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://test-oozie-ha-2-0-20230711-182856-8iin-m-2.c.cloud-dataproc-ci.internal:11000/oozie/versions ]. Trying after 2 sec. Retry count = 2
2023-07-11T18:37:35.369534342Z Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://test-oozie-ha-2-0-20230711-182856-8iin-m-2.c.cloud-dataproc-ci.internal:11000/oozie/versions ]. Trying after 4 sec. Retry count = 3
2023-07-11T18:37:35.369607103Z Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://test-oozie-ha-2-0-20230711-182856-8iin-m-2.c.cloud-dataproc-ci.internal:11000/oozie/versions ]. Trying after 8 sec. Retry count = 4
2023-07-11T18:37:35.369617997Z java.net.ConnectException: Error while authenticating with endpoint: http://test-oozie-ha-2-0-20230711-182856-8iin-m-2.c.cloud-dataproc-ci.internal:11000/oozie/versions
2023-07-11T18:37:35.369626864Z at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2023-07-11T18:37:35.369651197Z at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
2023-07-11T18:37:35.369659917Z at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2023-07-11T18:37:35.369666923Z at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
2023-07-11T18:37:35.369674404Z at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:232)
2023-07-11T18:37:35.369681341Z at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:216)
2023-07-11T18:37:35.369688102Z at org.apache.oozie.client.AuthOozieClient.createConnection(AuthOozieClient.java:197)
2023-07-11T18:37:35.369696248Z at org.apache.oozie.client.OozieClient$1.doExecute(OozieClient.java:515)
2023-07-11T18:37:35.369704258Z at org.apache.oozie.client.retry.ConnectionRetriableClient.execute(ConnectionRetriableClient.java:44)
2023-07-11T18:37:35.369712300Z at org.apache.oozie.client.OozieClient.createRetryableConnection(OozieClient.java:517)
2023-07-11T18:37:35.369751478Z at org.apache.oozie.client.OozieClient.getSupportedProtocolVersions(OozieClient.java:397)
2023-07-11T18:37:35.369760504Z at org.apache.oozie.client.OozieClient.validateWSVersion(OozieClient.java:357)
2023-07-11T18:37:35.369768323Z at org.apache.oozie.client.OozieClient.createURL(OozieClient.java:468)
2023-07-11T18:37:35.369775043Z at org.apache.oozie.client.OozieClient.access$000(OozieClient.java:88)
2023-07-11T18:37:35.369782420Z at org.apache.oozie.client.OozieClient$ClientCallable.call(OozieClient.java:562)
2023-07-11T18:37:35.369790100Z at org.apache.oozie.client.OozieClient.updateShareLib(OozieClient.java:2162)
2023-07-11T18:37:35.369797391Z at org.apache.oozie.cli.OozieCLI.adminCommand(OozieCLI.java:2032)
2023-07-11T18:37:35.369820361Z at org.apache.oozie.cli.OozieCLI.processCommand(OozieCLI.java:733)
2023-07-11T18:37:35.369828963Z at org.apache.oozie.cli.OozieCLI.run(OozieCLI.java:682)
2023-07-11T18:37:35.369836361Z at org.apache.oozie.cli.OozieCLI.main(OozieCLI.java:245)
2023-07-11T18:37:35.369844056Z Caused by: java.net.ConnectException: Connection refused (Connection refused)
2023-07-11T18:37:35.369860519Z at java.net.PlainSocketImpl.socketConnect(Native Method)
2023-07-11T18:37:35.369869668Z at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
2023-07-11T18:37:35.369877625Z at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
2023-07-11T18:37:35.369885310Z at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
2023-07-11T18:37:35.369919624Z at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
2023-07-11T18:37:35.369928124Z at java.net.Socket.connect(Socket.java:607)
2023-07-11T18:37:35.369936240Z at java.net.Socket.connect(Socket.java:556)
2023-07-11T18:37:35.369944737Z at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
2023-07-11T18:37:35.369952861Z at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
2023-07-11T18:37:35.369960558Z at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
2023-07-11T18:37:35.369968193Z at sun.net.www.http.HttpClient.
I can reproduce this
Starting validation on cluster-1676904778-m-2:
+ oozie admin -sharelibupdate
Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions ]. Trying after 1 sec. Retry count = 1
Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions ]. Trying after 2 sec. Retry count = 2
Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions ]. Trying after 4 sec. Retry count = 3
Connection exception has occurred [ java.net.ConnectException Error while authenticating with endpoint: http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions ]. Trying after 8 sec. Retry count = 4
java.net.ConnectException: Error while authenticating with endpoint: http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions
I think the endpoint host needs to be the cluster name, not the master node name. In http://cluster-1676904778-m-2.c.cjac-2021-00.internal:11000/oozie/versions, the value cluster-1676904778-m-2 should instead be cluster-1676904778, I believe.
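If that hypothesis holds, the host part could be derived by stripping the master suffix from the node name. A minimal sketch, assuming master hostnames always end in -m-<index> (or -m for a single master); the function name cluster_name_from_master is hypothetical and is not taken from the init action:

```shell
# Hypothetical helper: derive the cluster name from a master hostname
# by removing a trailing "-m" or "-m-<index>" suffix.
cluster_name_from_master() {
  local host="$1"
  printf '%s\n' "${host}" | sed -E 's/-m(-[0-9]+)?$//'
}

# Example: build the endpoint URL from the derived name.
cluster="$(cluster_name_from_master "cluster-1676904778-m-2")"
echo "http://${cluster}:11000/oozie"
```

The resulting URL could then be handed to the CLI with the -oozie option or the OOZIE_URL environment variable. Whether the bare cluster name actually resolves depends on the cluster's DNS setup, so this would need checking on a real HA cluster.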
/gcbrun
well, that unexpectedly worked.
/gcbrun
/gcbrun
That one worked for all but 2.1-ubuntu20.
/gcbrun
/gcbrun
/gcbrun
I enabled rocky8 tests in the last run. 2.0 passed, but 2.1 is missing unzip and 1.5 didn't finish before the failure.
/gcbrun
/gcbrun
/gcbrun
/gcbrun
Okay. Kuldeep, if you can give me an LGTM, we can get this merged!
I didn't include rocky support for this PR. I think we can get it working in another one soon.
I tested this manually on Dataproc 2.1-debian11, and ZooKeeper fails to restart properly. As a result, the init action fails, which in turn causes cluster creation to fail.
Line 577 needs to be replaced as below, followed by more testing on different variants for HA setup.
systemctl restart zookeeper-server
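To catch a failed restart early rather than at cluster creation, the restart could be followed by a status check. A rough sketch, assuming the unit is named zookeeper-server as above; the helper name restart_and_verify, the attempt count, the delay, and the SYSTEMCTL override (there only so the function can be exercised without systemd) are all hypothetical:

```shell
# Hypothetical helper: restart a systemd unit, then poll until it reports active.
# Usage: restart_and_verify <unit> [attempts] [delay_seconds]
restart_and_verify() {
  local service="$1"
  local attempts="${2:-5}"
  local delay="${3:-2}"
  # Allow the systemctl binary to be swapped out for testing.
  local systemctl_cmd="${SYSTEMCTL:-systemctl}"
  local i

  "${systemctl_cmd}" restart "${service}" || return 1

  for i in $(seq 1 "${attempts}"); do
    if "${systemctl_cmd}" is-active --quiet "${service}"; then
      return 0
    fi
    sleep "${delay}"
  done
  return 1
}
```

In the init action this might be called as `restart_and_verify zookeeper-server || exit 1`, so that a ZooKeeper problem surfaces with a clear failure point.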
One more issue I noticed during my testing is that oozie.services.ext doesn't have all the configs required for HA, as per https://oozie.apache.org/docs/5.2.1/AG_Install.html#Pre-requisites. Can we please double-check this?
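One way to double-check is to compare the configured value against the ZooKeeper-backed services that the Oozie HA documentation lists for oozie.services.ext. A small sketch, assuming those are the four services below (worth verifying against the linked page); the helper name missing_ha_services is hypothetical:

```shell
# Services the Oozie HA docs list for oozie.services.ext (assumption: verify against the docs).
REQUIRED_HA_SERVICES="org.apache.oozie.service.ZKLocksService
org.apache.oozie.service.ZKXLogStreamingService
org.apache.oozie.service.ZKJobsConcurrencyService
org.apache.oozie.service.ZKUUIDService"

# Hypothetical helper: print every required service that is absent from
# the given comma-separated oozie.services.ext value.
missing_ha_services() {
  local configured
  local svc
  # Drop whitespace and wrap in commas so each entry can be matched exactly.
  configured=",$(printf '%s' "$1" | tr -d '[:space:]'),"
  for svc in ${REQUIRED_HA_SERVICES}; do
    case "${configured}" in
      *",${svc},"*) ;;
      *) printf '%s\n' "${svc}" ;;
    esac
  done
}
```

An empty result would mean all four services are present; anything printed is a candidate for adding to the property.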
Okay. Thanks for the review. Do you want me to grant you permission to update the PR while I'm afk? I can get online for 20m tomorrow and do that if it's urgent.
Sure CJ. If you can grant me the permissions, I can make the necessary changes and continue testing.
/gcbrun
/gcbrun
/gcbrun