Intel-bigdata / SSM

Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
Apache License 2.0

When CDH switches the NN IP address from standby to active -> SSM should also switch to the active NN #1813

Open aurorahunter opened 6 years ago

aurorahunter commented 6 years ago

SSM should auto-detect the NameNode IP addresses and should not require manual intervention.

Error signature:

2018-06-08 13:10:42,358 ERROR org.smartdata.hdfs.metric.fetcher.InotifyFetchAndApplyTask.run 63: Inotify Apply Events error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1835)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1513)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1751)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1036)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1543)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)

    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1409)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy15.getEditsFromTxid(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1506)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy16.getEditsFromTxid(Unknown Source)
    at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:109)
    at org.smartdata.hdfs.metric.fetcher.InotifyFetchAndApplyTask.run(InotifyFetchAndApplyTask.java:53)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
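
The trace shows SSM's InotifyFetchAndApplyTask still polling getEditsFromTxid on the NameNode that has dropped to standby; that RPC is only served by the active NameNode, so SSM's HDFS client has to be HA-aware to survive a failover. Below is a minimal Java sketch (not SSM code) of the same inotify call made through the stock HDFS client against a logical HA nameservice URI instead of a fixed NameNode IP. The nameservice name staging-ns is a placeholder, and the sketch assumes the usual HA client settings are present in hdfs-site.xml so the client proxy retries the new active NameNode on its own.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.EventBatch;

public class HaAwareInotifyExample {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml / hdfs-site.xml from the classpath (e.g. /etc/hadoop/conf).
    Configuration conf = new Configuration();

    // "staging-ns" is a hypothetical HA nameservice id; with dfs.nameservices,
    // dfs.ha.namenodes.*, dfs.namenode.rpc-address.* and
    // dfs.client.failover.proxy.provider.* set in hdfs-site.xml, the client
    // proxy fails over to the other NameNode instead of dying on a
    // StandbyException like the one above.
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://staging-ns"), conf);

    // Same inotify stream that SSM's InotifyFetchAndApplyTask polls.
    DFSInotifyEventInputStream stream = admin.getInotifyEventStream();
    EventBatch batch;
    while ((batch = stream.poll()) != null) {
      System.out.println("txid=" + batch.getTxid()
          + " events=" + batch.getEvents().length);
    }
  }
}
```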

My configuration:

    smart.dfs.namenode.rpcserver = hdfs://sjsstaging001.sj.adas.intel.com/134.191.230.46:9000    (NameNode rpcserver)
    smart.hadoop.conf.path = file:///etc/hadoop/conf    (Hadoop main cluster configuration file path)
    smart.tidb.enable = false    (This decides whether TiDB is enabled.)
    smart.security.enable = true
    smart.server.keytab.file = /etc/security/ssm_deploy/ssm_deploy.keytab
    smart.server.kerberos.principal = ssm_deploy@ADAS.INTEL.COM

To reproduce: our IGK-DevOps cluster is shared with you, so you can use it to reproduce. Start the SSM server, then restart the HDFS service or manually switch the standby NameNode to active. SSM will fail with the error signature above.

Thanks
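
For reference, these are the stock HDFS client HA properties that make a NameNode failover transparent to any client built on the Hadoop Configuration. Whether SSM's smart.dfs.namenode.rpcserver accepts a logical nameservice URI like hdfs://staging-ns is an assumption here, not something confirmed by the SSM docs; the nameservice id, NameNode ids and hosts below are placeholders, and in practice these settings would normally live in hdfs-site.xml under /etc/hadoop/conf rather than be set in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Logical nameservice instead of a fixed NameNode IP; "staging-ns",
    // "nn1"/"nn2" and the hostnames are placeholders for this cluster.
    conf.set("dfs.nameservices", "staging-ns");
    conf.set("dfs.ha.namenodes.staging-ns", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.staging-ns.nn1", "namenode1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.staging-ns.nn2", "namenode2.example.com:8020");
    conf.set("dfs.client.failover.proxy.provider.staging-ns",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
    conf.set("fs.defaultFS", "hdfs://staging-ns");

    // Any client built on this Configuration follows a failover transparently,
    // which is the behaviour SSM needs instead of pinning one NameNode IP.
    try (FileSystem fs = FileSystem.get(conf)) {
      System.out.println("Connected via " + fs.getUri()
          + ", root exists: " + fs.exists(new Path("/")));
    }
  }
}
```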
aurorahunter commented 6 years ago

Hi Qiyuan,

I tried your suggestion of keeping just the Hadoop conf directory and not using the namenode property in the XML, but SSM still errors out.
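
One quick way to check whether the Hadoop conf directory alone gives the client an HA view of the cluster is to load those files directly and print the HA-related keys; if dfs.nameservices comes back empty, any client relying only on that directory is still pinned to a single NameNode address. This is just a diagnostic sketch using the standard Hadoop Configuration API, not part of SSM.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HadoopConfProbe {
  public static void main(String[] args) {
    // Load the same files SSM is pointed at via smart.hadoop.conf.path.
    Configuration conf = new Configuration(false);
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

    // If dfs.nameservices is empty here, the client has no HA view of the
    // cluster and keeps talking to whichever NameNode address it was given.
    System.out.println("fs.defaultFS      = " + conf.get("fs.defaultFS"));
    System.out.println("dfs.nameservices  = " + conf.get("dfs.nameservices"));
    String ns = conf.get("dfs.nameservices");
    if (ns != null) {
      System.out.println("dfs.ha.namenodes." + ns + " = "
          + conf.get("dfs.ha.namenodes." + ns));
      System.out.println("failover provider = "
          + conf.get("dfs.client.failover.proxy.provider." + ns));
    }
  }
}
```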

hdfs_ssm_ha_debug.zip

I have attached the following in the zipped folder:
1) SSM logs
2) SSM conf directory
3) Hadoop conf directory (from /etc/hadoop/conf)

Let me know if you need any other information. Appreciate your time and help.