datavane / datasophon

The next-generation cloud-native big data management expert, aiming to help users rapidly build stable, efficient, and scalable cloud-native big data platforms.
https://datasophon.github.io/datasophon-website/
Apache License 2.0

HDFS service installation error #559

Open misteruly opened 6 months ago

misteruly commented 6 months ago

### Search before asking

### What happened

DataNode installation on ddp1 failed:

can not find log file

DataNode installation on ddp2 failed:

can not find log file

NameNode installation on ddp2 failed:

2024-05-20 14:26:15,345 INFO ipc.Client: Retrying connect to server: ddp1/192.168.4.180:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 14:26:15,349 WARN namenode.NameNode: Encountered exception during format
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 successful responses:
192.168.4.182:8485: false
192.168.4.181:8485: false
1 exceptions thrown:
192.168.4.180:8485: Call From ddp2/192.168.4.181 to ddp1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:305)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:282)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:1185)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:212)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1274)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
2024-05-20 14:26:15,392 INFO namenode.FSNamesystem: Stopping services started for active state
2024-05-20 14:26:15,392 INFO namenode.FSNamesystem: Stopping services started for standby state
2024-05-20 14:26:15,392 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 successful responses:
192.168.4.182:8485: false
192.168.4.181:8485: false
1 exceptions thrown:
192.168.4.180:8485: Call From ddp2/192.168.4.181 to ddp1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:305)
    at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:282)
    at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:1185)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:212)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1274)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1726)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
2024-05-20 14:26:15,394 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 successful responses:
192.168.4.182:8485: false
192.168.4.181:8485: false
1 exceptions thrown:
192.168.4.180:8485: Call From ddp2/192.168.4.181 to ddp1:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
2024-05-20 14:26:15,395 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ddp2/192.168.4.181
************************************************************/
Last login: Mon May 20 14:16:00 CST 2024 on pts/0

[ERROR] 2024-05-20 14:26:15 TaskLogLogger-HDFS-NameNode:[197] - 
[INFO] 2024-05-20 15:36:16 TaskLogLogger-HDFS-NameNode:[86] - Remote package md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[91] - Local md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[82] - Start to configure service role NameNode
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/nn
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/dn
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[263] - size is :4
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[266] - config set value to RULE:[2:$1/$2@$0]([ndj]n\/.*@HADOOP\.COM)s/.*/hdfs/
RULE:[2:$1/$2@$0]([rn]m\/.*@HADOOP\.COM)s/.*/yarn/
RULE:[2:$1/$2@$0](jhs\/.*@HADOOP\.COM)s/.*/mapred/
DEFAULT
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:36:18 TaskLogLogger-HDFS-NameNode:[58] - Start to execute format namenode
[ERROR] 2024-05-20 15:39:18 TaskLogLogger-HDFS-NameNode:[70] - Namenode format failed
[INFO] 2024-05-20 15:44:21 TaskLogLogger-HDFS-NameNode:[182] - 2024-05-20 15:36:20,104 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ddp2/192.168.4.181
STARTUP_MSG:   args = [-format, smhadoop]
STARTUP_MSG:   version = 3.3.3

2024-05-20 15:36:20,725 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2024-05-20 15:36:20,737 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2024-05-20 15:36:20,738 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2024-05-20 15:36:20,741 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2024-05-20 15:36:20,741 INFO blockmanagement.BlockManager: The block deletion will start around 2024 May 20 15:36:20
2024-05-20 15:36:20,742 INFO util.GSet: Computing capacity for map BlocksMap
2024-05-20 15:36:20,742 INFO util.GSet: VM type       = 64-bit
2024-05-20 15:36:20,743 INFO util.GSet: 2.0% max memory 7.7 GB = 157.0 MB
2024-05-20 15:36:20,743 INFO util.GSet: capacity      = 2^24 = 16777216 entries
2024-05-20 15:36:20,763 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
2024-05-20 15:36:20,763 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2024-05-20 15:36:20,769 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.999
2024-05-20 15:36:20,769 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2024-05-20 15:36:20,769 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: defaultReplication         = 3
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: maxReplication             = 512
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: minReplication             = 1
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2024-05-20 15:36:20,770 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2024-05-20 15:36:20,793 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
2024-05-20 15:36:20,793 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
2024-05-20 15:36:20,793 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
2024-05-20 15:36:20,793 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
2024-05-20 15:36:20,805 INFO util.GSet: Computing capacity for map INodeMap
2024-05-20 15:36:20,805 INFO util.GSet: VM type       = 64-bit
2024-05-20 15:36:20,805 INFO util.GSet: 1.0% max memory 7.7 GB = 78.5 MB
2024-05-20 15:36:20,805 INFO util.GSet: capacity      = 2^23 = 8388608 entries
2024-05-20 15:36:20,811 INFO namenode.FSDirectory: ACLs enabled? true
2024-05-20 15:36:20,811 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2024-05-20 15:36:20,811 INFO namenode.FSDirectory: XAttrs enabled? true
2024-05-20 15:36:20,812 INFO namenode.NameNode: Caching file names occurring more than 10 times
2024-05-20 15:36:20,817 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2024-05-20 15:36:20,819 INFO snapshot.SnapshotManager: SkipList is disabled
2024-05-20 15:36:20,823 INFO util.GSet: Computing capacity for map cachedBlocks
2024-05-20 15:36:20,823 INFO util.GSet: VM type       = 64-bit
2024-05-20 15:36:20,823 INFO util.GSet: 0.25% max memory 7.7 GB = 19.6 MB
2024-05-20 15:36:20,823 INFO util.GSet: capacity      = 2^21 = 2097152 entries
2024-05-20 15:36:20,832 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2024-05-20 15:36:20,832 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2024-05-20 15:36:20,832 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2024-05-20 15:36:20,838 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2024-05-20 15:36:20,838 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2024-05-20 15:36:20,840 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2024-05-20 15:36:20,840 INFO util.GSet: VM type       = 64-bit
2024-05-20 15:36:20,840 INFO util.GSet: 0.029999999329447746% max memory 7.7 GB = 2.4 MB
2024-05-20 15:36:20,840 INFO util.GSet: capacity      = 2^18 = 262144 entries
Re-format filesystem in Storage Directory root= /data/dfs/nn; location= null ? (Y or N) Killed
Last login: Mon May 20 15:30:01 CST 2024 on pts/0

[ERROR] 2024-05-20 15:44:21 TaskLogLogger-HDFS-NameNode:[197] - 
[INFO] 2024-05-20 15:46:13 TaskLogLogger-HDFS-NameNode:[86] - Remote package md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[91] - Local md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[82] - Start to configure service role NameNode
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/nn
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/dn
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[263] - size is :4
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[266] - config set value to RULE:[2:$1/$2@$0]([ndj]n\/.*@HADOOP\.COM)s/.*/hdfs/
RULE:[2:$1/$2@$0]([rn]m\/.*@HADOOP\.COM)s/.*/yarn/
RULE:[2:$1/$2@$0](jhs\/.*@HADOOP\.COM)s/.*/mapred/
DEFAULT
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 15:46:16 TaskLogLogger-HDFS-NameNode:[58] - Start to execute format namenode
[ERROR] 2024-05-20 15:49:16 TaskLogLogger-HDFS-NameNode:[70] - Namenode format failed

ZKFC installation on ddp2 failed:

[INFO] 2024-05-20 13:37:23 TaskLogLogger-HDFS-NameNode:[86] - Remote package md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[91] - Local md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[82] - Start to configure service role NameNode
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/nn
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/dn
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :4
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to RULE:[2:$1/$2@$0]([ndj]n\/.*@HADOOP\.COM)s/.*/hdfs/
RULE:[2:$1/$2@$0]([rn]m\/.*@HADOOP\.COM)s/.*/yarn/
RULE:[2:$1/$2@$0](jhs\/.*@HADOOP\.COM)s/.*/mapred/
DEFAULT
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[45] - Start to execute hdfs namenode -bootstrapStandby
[ERROR] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[54] - Namenode standby failed
[INFO] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[182] - 2024-05-20 13:37:27,815 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ddp3/192.168.4.182
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 3.3.3
STARTUP_MSG:   classpath = /opt/datasophon/hadoop-3.3.3/etc/hadoop:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/snappy-java-1.1.8.2.jar:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/jsr305-3.0.2.jar:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/sources:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/test:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/timelineservice:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/webapps:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/yarn-service-examples:/opt/datasophon/hadoop-3.3.3/jmx/jmx_prometheus_javaagent-0.16.1.jar
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r d37586cbda38c338d9fe481addda5a05fb516f71; compiled by 'stevel' on 2022-05-09T16:36Z
STARTUP_MSG:   java = 1.8.0_333
************************************************************/
2024-05-20 13:37:27,821 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2024-05-20 13:37:27,906 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
2024-05-20 13:37:28,084 INFO ha.BootstrapStandby: Found nn: nn1, ipc: ddp2/192.168.4.181:8020
2024-05-20 13:37:28,483 INFO common.Util: Assuming 'file' scheme for path /data/dfs/nn in configuration.
2024-05-20 13:37:28,496 INFO common.Util: Assuming 'file' scheme for path /data/dfs/nn in configuration.
2024-05-20 13:37:29,665 INFO ipc.Client: Retrying connect to server: ddp2/192.168.4.181:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 13:37:38,674 INFO ipc.Client: Retrying connect to server: ddp2/192.168.4.181:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 13:37:38,680 WARN ha.BootstrapStandby: Unable to fetch namespace information from remote NN at ddp2/192.168.4.181:8020: Call From ddp3/192.168.4.182 to ddp2:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
2024-05-20 13:37:38,680 ERROR ha.BootstrapStandby: Unable to fetch namespace information from any remote NN. Possible NameNodes: [RemoteNameNodeInfo [nnId=nn1, ipcAddress=ddp2/192.168.4.181:8020, httpAddress=http://ddp2:9870]]
2024-05-20 13:37:38,682 INFO util.ExitUtil: Exiting with status 2: ExitException
2024-05-20 13:37:38,683 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ddp3/192.168.4.182
************************************************************/

[ERROR] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[197] - 

NameNode installation on ddp3 failed:

[INFO] 2024-05-20 13:37:23 TaskLogLogger-HDFS-NameNode:[86] - Remote package md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[91] - Local md5 is a307e097d66da00636e44cd32148a13a
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[82] - Start to configure service role NameNode
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/nn
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :1
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to /data/dfs/dn
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[117] - Convert boolean and integer to string
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[263] - size is :4
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[266] - config set value to RULE:[2:$1/$2@$0]([ndj]n\/.*@HADOOP\.COM)s/.*/hdfs/
RULE:[2:$1/$2@$0]([rn]m\/.*@HADOOP\.COM)s/.*/yarn/
RULE:[2:$1/$2@$0](jhs\/.*@HADOOP\.COM)s/.*/mapred/
DEFAULT
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[181] - configure success
[INFO] 2024-05-20 13:37:25 TaskLogLogger-HDFS-NameNode:[45] - Start to execute hdfs namenode -bootstrapStandby
[ERROR] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[54] - Namenode standby failed
[INFO] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[182] - 2024-05-20 13:37:27,815 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ddp3/192.168.4.182
STARTUP_MSG:   args = [-bootstrapStandby]
STARTUP_MSG:   version = 3.3.3
STARTUP_MSG:   classpath = /opt/datasophon/hadoop-3.3.3/etc/hadoop:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar:/opt/datasophon/hadoop-3.3.3/share/hadoop/common/lib/snappy-3.3.3/share/hadoop/yarn/webapps:/opt/datasophon/hadoop-3.3.3/share/hadoop/yarn/yarn-service-examples:/opt/datasophon/hadoop-3.3.3/jmx/jmx_prometheus_javaagent-0.16.1.jar
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r d37586cbda38c338d9fe481addda5a05fb516f71; compiled by 'stevel' on 2022-05-09T16:36Z
STARTUP_MSG:   java = 1.8.0_333
************************************************************/
2024-05-20 13:37:27,821 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2024-05-20 13:37:27,906 INFO namenode.NameNode: createNameNode [-bootstrapStandby]
2024-05-20 13:37:28,084 INFO ha.BootstrapStandby: Found nn: nn1, ipc: ddp2/192.168.4.181:8020
2024-05-20 13:37:28,483 INFO common.Util: Assuming 'file' scheme for path /data/dfs/nn in configuration.
2024-05-20 13:37:28,496 INFO common.Util: Assuming 'file' scheme for path /data/dfs/nn in configuration.
2024-05-20 13:37:29,665 INFO ipc.Client: Retrying connect to server: ddp2/192.168.4.181:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 13:37:30,666 INFO ipc.Client: Retrying connect to server: ddp2/192.168.4.181:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 13:37:38,674 INFO ipc.Client: Retrying connect to server: ddp2/192.168.4.181:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2024-05-20 13:37:38,680 WARN ha.BootstrapStandby: Unable to fetch namespace information from remote NN at ddp2/192.168.4.181:8020: Call From ddp3/192.168.4.182 to ddp2:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
2024-05-20 13:37:38,680 ERROR ha.BootstrapStandby: Unable to fetch namespace information from any remote NN. Possible NameNodes: [RemoteNameNodeInfo [nnId=nn1, ipcAddress=ddp2/192.168.4.181:8020, httpAddress=http://ddp2:9870]]
2024-05-20 13:37:38,682 INFO util.ExitUtil: Exiting with status 2: ExitException
2024-05-20 13:37:38,683 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ddp3/192.168.4.182
************************************************************/

[ERROR] 2024-05-20 13:37:39 TaskLogLogger-HDFS-NameNode:[197] - 

DataNode installation on ddp3 failed:

can not find log file

ZKFC installation on ddp3 failed:

can not find log file

[root@ddp1 data]# jps
49890 QuorumPeerMain
2757 DataSophonApplicationServer
50316 Jps
3022 WorkerApplicationServer
[root@ddp1 data]# jps
49890 QuorumPeerMain
2757 DataSophonApplicationServer
50440 JournalNode
50985 Jps
3022 WorkerApplicationServer
[root@ddp1 data]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.4.180 ddp1
192.168.4.181 ddp2
192.168.4.182 ddp3

[root@ddp2 logs]# jps
33795 NameNode
33971 DFSZKFailoverController
34135 Jps
2844 WorkerApplicationServer
33725 JournalNode
19407 QuorumPeerMain
[root@ddp2 logs]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.4.180 ddp1
192.168.4.181 ddp2
192.168.4.182 ddp3

[root@ddp3 ~]# jps
19477 QuorumPeerMain
36393 Jps
10683 WorkerApplicationServer
36283 JournalNode
[root@ddp3 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.4.180 ddp1
192.168.4.181 ddp2
192.168.4.182 ddp3



Everything is using the default configuration.
![image](https://github.com/datavane/datasophon/assets/31399968/ad6e4fc1-5f1d-4a0f-aca7-ecd247f2ec5d)

rm -rf /data/tmp && rm -rf /data/dfs && rm -rf /opt/datasophon/hadoop-3.3.3 && rm -rf /opt/datasophon/hdfs && rm -rf /home/*

I deleted these directories and reinstalled, but it still fails with the same error.
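The repeated "Connection refused" to ddp1:8485 suggests the JournalNode on ddp1 was not yet listening when the NameNode format ran (the first jps output above shows no JournalNode on ddp1, while a later one does). As a rough, hypothetical check before retrying, something like the following sketch could be run from the NameNode host; the hostnames and port match this cluster, and nc is assumed to be installed:

```bash
# Hedged sketch: check from the NameNode host that every JournalNode is
# listening on 8485 before rerunning "hdfs namenode -format".
# Hostnames match this issue's cluster; adjust as needed.
for jn in ddp1 ddp2 ddp3; do
  # -z: probe the port without sending data; -w 2: two-second timeout
  if nc -z -w 2 "$jn" 8485; then
    echo "$jn:8485 reachable"
  else
    echo "$jn:8485 refused or unreachable"
  fi
done
```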

### What you expected to happen

HDFS installs successfully on the first attempt.

### How to reproduce

OS: CentOS 7.9
Version: DataSophon-1.2.1

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
ToolsOnKeys commented 5 months ago

Add the ipc.client.connect.max.retries and ipc.client.connect.retry.interval parameters. Purpose: to prevent ConnectException errors against the JournalNode service.
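For reference, a minimal sketch of how these settings are usually expressed, assuming they go into Hadoop's core-site.xml; the values below are illustrative, not the exact ones from the patch:

```bash
# Illustrative sketch only (values are assumptions, not the exact change in PR #569).
# The two IPC client retry settings normally live in core-site.xml as:
#
#   <property>
#     <name>ipc.client.connect.max.retries</name>
#     <value>100</value>
#   </property>
#   <property>
#     <name>ipc.client.connect.retry.interval</name>
#     <value>10000</value>  <!-- milliseconds -->
#   </property>
#
# After redistributing the configuration, confirm the values the HDFS client resolves:
hdfs getconf -confKey ipc.client.connect.max.retries
hdfs getconf -confKey ipc.client.connect.retry.interval
```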

yuhuang123456 commented 5 months ago

> Add the ipc.client.connect.max.retries and ipc.client.connect.retry.interval parameters. Purpose: to prevent ConnectException errors against the JournalNode service.

How do I apply this on version 1.2.1?

ToolsOnKeys commented 5 months ago

> Add the ipc.client.connect.max.retries and ipc.client.connect.retry.interval parameters. Purpose: to prevent ConnectException errors against the JournalNode service.

> How do I apply this on version 1.2.1?

See https://github.com/datavane/datasophon/pull/569 for details.

yuhuang123456 commented 5 months ago

> Add the ipc.client.connect.max.retries and ipc.client.connect.retry.interval parameters. Purpose: to prevent ConnectException errors against the JournalNode service.

> How do I apply this on version 1.2.1?

> See #569 for details.

On version 1.2.1, I tried adding the retry values to core-site under hdfs, but I still get the same connection exception. My local Hadoop is version 3.3.6; I copied the three files whitelist, blacklist, and fair-scheduler.xml from 3.3.3's etc/hadoop directory into 3.3.6. Also, the server has no internet access, so parts of the environment can only be installed offline; I am not sure whether that is a factor.