AtlasOfLivingAustralia / data-management

Data management issue tracking

Index failure 2023-03-02 #859

Closed: javier-molina closed this issue 1 year ago

javier-molina commented 1 year ago

See https://atlaslivingaustralia.slack.com/archives/G0106GABXC3/p1677683143024729

sadeghim commented 1 year ago

It seems to be an HDFS problem: the report shows zero configured capacity, all six datanodes dead, and 305883 missing blocks:

sad036@aws-spark-quoll-master:~$ sudo -u hdfs /data/hadoop/bin/hdfs dfsadmin -report
[sudo] password for sad036:
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 305883
Blocks with corrupt replicas: 0
Missing blocks: 305883
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Dead datanodes (6):

Name: 172.30.1.103:50010 (ip-172-30-1-103.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 580143333376 (540.30 GB)
DFS Used: 381588037632 (355.38 GB)
Non DFS Used: 11197075456 (10.43 GB)
DFS Remaining: 161550536704 (150.46 GB)
DFS Used%: 65.77%
DFS Remaining%: 27.85%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed Mar 01 23:33:25 AEDT 2023

Name: 172.30.1.151:50010 (ip-172-30-1-151.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 580143333376 (540.30 GB)
DFS Used: 370366611456 (344.93 GB)
Non DFS Used: 11381227520 (10.60 GB)
DFS Remaining: 172587810816 (160.73 GB)
DFS Used%: 63.84%
DFS Remaining%: 29.75%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed Mar 01 23:33:25 AEDT 2023

Name: 172.30.1.226:50010 (ip-172-30-1-226.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 580143333376 (540.30 GB)
DFS Used: 351116881920 (327.00 GB)
Non DFS Used: 14042107904 (13.08 GB)
DFS Remaining: 189176659968 (176.18 GB)
DFS Used%: 60.52%
DFS Remaining%: 32.61%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed Mar 01 23:33:30 AEDT 2023

Name: 172.30.1.43:50010 (ip-172-30-1-43.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 685833240576 (638.73 GB)
DFS Used: 287299239936 (267.57 GB)
Non DFS Used: 172565516288 (160.71 GB)
DFS Remaining: 195865939968 (182.41 GB)
DFS Used%: 41.89%
DFS Remaining%: 28.56%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed Mar 01 23:33:34 AEDT 2023

Name: 172.30.1.65:50010 (ip-172-30-1-65.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 580143333376 (540.30 GB)
DFS Used: 357688819712 (333.12 GB)
Non DFS Used: 44810661888 (41.73 GB)
DFS Remaining: 151836168192 (141.41 GB)
DFS Used%: 61.66%
DFS Remaining%: 26.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Wed Mar 01 23:33:27 AEDT 2023

Name: 172.30.1.6:50010 (ip-172-30-1-6.ap-southeast-2.compute.internal)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 580143333376 (540.30 GB)
DFS Used: 308578775040 (287.39 GB)
Non DFS Used: 11246698496 (10.47 GB)
DFS Remaining: 234510176256 (218.40 GB)
DFS Used%: 53.19%
DFS Remaining%: 40.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 0
Last contact: Tue Feb 07 17:50:46 AEDT 2023
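
The diagnosis rests on a few headline lines of the report: the dead-datanode count and the missing-block count. Below is a minimal Python sketch of how those figures could be pulled out of saved `hdfs dfsadmin -report` output for a quick health check. `summarize_report` and `HdfsReportSummary` are hypothetical helpers, not part of Hadoop, and the regexes assume the English field labels shown above, which may differ between Hadoop versions.

```python
import re
from dataclasses import dataclass


@dataclass
class HdfsReportSummary:
    """Headline figures from `hdfs dfsadmin -report` (hypothetical helper)."""
    live_datanodes: int
    dead_datanodes: int
    missing_blocks: int
    safe_mode_on: bool

    @property
    def healthy(self) -> bool:
        # Healthy only if no datanode is dead, no block is missing,
        # and the NameNode has left safe mode.
        return (
            self.dead_datanodes == 0
            and self.missing_blocks == 0
            and not self.safe_mode_on
        )


def summarize_report(text: str) -> HdfsReportSummary:
    """Parse the summary section of saved dfsadmin output (assumed format)."""

    def section_count(label: str) -> int:
        # Section headers look like "Dead datanodes (6):" or "Live datanodes (6):".
        m = re.search(rf"^{label} datanodes \((\d+)\):", text, re.MULTILINE)
        return int(m.group(1)) if m else 0

    # "^Missing blocks: " deliberately skips the
    # "Missing blocks (with replication factor 1)" line.
    missing = re.search(r"^Missing blocks: (\d+)", text, re.MULTILINE)
    return HdfsReportSummary(
        live_datanodes=section_count("Live"),
        dead_datanodes=section_count("Dead"),
        missing_blocks=int(missing.group(1)) if missing else 0,
        safe_mode_on=re.search(r"^Safe mode is ON", text, re.MULTILINE) is not None,
    )


if __name__ == "__main__":
    before = "Missing blocks: 305883\nDead datanodes (6):\n"
    print(summarize_report(before))
```

Applied to the post-restart report, this check would still flag the cluster as unhealthy until safe mode is turned off, since that report begins with `Safe mode is ON`.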

sadeghim commented 1 year ago

After restarting HDFS, all six datanodes are live again:

sad036@aws-spark-quoll-master:~$ sudo -u hdfs /data/hadoop/bin/hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 3586549907456 (3.26 TB)
Present Capacity: 3162129760256 (2.88 TB)
DFS Remaining: 1105491394560 (1.01 TB)
DFS Used: 2056638365696 (1.87 TB)
DFS Used%: 65.04%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 90000

-------------------------------------------------
Live datanodes (6):
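
As a sanity check on the restart, the cluster-wide totals can be reconciled with the per-node figures from the earlier dead-datanode report. This is a minimal Python sketch using only numbers copied from the two reports; it assumes no datanode's configured capacity changed across the restart.

```python
# Per-datanode "Configured Capacity" values (bytes) copied from the
# dead-datanode report above.
node_capacity = {
    "172.30.1.103": 580143333376,
    "172.30.1.151": 580143333376,
    "172.30.1.226": 580143333376,
    "172.30.1.43": 685833240576,
    "172.30.1.65": 580143333376,
    "172.30.1.6": 580143333376,
}

# Cluster-wide figures (bytes) from the post-restart report.
configured = 3586549907456
present = 3162129760256
dfs_remaining = 1105491394560
dfs_used = 2056638365696

# With all six nodes back, the full configured capacity is restored.
assert sum(node_capacity.values()) == configured

# Present capacity is split between DFS Used and DFS Remaining.
assert dfs_used + dfs_remaining == present

# The cluster-level "DFS Used%" is relative to present capacity,
# not to configured capacity.
used_pct = round(100 * dfs_used / present, 2)
print(f"DFS Used%: {used_pct}%")  # DFS Used%: 65.04%
```

Dividing by configured capacity instead would give about 57%, which does not match the reported 65.04%.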