cea-hpc / shine

Lustre administration tool
GNU General Public License v2.0

Installing lustre 2.2 on CentOS 6.2 fails with shine "no state report from node $(NODENAME)" #128

Closed degremont closed 7 years ago

degremont commented 12 years ago

Environment:

rpm -Uvh e2fsprogs-1.41.90.wc4-7.el6.x86_64.rpm e2fsprogs-libs-1.41.90.wc4-7.el6.x86_64.rpm 
rpm -ivh kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64.rpm kernel-2.6.32-220.4.2.el6_lustre.x86_64.rpm lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64.rpm lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64.rpm lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64.rpm 

# rpm -qa|grep lustre
kernel-firmware-2.6.32-220.4.2.el6_lustre.x86_64
kernel-2.6.32-220.4.2.el6_lustre.x86_64
lustre-modules-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-ldiskfs-3.3.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64
lustre-2.2.0-2.6.32_220.4.2.el6_lustre.x86_64.x86_64

# rpm -qa|grep shine
shine-0.911-1.el6.noarch
# rpm -qa|grep clustershell
clustershell-1.6-1.el6.noarch

# cat /etc/modprobe.d/lustre.conf 
options lnet networks=tcp0(eth0)

# modprobe ldiskfs
# modprobe lnet
# modprobe lustre

# lctl list_nids
10.254.1.189@tcp

What works:

# mkfs.lustre --fsname=toto --mgs --mdt /dev/sdb
# mkfs.lustre --fsname=toto --ost --mgsnode=10.254.1.189@tcp /dev/sdc
# mkfs.lustre --fsname=toto --ost --mgsnode=10.254.1.189@tcp /dev/sdd
# mount -t lustre /dev/sdb /mds-data/
# mount -t lustre /dev/sdc /oss1-data/
# mount -t lustre /dev/sdd /oss2-data/
# mount -t lustre 10.254.1.189@tcp:/toto /lustre

But using shine does not:

# cat /etc/shine/toto.lmf 
fs_name: toto
nid_map: nodes=10.254.1.189 nids=10.254.1.189@tcp
mount_path: /lustre
mgt: node=10.254.1.189 dev=/dev/sdb
mdt: node=10.254.1.189 dev=/dev/sdc
ost: node=10.254.1.189 dev=/dev/sdd
client: node=10.254.1.189

# shine install -m /etc/shine/toto.lmf 
Using Lustre model file /etc/shine/toto.lmf
Registering FS toto to backend...
Filesystem toto registered.
Updating file system configuration file `toto.xmf' on 10.254.1.189
Use `shine format -f toto' to initialize the file system.

# shine format -f toto
Format toto on 10.254.1.189: are you sure? (y)es/(N)o: y
WARNING: no state report from node 10.254.1.189 (10.254.1.189@tcp)
WARNING: no state report from node 10.254.1.189 (10.254.1.189@tcp)
WARNING: no state report from node 10.254.1.189 (10.254.1.189@tcp)
Format failed
FILESYSTEM COMPONENTS STATUS (toto)
+-----+--+-------------+--------------+
|type |# |   nodes     |    status    |
+-----+--+-------------+--------------+
|MGT  |1 |10.254.1.189 |CHECK FAILURE |
|MDT  |1 |10.254.1.189 |CHECK FAILURE |
|OST  |1 |10.254.1.189 |CHECK FAILURE |
+-----+--+-------------+--------------+

Reported by: phantez

degremont commented 12 years ago

Maybe the LNet network configuration is wrong, because I also have trouble connecting a Lustre client, but I am still surprised that it does not work locally the way the manual steps in my example did.

Does shine test the network before formatting?

Original comment by: phantez

degremont commented 12 years ago
        status
            changed from new to assigned

Please use real hostnames for the 'nodes' parameter in nid_map; using IP addresses there is probably the reason for the "no state report" warnings. I think we need to check whether we can work around this issue...

Original comment by: thiell

degremont commented 12 years ago

After some checking, I think it will be difficult to support this usage. Moreover, it is much simpler to identify nodes by name.

Could you confirm that the issue disappears when using node names? If so, we will close this ticket.

Original comment by: degremont

degremont commented 12 years ago

It works fine after replacing the IP address (10.254.1.189) with the name (master).

# shine format -f toto
Format toto on master: are you sure? (y)es/(N)o: y
[15:38] In progress for 3 component(s) on master ...
Format successful.
FILESYSTEM COMPONENTS STATUS (toto)
+-----+--+-------+--------+
|type |# | nodes | status |
+-----+--+-------+--------+
|MGT  |1 |master |offline |
|MDT  |1 |master |offline |
|OST  |1 |master |offline |
+-----+--+-------+--------+
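
For anyone hitting the same warning, here is a minimal sketch of the change described above, applied to a scratch copy of the model file from this ticket. It assumes that the hostname `master` resolves to 10.254.1.189 on the node and that only the `node=` and `nodes=` fields need the name, while the `nids=` value keeps the IP-based NID. The `sed` expression and the temporary file are illustrative, not something shine provides.

```shell
# Hypothetical scratch copy of the model file shown earlier in this ticket.
model=$(mktemp)
cat > "$model" <<'EOF'
fs_name: toto
nid_map: nodes=10.254.1.189 nids=10.254.1.189@tcp
mount_path: /lustre
mgt: node=10.254.1.189 dev=/dev/sdb
mdt: node=10.254.1.189 dev=/dev/sdc
ost: node=10.254.1.189 dev=/dev/sdd
client: node=10.254.1.189
EOF

# Replace the IP only in node=/nodes= fields; nids= keeps the IP-based NID.
# Assumption: "master" is the hostname of 10.254.1.189.
fixed=$(sed -E 's/(nodes?)=10\.254\.1\.189/\1=master/' "$model")
echo "$fixed"
rm -f "$model"
```

After reviewing the output, the same substitution could be applied to /etc/shine/toto.lmf before re-running `shine install -m /etc/shine/toto.lmf`.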

By the way, the problem with the other clients was caused by the default iptables rules of CentOS.
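
The exact rule that was changed is not recorded in this ticket. As a hedged sketch, assuming LNet over TCP uses its default acceptor port 988 and that the default CentOS rules reject it, a rule along these lines could be reviewed and then applied as root. The variable names are illustrative.

```shell
# Assumption: LNet's TCP acceptor listens on its default port, 988.
lnet_port=988

# Build the rule as a string first so it can be reviewed before being applied.
rule="iptables -I INPUT -p tcp --dport ${lnet_port} -j ACCEPT"
echo "$rule"

# To apply it, run the printed command as root on each server,
# then inspect the chain with: iptables -L INPUT -n
```

Restricting the rule to the cluster's subnet with `-s` would be a tighter choice than opening the port to every source.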

Original comment by: phantez

degremont commented 12 years ago
        status
            changed from assigned to closed

        resolution
            set to wontfix

Ok, if everything is fine, I will close this ticket.

Original comment by: degremont