cea-hpc / shine

Lustre administration tool
GNU General Public License v2.0

shine cannot unload modules when using the lnet.service #211

Open btravouillon opened 5 years ago

btravouillon commented 5 years ago

I'm using lnet.service and /etc/lnet.conf to configure LNet on my servers and clients:

[root@mds1 ~]# grep -v "^#" /etc/lnet.conf 
net:
    - net type: tcp1
      local NI(s):
        - interfaces:
              0: eth0

This service loads the lnet module, configures LNet, then imports /etc/lnet.conf:

[root@mds1 ~]# systemctl cat lnet.service|grep Exec
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStart=/usr/sbin/lnetctl import /etc/lnet.conf
ExecStop=/usr/sbin/lustre_rmmod ptlrpc
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod libcfs ldiskfs

shine stop reports an error while trying to remove the Lustre modules from the kernel:

[root@mds1 ~]# shine stop
[17:53] In progress for 4 component(s) on oss[1-2] ...
oss1: Unload modules failed
oss1: >> rmmod: ERROR: Module ksocklnd is in use
oss2: Unload modules failed
oss2: >> rmmod: ERROR: Module ksocklnd is in use
mds1: Unload modules failed
mds1: >> rmmod: ERROR: Module ksocklnd is in use
Stop successful.
= FILESYSTEM STATUS (scratch) =
TYPE # STATUS  NODES
---- - ------  -----
MGT  1 offline mds1
MDT  1 offline mds1
OST  4 offline oss[1-2]

shine would need to unconfigure LNet before trying to remove the lnet module from the kernel.

The simplest solution would be for shine stop to no longer unload the modules. :-) I can rebase and enhance https://review.gerrithub.io/c/cea-hpc/shine/+/367989

Then we could plan to add support for the lnet.service if you believe this is worthwhile.

martinetd commented 5 years ago

I think running lnetctl lnet configure and lnetctl import <configured file> after module load, and running lnetctl lnet unconfigure before module unload, might make more sense.

The lnet.service really is too far from how shine expects the system to be configured, but having an /etc/lnet.conf would be much more flexible than kernel module parameters.
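To make the ordering concrete, here is a minimal Python sketch of the command sequences involved. It is not shine code: the function names are hypothetical, and the commands are copied from the ExecStart/ExecStop lines of the lnet.service quoted in the report (where ptlrpc is removed before LNet is unconfigured). The point is only that unconfigure has to happen before the final module removal.

```python
from typing import List


def lnet_start_commands(conf_path: str = "/etc/lnet.conf") -> List[List[str]]:
    """Commands to bring LNet up, in order (mirrors lnet.service ExecStart)."""
    return [
        ["modprobe", "lnet"],              # load the module first
        ["lnetctl", "lnet", "configure"],  # then configure LNet
        ["lnetctl", "import", conf_path],  # then import the YAML configuration
    ]


def lnet_stop_commands() -> List[List[str]]:
    """Commands to bring LNet down, in order (mirrors lnet.service ExecStop)."""
    return [
        ["lustre_rmmod", "ptlrpc"],
        ["lnetctl", "lnet", "unconfigure"],  # must precede the final unload
        ["lustre_rmmod", "libcfs", "ldiskfs"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in lnet_start_commands() + lnet_stop_commands():
        print(" ".join(cmd))
```

Each list could be handed to something like subprocess.check_call one command at a time, stopping on the first failure.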

degremont commented 5 years ago

I think both are doable.

Support for lnetctl import /etc/lnet.conf is definitely useful and something Shine should have.

Delegating module/router support to external scripts is fine with me, as an optional step. Relying on the module_unload=false feature should be able to achieve that? We would need to update the current patch to disable StartRouter/StopRouter, or add additional flags.
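As a rough illustration of the "optional step" idea, the sketch below shows how two flags could gate what a stop operation does. Only the name module_unload comes from this thread. The manage_router flag, the step names and the function itself are hypothetical and do not reflect shine's actual code or its real order of operations.

```python
from typing import List


def plan_stop_steps(module_unload: bool = True,
                    manage_router: bool = True) -> List[str]:
    """Return the (hypothetical) steps a stop operation would perform."""
    steps = ["stop_targets"]            # always stop the Lustre targets
    if manage_router:
        steps.append("stop_router")     # skipped when routers are handled externally
    if module_unload:
        steps.append("unload_modules")  # skipped when lnet.service owns the modules
    return steps


if __name__ == "__main__":
    # With both flags off, module and router handling is left to
    # external tooling such as lnet.service.
    print(plan_stop_steps(module_unload=False, manage_router=False))
```

With both flags disabled, only the targets are stopped, which would avoid the rmmod error from the original report.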

martinetd commented 5 years ago

pushed https://review.gerrithub.io/c/cea-hpc/shine/+/468899 as a draft, 100% untested code - will work on that tomorrow morning if life allows, but comments on overall architecture are welcome earlier (EDIT: didn't go for external script but that'd work for me too, happy to change what I started with in that direction)