Closed Andy2244 closed 4 years ago
On (19/11/27 12:18), Andy Walsh wrote:
I did some more testing, while trying to fix our initd/procd logic. There seems to-be a general problem on how cifsd-tools interacts with the kernel server and how changes in smb.conf are handled.
Here is a basic example/scenario that should work, but doesn't.
• Create a basic share via:
[test] path = /tmp guest ok = yes
• Start the service (modprobe + cifsd) • Connect to the share and try to create a folder • Now a user may notice that the share is not write-able by default • We now change the smb.conf to this:
[test] path = /tmp guest ok = yes read only = no
5. We now restart the service (this will only restart the userspace) 6. Service did restart and the user just see's this in the logs: kcifsd: handle_startup_event:362: Reconnect to a new user space daemon 7. The share is still read only...
This should not be a valid scenario. We should not restart the userspace part.
We have cifsshareadd and cifsuseradd tools for configs updates. They send special signals to user-space cifsd so it reloads config files, without being restarted.
-ss
We have cifsshareadd and cifsuseradd tools for configs updates. They send special signals to user-space cifsd so it reloads config files, without being restarted.
I understand this for cifsuseradd, since that's how samba worked, but cifsshareadd should be optional. We have like 20 years of smb.conf handling and setup scripts. Cifsd service "looks" so similar to samba for most users, which is actually a good point, yet we cant handle smb.conf changes manually?
This means for trivial changes users have to rely on a working shell, instead of just editing the smb.conf. Most remote admin tools don't have a easy way to open a shell in a browser, so while they have tools to edit config files and restart a service, you could not administer cifsd the same way you could samba.
In my case this would mean i need to maintain/test two different working sets of userspace ui tools and init scripts, one for each samba/cifsd. We use a UCI->smb.conf translation layer, so config changes are universal and persist on firmware updates. I don't even really know how i could integrate this logic to our UCI system.
PS: So at least reset the server if the userspace cifsd receives a kill/sigterm, as we already discussed it's very "unusual" that a service is still working after you initiate 'service stop'.
@sergey-senozhatsky I need to remeber why we can't shutdown kcifsd when ucifsd is terminated ? smb client try to reconnect when server is disconnect and connect again(reset). reconnection means smb client send smb2 negotiate and session setup again. What is a problem if we shutdown kcifsd ?
Just as comparison this is what my init code looks like atm for cifsd, to get it working correctly:
start_service()
{
init_config
if [ ! -e /etc/cifs/smb.conf ]; then
logger -t 'cifsd' "missing config /etc/cifs/smb.conf, needs to-be created manually!"
exit 1
fi
if [ -e /sys/module/cifsd ]; then
if [ -e /sys/class/cifsd-control/kill_server ]; then
echo hard > /sys/class/cifsd-control/kill_server
# we need a extra timeout for the reset
sleep 5
fi
fi
modprobe cifsd 2> /dev/null
if [ ! -e /sys/module/cifsd ]; then
logger -t 'cifsd' "modprobe of cifsd module failed, can\'t start cifsd!"
exit 1
fi
logger -t 'cifsd' "Starting CIFS/SMB userspace service."
procd_open_instance
procd_set_param command /usr/sbin/cifsd --n
procd_set_param file /var/etc/cifs/smb.conf
procd_close_instance
}
stop_service()
{
logger -t 'cifsd' "Stopping CIFSD userspace service."
killall cifsd > /dev/null 2>&1
[ -e /sys/module/cifsd ] && rmmod cifsd > /dev/null 2>&1
# With open smb connections rmmod is not possible, without waiting for the long 'ipc timeout', so we use 'kill_server'!
if [ -e /sys/module/cifsd ]; then
logger -t 'cifsd' "triggering kill_server"
if [ -e /sys/class/cifsd-control/kill_server ]; then
echo hard > /sys/class/cifsd-control/kill_server
# we need a extra timeout for the reset
sleep 5
fi
fi
# next try
[ -e /sys/module/cifsd ] && rmmod cifsd > /dev/null 2>&1
# check again
if [ -e /sys/module/cifsd ]; then
# wait more...
sleep 3
fi
# last try
[ -e /sys/module/cifsd ] && rmmod cifsd > /dev/null 2>&1
if [ -e /sys/module/cifsd ]; then
logger -t 'cifsd' "module still loaded after 8s timeout"
fi
[ -f /tmp/cifsd.lock ] && rm /tmp/cifsd.lock
}
here is what the samba4 code looks like:
start_service() {
init_config
procd_open_instance
procd_set_param command /usr/sbin/samba -F
procd_set_param respawn
procd_set_param file /var/etc/smb.conf
procd_close_instance
}
If kcifsd would run the "hard" reset logic automatically, when the cifsd gets the kill/sigterm than i could remove a lot of the special handling, the only odd thing is that the "hard reset" needs a 4-5s timeout atm. This would also allow to make manual smb.conf changes + systemd service restart, without the need to rely on the external cifsshareadd.
On (19/11/28 01:52), Andy Walsh wrote:
We have cifsshareadd and cifsuseradd tools for configs updates. They send special signals to user-space cifsd so it reloads config files, without being restarted.
I understand this for cifsuseradd, since that's how samba worked, but cifsshareadd should be optional. We have like 20 years of smb.conf handling and setup scripts. Cifsd service "looks" so similar to samba for most users, which is actually a good point, yet we cant handle smb.conf changes manually?
Just send SIGHUP signal to cifsd-tools manager process once you are done editing config file. (manager PID is stored in /tmp/cifsd.lock). You can take a look at how we notify cifsd-tools that config files have changed here: lib/cifsdtools.c : notify_cifsd_daemon().
There is "standard" UNIX way to ask a detached daemon to reload its config files. Look [1]:
Signals have always been a convenient method of inter-process
communication (IPC), but in early implementations there were no
user-definable signals (such as the later additions of SIGUSR1
and SIGUSR2) that programs could intercept and interpret for their
own purposes. For this reason, applications that did not require
a controlling terminal, such as daemons, would re-purpose SIGHUP
as a signal to re-read configuration files, or reinitialize.
It has been this way since... forever.
Guys, please, don't tell me that people restart http servers with thousands of active connections, or restart databases with gigabyte sized tables and thousands of connections [2], etc. etc. etc. in order to re-load configuration files. They don't.
[1] https://en.wikipedia.org/wiki/SIGHUP grep for SIGHUP [2] https://www.postgresql.org/docs/9.1/app-pg-ctl.html
-ss
On (19/11/28 06:29), Andy Walsh wrote: [..]
here is what the samba4 code looks like: start_service() { init_config
start main AD-DC daemon, will spawn (smbd,nmbd,winbindd) as needed/configured.
if [ "$DISABLE_AD_DC" -ne 1 ] && [ -x /usr/sbin/samba ]; then procd_open_instance procd_set_param command /usr/sbin/samba -F procd_set_param respawn procd_set_param file /var/etc/smb.conf procd_close_instance fi
JFYI
Samba has a ton of ways to reload/update configuration without unneeded service termination. Not only for RAP (remove administration), but also for local administration as well. Samba's smbcontrol [1]:
reload-config
Force daemon to reload smb.conf configuration file. Can be sent
to smbd, nmbd, or winbindd.
smbcontrol sends an IPC message to the corresponding PID
send_message(msg_ctx, pid, MSG_SMB_CONF_UPDATED, NULL, 0);
And as one can guess when smbd, nmbd, or winbindd receive MSG_SMB_CONF_UPDATED they simply re-load the config. None of them exit(1) or kill(self) to reload config file.
Once again, do not kill the service to reload the configuration file. This is NOT normal. Just because you can kill a process, doesn't mean that you should.
[1] https://www.samba.org/samba/docs/current/man-html/smbcontrol.1.html
-ss
On (19/11/28 06:24), Namjae Jeon wrote:
[1]@sergey-senozhatsky I need to remeber why we can't shutdown kcifsd when ucifsd is terminated ? smb client try to reconnect when server is disconnect and connect again(reset). reconnection means smb client send smb2 negotiate and session setup again. What is a problem if we shutdown kcifsd ?
Before I switch to technical reasons why killing server for simple configuration file reload is really-really-really bad, let me try an analogy.
To explain what "a server" actually is.
Imagine a public bus on a miserable, rainy, cold day. With many people on that bus, minding their own business, heading to offices, schools, hospitals, etc. On every stop somebody gets on, somebody gets off. So far so good, right? What you guys want to do is every time somebody gets on (we have a new named net-share at path "seat" X), or changes his seat (named net-share changes its path), or gets off (net-share is removed/disabled) the bus, the bus should shut down its engine, everyone should get off the bus, hang around for a while (remember it's a miserable, rainy, cold day), and get on the bus again (!). Does that sound like a good bus? Do you really think that people will be satisfied with such service?
And this is what server is. It's a good public bus. It just works. It takes people from point A to B, without disruptions.
Do I need to further explain why killing the service for 100 active users in order to change a configuration of a possibly unused (no current active session) net-share is a terrible thing to do?
-ss
I know why a "soft" config reload feature exists, but thanks for the anecdote.
My point is this, its unusual that a service keeps working after you do a systemctl stop service
or a service that would not apply config changes after you do a systemctl restart service
?
The point was not if you should or should not use soft reloads on config changes, my point was that cifsd behaves weird regarding service stop/restart. The other point was that users that want a simple fileshare are used to the edit smb.conf + restart service workflow. Most basic guide's don't take the time to explain smbcontrol or systemd service reload, also embedded device distro's may not even ship with those tools and just have a smbd/nmbd.
I can work around this specifically for openwrt, yet i see no reason why you want to make this mandatory? As you pointed out if someone wants to-do soft reload you have tools in place and if a distro want to go through those hoops for cifsd, thats fine too. Yet why should the most basic workflow break aka edit smb.conf + restart service?
PS: Don't get me wrong, i'm a lowly maintainer so just giving feedback based on my perspective and what problem reports i get. I already got a few reports of users that did trip over this difference compared to samba, while trying to manually fix there cifsd shares. This issue aside, you guys do a very good job fixing and responding to issues, thanks.
On (19/11/29 01:47), Andy Walsh wrote:
I know why a "soft" config reload feature exists, but thanks for the anecdote. My point is this, i'm not aware of a service that keeps working after you do a systemctl stop service or a service that would not apply config changes after you do a systemctl restart service?
And my points is - why do you do systemctl service stop/start for config reload?
The point was not if you should or should not use soft reloads on config changes, my point was that cifsd behaves weird regarding service stop/restart.
And my point was - if you need a graceful config reload, do a graceful config reload. Don't overshoot :)
My design goal was to make cifsd-tools a stateless policy engine. The so called heavy lifting (kernel) part of cifsd does not depend on user space cifsd and does not really care. We do policing on the user-space side (user password file parsing, SHARE_ENUM handling, and so on - IOW admin things, things that kernel space can't care even less about), but that's pretty much it. Once you passed verification you do file I/O without any user-space involvement.
The other point was that most users that want a simple fileshare are used to the edit smb.conf + restart service paradigm. I cant remember a basic guide that took the time to explain smbcontrol, most embedded device distro's don't even ship with those tools and just have a smbd/nmbd.
Well, it's never too late to learn a new habit. Graceful (or, as you put it, soft) reloads are here to help.
Github does not restart their cluster when somebody changes git repository name; you don't reboot your build server when you add a new member to your team (and hence add a new user), do you? There are more examples.
I can work around this specifically for openwrt, yet i see no reason why you want to make this mandatory?
Well, a similar question. Why do you make hard reset mandatory, when "soft" reset can suffice?
As you pointed out if someone wants to-do soft reload you have tools in place and if a distro want to go through those hoops for cifsd, thats fine too. Yet why should the most basic workflow break aka edit smb.conf + restart service?
Send a SIGHUP instead of service restart.
What is the root cause of misunderstanding? Do you tend to look at cifsd as a service that runs on your own router for your own, personal usage and entertainment; while I tend to treat it as "a server"?
cifsd might be a bit more than single user use-case (and I'm not saying that your use case is invalid. It's just we might have other use cases). It might be for 1000s of users with 1000s of active sessions. And those users don't give a single whip if the server admin wants to add a new net-share, as long as the service is reliable. Restarting the service and expecting current users to reconnect whenever server config file gets updated is not going to make them happy.
-ss
I'm not arguing against any of those points, i think we talk past each-other. So just let me ask: How do you plan to implement stop/restart for your systemd example scripts?
Forget about any of the config reload stuff please, but if i want to actually stop cifsd i do not expect that the share is still accessible. Similar if i do a explicit restart and not a reload, i expect the service to-do a actual restart() aka stop+start and smb.conf changes to apply. So do you expect init/systemd scripts to implement the kill_switch as the "correct" way to-do a stop?
Here is a issue, i had early on that illustrates the "weird" stop logic. I did build cifsd from source and used your systemd script on Debian, than i did run into this issue:
systemctl --user start cifsd.service
systemctl --user stop cifsd.service
Now samba would not start correctly anymore, because:
netstat -an | egrep 445
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN
Yet rmmod did not work... so i was not sure what to-do now? The default 'ipc timout' is 0, so will never reset the server, so without knowing the kill_switch i had to reboot the system.
So is the kill_switch+rmmod logic missing from your systemd scripts or you have other plans to address issues like this?
There may be a hint in kernel nfs server, which similar with cifsd as kernel server. Need to check how it start/stop nfsd service in linux?
On (19/11/29 03:20), Andy Walsh wrote:
Here is a issue, i had early on that illustrates the "weird" stop logic. I did build cifsd from source and used your systemd script on Debian, than i did run into this issue:
• systemctl --user start cifsd.service • did some speed testing • systemctl --user stop cifsd.service
[..] So is the kills_switch logic missing from your systemd scripts or you have other plans to address issues like this?
Hmm, we carry that script around... I forgot about it. Right, from "--user" context we cannot reset kcifsd.
So you don't have server automatic reset (IPC timeout [which is imperfect and probably buggy]) and you don't want to (aka don't know about) manually reset the server (via sysfs).
So, having a "default" IPC timoeut in kcifsd obviously is not going to work.
Shall we remove --user systemd script and replace it with root systemd service script which will start user-space service (under non-privileged user) + kcifsd on start and, hence, will kill user-space and reset kcifsd (it has required privileges) on stop.
In this case we need to create our own user when we install the package (which is not super uncommon. Even `git' has its own group in /etc/group).
So...
Will this work?
-ss
- Use root privileges to start/stop kcifsd. Will this work?
If you implement the same kill_server + /sys/class/cifsd-control/stats
wait-loop logic i have too, yes should work. Not sure if systemd has some special syntax to-do the kcifsd wait?
That's because even with root, you can't rmmod if kcifsd has open connections and the kill_switch is not immediate. aka cifsd-team/cifsd#153
On (19/11/29 04:39), Namjae Jeon wrote:
There may be a hint in kernel nfs server, which similar with cifsd as kernel server. Need to check how it start/stop nfsd service in linux?
http://man7.org/linux/man-pages/man7/nfs.systemd.7.html
The first sentence:
The nfs-utils package provides a suite of systemd unit files which
allow the various services to be started and managed. These unit
files ensure that the services are started in the correct order, and
the prerequisites are active before dependant services start. As
there are quite few unit files, it is not immediately obvious how
best to achieve certain results. The following subsections attempt
to cover the issues that are most likely to come up.
It seems that services are sort of independent. Hence the start order requirement.
Looking at the repository:
Things look quite complex there.
-ss
Things look quite complex there.
Is this there kill_switch?
ExecStop=/usr/sbin/rpc.nfsd 0
It also seems there is no direct communication with the kernel module via /sys, seems all are options in userspace?
On (19/11/29 04:50), Andy Walsh wrote:
If you implement the same kill_server + /sys/class/cifsd-control/stats wait-loop logic i have too, yes should work. Not sure if systemd has some special syntax to-do the kcifsd wait?
So the start part is probably handled by After/Before ordering: http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=systemd/nfs-server.service;h=24118d69396551aa130fe7789aedddd82b036a99;hb=HEAD
systemd / nfs-server.service
[..] After= network-online.target local-fs.target After= proc-fs-nfsd.mount rpcbind.socket nfs-mountd.service After= nfs-idmapd.service rpc-statd.service After= nfsdcld.service Before= rpc-statd-notify.service [..]
We sort of have startup order requirement - it's kinda nice to have insmod before exec cifsd-tools.
The stop logic is tricky. From simple `stop' POV we don't have order requirements. But, here is the thing:
That's because even with root, you can't rmmod if kcifsd has open connections and the kill_switch is not immediate. aka [1]cifsd-team/cifsd#153
You are right. That's a big issue. Resetting kcifsd is like landing a shuttle - you have one chance. When things go wrong they go wrong big times (aka restart the box). And we cannot make it 100% reliable. That's the price of running stuff in kernel space.
kill_switch is not immediate (+ bugs) nor reliable (bugs, once again).
-ss
And we cannot make it 100% reliable.
Why exactly, i mean the wait loop should work and why should the kill_switch fail? I mean its designed to reset?
It also seems you need a 'reset' and a real 'close', there should be a way to really 'close' kcifsd, so it stops listing to sockets or does anything else, basically the same state after you did the initial insmod. This way even if the wait loop is messed up, kcifsd is not doing anything and is just loaded. Than missing the final rmmod is not really a problem, outside of memory leaks or trying to update the kmod itself.
On (19/11/29 04:54), Andy Walsh wrote:
Things look quite complex there.
Is this there kill_switch? ExecStop=/usr/sbin/rpc.nfsd 0
It might be. If they run it under root then it's sort of the same as writing to kill_switch. Except that kill_switch defers server kill. Not sure how fast rpc.nfsd kill is.
If they run it under normal user then we shall add something similar to cifsd-tools. But the deferred server kill will remain deferred.
-ss
On (19/11/29 05:15), Andy Walsh wrote:
And we cannot make it 100% reliable.
Why exactly, i mean the wait loop should work and why should the kill_switch fail? I mean its designed to reset?
If we have bugs in smb commands handlers (kernel kthreads) and we crash kernel kthreads or deadlock, or livelock them - then things go bad onwards.
-ss
If we have bugs in smb commands
Sure bugs, but i meant there is no inherent design problem with the kill_switch code not working? So we would expect it to work 100% reliable, assuming there are no bugs?
On (19/11/29 05:22), Andy Walsh wrote:
If we have bugs in smb commands
Sure bugs, but i meant there is no inherent design problem with the kill_switch code not working? So we would expect it to work 100% reliable, assuming there are no bugs?
Aside of hard core bugs, the rest is just me not being to write a working code, I guess. Deferred nature is sort of there by design, though. To some extent.
Need to look more at that code.
-ss
Need to look more at that code.
So can you implement a 'reset' and a 'close', the difference being that 'close' closes the sockets/interface binds? So it can just sit idle as kmod doing nothing, so we don't have to worry about it after a unclean stop service? If possible the 'close' state could be processed immediate/faster?
I just played around with the cifsshareadd
and it seems i can only change share related options? So how do i trigger a "soft" config reload for global options, like the interface, ipc timeout or netbios name?
I just played around with the
cifsshareadd
and it seems i can only change share related options? So how do i trigger a "soft" config reload for global options, like the interface, ipc timeout or netbios name?
A bunch of global options cannot be changed dynamically. E.g. signing, server string, domain name, etc. etc.
Dammit... I accidentally closed the issue. Sorry!
Re-opened.
So can you implement a 'reset' and a 'close', the difference being that 'close' closes the sockets/interface binds? So it can just sit idle as kmod doing nothing, so we don't have to worry about it after a unclean stop service? If possible the 'close' state could be processed immediate/faster?
How do you want to execute those commands? Will sysfs knob work?
How do you want to execute those commands? Will sysfs knob work?
Sure, but i guess in the future those should also all be cifsd userspace commands, that can be run and documented via -h
.
A bunch of global options cannot be changed dynamically. E.g. signing, server string, domain name, etc. etc.
Yeah i guessed so much, this makes soft reloading much more complicated for our config system, so i guess i stick with my restart logic.
On (19/12/02 01:40), Andy Walsh wrote:
A bunch of global options cannot be changed dynamically. E.g. signing, server string, domain name, etc. etc.
Yeah i guessed so much, this makes soft reloading much more complicated for our config system, so i guess i stick with my restart logic.
Do you change [global] options that often? What is the use case? Some global options can be changed dynamically - tcp port, ipc timeout, etc.
-ss
Sure, but i guess in the future those should also all be cifsd userspace commands, that can be run and documented via -h.
We cannot pass those to cifsd-tools daemon. Because it's a daemon. It seems that you are talking about some sort of cifsdcontrol tool.
Do you change [global] options that often? What is the use case? Some global options can be changed dynamically - tcp port, ipc timeout, etc.
The problem is how OpenWrt + UCI (Universal Config Interface) works. So a user never actually changes any of the service specific config files directly, instead we use UCI config files for all services. Those UCI files are updated and written either via the UCI tool or via our Web Config interface LUCI. Its the job of the init files to than translate those UCI config settings into some real config files.
The problem is that from the init perspective i either get a restart or reload trigger, but no information on what was changed in the UCI config file. While we do have signal + soft reload capabilities, i have only seen examples that generally do this on a config change, not per config section.
It seems that you are talking about some sort of cifsdcontrol tool.
I mean similar to nfs, do we want users to read the documentation and interact with the kmod via /sysfs, directly or use some user tools?
fixed with latest patch suggestions from cifsd-team/cifsd#153
I did some more testing, while trying to fix our initd/procd logic. There seems to-be a general problem on how cifsd-tools interacts with the kernel server and how changes in smb.conf are handled.
Here is a basic example/scenario that should work, but doesn't.
5) We now restart the service (this will only restart the userspace) 6) Service did restart and the user just see's this in the logs:
kcifsd: handle_startup_event:362: Reconnect to a new user space daemon
7) The share is still read only...At this point a normal or even advanced user may have no knowledge whats going on and probably think he/she did something wrong in the smb.conf. It is rather unusual for a user to now try to reload the kernel module, which wont even work because we still have open connections. So at this point the user may simple force a reboot, after some frustrating try&error via smb.conf. Than after the reboot it "magically" works, until we try change something again.
So i propose to always fully reset the kernel server if the user space daemon gets a kill/sigterm. We really want to ensure that all possible smb.conf changes will be reflected/applied, if a user restarts the service. This also needs to apply to open connections, since its a security risk to wait for connections to close. In a server environment, the connection may stay open for a very long time before the system reboots and finally applies the changes.