att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
559 stars 152 forks source link

Which command line options `sync` builtin should support ? #1057

Open siteshwar opened 5 years ago

siteshwar commented 5 years ago

In ksh93u+ release, sync builtin did not support any command line options. ksh93v- introduced these options:

 -f, --sfsync    Calls sfsync(3) to flush all buffered sfio stream data.
 -s, --fsync=fd  Calls fsync(2) using the open file descriptor fd to transfer all data associated with fd to the storage device.sync waits until the transfer completes or an error is detected.
 -S, --syncfs=fd Calls syncfs(2) using the open file descriptor fd to transfer all data for the file system containing the file referred to by fd. Depending on the native system implementation sync may return before the data is actually written. Implies --sfsync.
 -X, --sync|all  Calls sync(2) to transfer all data for all filesystems. Depending on the native system implementaion the writing, although scheduled, is not necessarily complete upon return from sync. Since sync(2) has no failure indication, sync only fails for option/operand syntax errors, or when sync(2) does not return, in which case sync(1) also does not return. Implies --sfsync. This is the default when no options are specified.

Shall we continue to support these options ?

krader1961 commented 5 years ago

Yes, we should support those options as long as we're supporting the sync builtin. I've already expressed my belief that this has no business being a ksh builtin. It isn't performance critical or a frequently used command like chmod or cat. Too, now that journaled filesystems are the norm it is rare to need to run the sync command. Not to mention that even in the distant past running sync wasn't normally needed unless it wasn't possible to perform a clean OS shutdown.

siteshwar commented 5 years ago

Yes, we should support those options as long as we're supporting the sync builtin. I've already expressed my belief that this has no business being a ksh builtin. It isn't performance critical or a frequently used command like chmod or cat. Too, now that journaled filesystems are the norm it is rare to need to run the sync command. Not to mention that even in the distant past running sync wasn't normally needed unless it wasn't possible to perform a clean OS shutdown.

If you believe it's not used that much, why should we introduce new options for it in next release ?

krader1961 commented 5 years ago

If you believe it's not used that much, why should we introduce new options for it in next release ?

Because the code has already been written and presumably verified by the old AST team to work. It's also a trivial command, even with those new flags, and therefore unlikely to be incorrect. Too, the one new flag that might be justifiable is -s to force syncing the blocks associated with a specific fd to persistent storage. If we're going to rip out those flags we might as well just remove the builtin.

siteshwar commented 5 years ago

Let's review each option:

-f, --sfsync    Calls sfsync(3) to flush all buffered sfio stream data.

Users should not be aware whether underlying streams are sfio or stdio. If we want to sync all sfio streams when sync executes, it should be done by default.

 -s, --fsync=fd  Calls fsync(2) using the open file descriptor fd to transfer all data associated with fd to the storage device.sync waits until the transfer completes or an error is detected.

It can be easily done by executing sync as it syncs all streams. If we want to wait until data actually gets written to the device, it should be done by default.

 -S, --syncfs=fd Calls syncfs(2) using the open file descriptor fd to transfer all data for the file system containing the file referred to by fd. Depending on the native system implementation sync may return before the data is actually written. Implies --sfsync.

Same as above.

 -X, --sync|all  Calls sync(2) to transfer all data for all filesystems. Depending on the native system implementaion the writing, although scheduled, is not necessarily complete upon return from sync. Since sync(2) has no failure indication, sync only fails for option/operand syntax errors, or when sync(2) does not return, in which case sync(1) also does not return. Implies --sfsync. This is the default when no options are specified.

We can keep it as default.

krader1961 commented 5 years ago

Users should not be aware whether underlying streams are sfio or stdio.

True but not really relevant. The point of the option is apparently to provide a way for ksh scripts to force all buffered streams to be flushed. If ksh were using stdio the -f option would instead call fflush(NULL). Now whether or not such a feature even belongs in ksh is a different question. But to my mind the -f/--sfsync option is arguably the only justification for having this as a builtin. As you point out the other features aren't really needed or can be accomplished just fine by the external sync command.

siteshwar commented 5 years ago

Users should not be aware whether underlying streams are sfio or stdio.

True but not really relevant. The point of the option is apparently to provide a way for ksh scripts to force all buffered streams to be flushed. If ksh were using stdio the -f option would instead call fflush(NULL). Now whether or not such a feature even belongs in ksh is a different question. But to my mind the -f/--sfsync option is arguably the only justification for having this as a builtin. As you point out the other features aren't really needed or can be accomplished just fine by the external sync command.

Is there any reason why we should not sync all streams by default ? Why provide a separate option for it ?

krader1961 commented 5 years ago

Is there any reason why we should not sync all streams by default?

When would they be synced? If you do it on every write then that defeats the purpose of buffered output (i.e., improve efficiency by batching multiple small write()'s into a single larger write()).

siteshwar commented 5 years ago

Is there any reason why we should not sync all streams by default?

When would they be synced? If you do it on every write then that defeats the purpose of buffered output (i.e., improve efficiency by batching multiple small write()'s into a single larger write()).

All streams should be synced when sync builtin is executed.

krader1961 commented 5 years ago

All streams should be synced when sync builtin is executed.

Yes, I can see your point that invoking sync in any way should probably also flush output buffered by sfio/stdio. But you're proposing removing an option to flush buffered output without also syncing all data buffered by the kernel to disk. Ugh! And that behavior means that if the user forgets to enable the builtin they'll get the external command which won't flush sfio/stdio buffered output. All in all I'm still leaning towards removing this as a builtin unless we keep the sync -f option.

DavidMorano commented 5 years ago

@siteshwar

-f, --sfsync Calls sfsync(3) to flush all buffered sfio stream data.

This option should just plain not be there! No one outside of AST:SFIO land has ever heard of either SFIO or |sfsync(3sfio)|. If this somehow does serve some real purpose, then it should certainly be remained to something generic (not AST:SFIO specific).

Also, invoking the built-in |sync(1ksh)| without any options should DEFINITELY do the SAME thing as invoking the real program |sync(1)|, that is to execute |sync(2)| to the kernel. Otherwise all anger from users (and system administrators) it totally justified (and welcomed)!

siteshwar commented 5 years ago

And that behavior means that if the user forgets to enable the builtin they'll get the external command which won't flush sfio/stdio buffered output.

That actually justifies the presence of this builtin. We have already seen couple of bugs that were caused by out of sync sfio streams. Quickest workaround for such bugs in production could be executing sync builtin.

krader1961 commented 5 years ago

Quickest workaround for such bugs in production could be executing sync builtin.

No. You don't add a feature to work around bugs. Especially since your proposed workaround also involves calling sync() which forces all dirty blocks in the kernel to be written to disk. That "solution" is worse than the problem.

@DavidMorano The point of sync -f is to give a ksh script the same functionality a C program has by calling fflush(NULL). This is not inherently SFIO specific. If we replaced SFIO with STDIO tomorrow all we would have to do is replace the sfsync(NULL) with fflush(NULL) in the sync.c module.

If you ignore the sync -f behavior there is zero justification for having this be a builtin unless you were to use ksh as an alternative to environments like busybox. The sync -f feature is the only thing that might justify its existence. And I'm not convinced it's really needed. When was the last time you wrote a ksh script where you found yourself saying "I wish I could force buffered output to be flushed to the file/pipe it is associated with?" I say file/pipe because if it's associated with a tty it's already unbuffered. In fact, AFAICT, output is already guaranteed to be flushed when a given builtin terminates. Is there actually a use case for this feature? Ignoring any bugs, of course, that would cause the output of a builtin not to be flushed when the builtin completes.

krader1961 commented 5 years ago

I'm more convinced now than before that the sync builtin should just be removed. It serves no useful purpose. Somebody simply said to themselves "I can add these features without much effort" without really having justification for the features in the first place. Like having gamma() be available in math expressions the sync behaviors were added "just because" someone decided the few minutes it took to write the code was a good reason to do so.

DavidMorano commented 5 years ago

@DavidMorano The point of sync -f is to give a ksh script the same functionality a C program has by calling fflush(NULL). This is not inherently SFIO specific. If we replaced SFIO with STDIO tomorrow all we would have to do is replace the sfsync(NULL) with fflush(NULL) in the sync.c module.

I am not at all against the '-f' option per se. I just want any reference to SFIO to be elided from any documentation to the user. I do not want new users to have to know anything about either SFIO or that SFIO is used internally within KSH. Someday, SFIO might be removed from KSH and I do not want users to think in terms of SFIO being present. I agree with you in general that a future KSH could use STDIO -- or even something else -- and the user documentation should just read something like "flush any outstanding buffered data."

If you ignore the sync -f behavior there is zero justification for having this be a builtin unless you were to use ksh as an alternative to environments like busybox. The sync -f feature is the only thing that might justify its existence.

I remember that someone or some group actually wanted |sync(1)| to also be a KSH built-in. I seem to remember that it was strongly requested. The rational was (if I remember) something along the lines of: a real |sync(1)| needs to be searched for and loaded as a program (and execed et cetera), where a built-in is already present. It might have been requested as part of a 'busybox' like scenario also, but I do not remember that myself. I do follow this logic myself though, as an administrative safety feature, when somehow |sync(1)| might not be readily available for some weirdo reason (out on a disconnected network or something else). I have worked on networked systems where the network was incredibly slow and having certain KSH built-ins served as a huge advantage in some of those cases. In fact, I wrote my own |sync(1ksh)| builtin before the AST people did exactly for some of these very reasons. I had to rename my own previous built-in |sync(1ksh)| to some other name so as not to conflict with the newly written AST version when the AST one eventually became available (I deferred the name to the new AST version).

And I'm not convinced it's really needed. When was the last time you wrote a ksh script where you found yourself saying "I wish I could force buffered output to be flushed to the file/pipe it is associated with?" I say file/pipe because if it's associated with a tty it's already unbuffered.

I agree, and this is why I also questioned the need for the '-f' option in my previous post above! The world seemed to work just fine for over 20 years or so (or maybe almost 25 years or so) before the AST version of |sync(1ksh)| was eventually written. And somehow the world got by without the need for the '-f' capability. I wanted a built-in |sync(1ksh)| myself in order to get the kernel |sync(2)| feature! That is what I wanted a |sync(1ksh)| for!

In fact, AFAICT, output is already guaranteed to be flushed when a given builtin terminates. Is there actually a use case for this feature? Ignoring any bugs, of course, that would cause the output of a builtin not to be flushed when the builtin completes.

I am not actually sure that KSH does a flush of buffered output every time a built-in returns to KSH after exiting (I would check the actual code to be sure). But KSH already does a flush of any buffered output before any new interactive input is needed. It has always done that as far as I know. A built-in itself does not leave output streams in a flushed state when they exit by default. An explicit |sfsync(3sfi)| has to be explicitly coded from within the built-in, it if wants that. For myself, I did always do an explicit (although hidden in my own buffered-stream-IO library package) |sfsync(3sfio)| before I exited the built-in, but this is not required behavior from built-ins per se. But AFAIK, KSH would do the flush itself if it wanted to get at any unflushed data that itself wanted to consume.

Also, related, most of the time when KSH wanted to be the source or recipient of data going into or coming out of a stream to or from a built-in, it would arrange to make the SFIO streams going into the built-in memory based SFIO streams. This was a particular and desired feature of KSH for speeding up I/O interaction with the built-ins. It is this very feature that forces the use of SFIO by built-ins in the first place (something I have complained about enough already on this forum)!

krader1961 commented 5 years ago

@DavidMorano Can you estimate how long ago that someone wanted sync to be a ksh builtin? And if it was all that important why other shells (e.g., bash and zsh) do not also provide sync as a builtin?

It has been ~20 years since being a UNIX sysadmin was my core job description. But even then I would not have expected the root shell of the systems I managed to provide sync as a builtin of the shell I was using. And today there is effectively no justification for the sync command.

DavidMorano commented 5 years ago

@krader1961

Can you estimate how long ago that someone wanted sync to be a ksh builtin?

Yes, the discussion took place over on either or both of the old AST mailing list and, I think but am not certain, some old Solaris mailing list (or one of several of them). I think that one or maybe two distros at the time wanted |sync(1)| to be made into a built-in. There is a chance it was Solaris itself, but I am not sure. It could have been someone else. The time frame was the latter mid-2000s (estimating between about 2005 to 2007). I assume that it was not RedHat or @siteshwar might have known about it or mentioned something by now already. There were several (many) years in the 2000s when Solaris and related people (I think people who used Solaris in various enterprise solutions) took a great interest in getting a modern version of KSH integrated into the Solaris environment. That was all accomplished by about 2007 or 2008 or so (after many years of planning and work, by the way, not a fast accomplishment). You can look at the OpenSolaris or Illumos sources and see how KSH (in the form of LIBKSH or LIBSHELL, along w/ LIBAST) got very integrated into Solaris user-land. Specifically, Solaris wanted to get rid of the old Bourne shell and replace it with a linked version of KSH (so SH is actually KSH on Solaris). In fact, Solaris standardized on using KSH for all of the system related shell uses, including all boot up.

They (the Solaris development people) actually re-wrote many (or maybe almost all) of their system-related shell programs to take advantage of the newer KSH features. All of those shell programs had previously been tied to the older Bourne shell capabilities only, so it was a significant upgrade in capabilities for Solaris at the time. Rewriting shell programs provided a significant speed-up for them, since many of the called (invoked) program names within shell programs, that were previously loaded as actual programs in older Bourne shell programs, now became builtins under KSH. So it was all a significant speed-up for them to go through all of the work that they did. Also, I remember that many of the Solaris developers were also learning how to use newer KSH features for the first time (there were many questions about how to use the newer KSH features on the various forums at the time).

I did follow the discussion and I remember that people wanted the KSH built-in |sync(1ksh)| to perform a kernel |sync(2)|. I do NOT remember any discussion at all about it supporting any other options, or any options at all. People just wanted a way to execute |sync(2)| without having to run the actual |sync(1)| program. This capability was seen more as an administrative safety measure of some kind, and not some big new feature of any sort. I also remember at least one and maybe two people actually being surprised that SYNC was not already a KSH builtin! In a sort of coincidence, I had previously written a SYNC KSH builtin myself (originally called just 'sync') some time earlier for the very same reason, to get |sync(2)| executed without having to actually load and run a |sync(1)| program. After the AST people (at least I think it was them) wrote their built-in SYNC, I renamed my previous version to some different name, since mine was not identical to the newer AST version. At this moment, I forget what I renamed my version to.

There was a relatively large number of people interested in and working on KSH related issues in the early and mid-2000s. I do not remember who they were all affiliated with, but most of them were related to Solaris though. So it is very likely that it was Solaris who wanted a built-in SYNC, but somehow I seem to remember it being a different group. But this is the end of my memory. I do not really remember who exactly it was who wanted it.

krader1961 commented 5 years ago

@DavidMorano Thank you for that information.

@siteshwar I'm still in favor of removing sync as a builtin. It is true that I have not been a sysadmin for more than three decades (excluding my personal servers). And it's been more than ~15 years since I did remote UNIX support for a vendor and thus cared about the ability to run commands like sync. So my viewpoint may not be representative. But I have a hard time believing that including a builtin sync for every single build of ksh can be justified. Is it really the case that sysadmins (or a vendor's UNIX support techs) still run sync before shutting down a system? Even if you exclude dinosaurs like myself which did so in the distant past when it really did serve a purpose and might be inclined to do so out of habit? If RedHat thinks having a builtin sync is still useful in 2019 then I agree that every single option it supports should be removed and the command stripped to nothing more than a CLI way to call the sync(2) syscall.

siteshwar commented 5 years ago

@kdudka What's your view about it ?

kdudka commented 5 years ago

I would also not expect sync to be a shell built-in. Does ksh really continue with execution of a shell script with unflushed stdio buffers? Do we have any example where unflushed stdio buffers of ksh would cause an observable problem which the sync built-in would fix? What would be the cost of making it work out of the box (without using the sync built-in)?

siteshwar commented 5 years ago

Does ksh really continue with execution of a shell script with unflushed stdio buffers?

Yes, it does. We have seen couple of bugs because of this (See issue #61 and #1069).

Do we have any example where unflushed stdio buffers of ksh would cause an observable problem which the sync built-in would fix?

Not at the moment.

What would be the cost of making it work out of the box (without using the sync built-in)?

We have to wait for bug reports that are caused by buffered I/O and fix them by syncing streams.

DavidMorano commented 5 years ago

Separation of concerns for SYNC as a built-in:

  1. the need to put it, |sync(1ksh)|, inside of shell program scripts in order to get flushed user data of some sort (SFIO streams)
  2. for use as an emergency |sync(2)|

As for 1 above, I think the idea is completely unwarranted! As I stated previously, the world worked completely fine without having to sync anything for at least 25 years! If some sort of data in a shell program script is not behaving properly, that is a bug of some sort. But |sync(1ksh)| is NOT the answer. Rather it is some bug that needs to be fixed to work as it did in the prior decades when no stupid-ass |sync(1ksh)| was ever needed, or even yet invented.

As for 2 above, it is an administrative emergency only, in my opinion. It is NOT needed for any sort of shutdown or anything of the sort. Shutdown can sync its own file-systems just fine -- thank you very much!

But I have seen the following many times in my life:

Now, can the administrator purposely crash the system and let the resulting panic SYNC the file-systems? Yes, that can happen and will happen (outstanding mounted file-systems will be attempted to be SYNCed). But being able to perform an extra |sync(2)| before the system might crash on its own is an extra administrative help. Yes, normally, the OS itself, now-a-days from a separate internal kernel thread or time-out call-back, but in the old days with something like the |update(1m)| daemon program or other, would eventually try to perform its own |sync(2)| like every 30 seconds or so (yes, you older guys may remember). And, yes, administrators could also wait for at least 30 seconds before initiating a system crash, just to get the benefit of that extra |sync2)|. So having |sync(1ksh)| (a built-in performing |sync(2)|) was handy as an administrative tool, nothing else (in my opinion).

If people get the idea that data in a shell program script needs to ever require a separate SYNC of some kind, then in my opinion the thought process has gone off of the rails (as it were). Yes administrative scripts can do a |sync(1)| if they want, but only in order to get access to |sync(2)|. The idea of SYNCing SFIO streams sounds ridiculous to me. How would anyone (at script user level) know that an SFIO stream needed a SYNC? The idea just does not relate to my brain. I am missing something there? So if someone thinks they need SYNC in order to get a shell script program to work properly, then I am in favor of removing |sync(1ksh)| completely -- in order to forestall that sort of thinking. |sync(2)| is a system administrative thing, not a user-correctness thing.

|fdatasync(3rt)| and its relatives, I do not know. But somehow sticking 'sync' calls into user shell script programs does not seem to be a good answer for anything (at least not to me).

Also remember that at the shell script programming level, there is not supposed to be any visible SFIO anything -- in my opinion. And if I had my way, no one would know that there are anything called file-descriptors (FDs) either!