ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

Proposed changes to ls(1) #388

Closed Mellvik closed 4 years ago

Mellvik commented 4 years ago

I propose we change ls(1) back to the Unix tool philosophy:

Opinions?

--Mellvik

ghaerr commented 4 years ago

In general, I would say yes, (and also just eliminate 'l' altogether).

However, I want to bring up a new, I think big, problem that has already started, and will only get worse should you/we go ahead with improving our applications individually, which admittedly are terribly old and IMO pretty buggy. Right now, any changes we make to ls (including all recent fixes I've made to it), are NOT being made to all versions of it. That is, with the busyelks commits, despite the wonderful work that @marcin-laszewski has been doing, we now find ourselves with almost-duplicate source copies of most useful applications in the ELKS source tree. These source copies are not being maintained with bug fixes already being made, and each application 'fix' now has to be done twice. I can see this getting to be a big problem, and users will complain when their version of the command isn't working, when it was supposedly already fixed.

I'm not sure what to do about this problem. We could back-port the changes required to make our command-set back into their original source (most were modified with their main entry point changed to something like cmd_ls etc). Allowing near-duplicate source for major commands has introduced lots of extra maintenance forever.

An even bigger issue remains: our commands are very old, and lots of work that has already been done by the Linux community is ignored. It would take tons of work to bring these commands up to what's expected. I am wondering whether adding binutils-ia16 into ELKS might not be a better idea to improve all commands quickly. @tkchia, I notice you maintain a version of them, thoughts?

jbruchon commented 4 years ago

Perhaps busyelks was not the best idea. If the libc code could be shared between programs somehow, there would be no need for busyelks that I can think of.

ghaerr commented 4 years ago

My understanding is that busyelks creates symlinks for each of the applications, and uses argv[0] to determine what to do. Thus (on MINIX filesystems only), there is a large savings of disk space by using a busyelks approach rather than separate executables, regardless of libc sharing.
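
To make the argv[0] mechanism concrete, here is a minimal sketch in shell. It is not how busyelks itself is written (that is C), and the names `multicall`, `hello` and `bye` are hypothetical stand-ins for the combined binary and its applets. The script looks at the name it was invoked under and dispatches on it, while the symlinks provide the extra names.

```shell
#!/bin/sh
# Hypothetical multi-call demo: one file on disk, several command names.
demo_dir=$(mktemp -d)

cat > "$demo_dir/multicall" <<'EOF'
#!/bin/sh
# Dispatch on the name we were invoked as (argv[0] in C terms).
applet=${0##*/}
case "$applet" in
    hello) echo "hello applet: $*" ;;
    bye)   echo "bye applet: $*" ;;
    *)     echo "multicall: unknown applet '$applet'" >&2; exit 1 ;;
esac
EOF
chmod +x "$demo_dir/multicall"

# One symlink per applet name, all pointing at the same file.
ln -s multicall "$demo_dir/hello"
ln -s multicall "$demo_dir/bye"

# Run through sh so the sketch also works if the temp dir is mounted noexec.
hello_out=$(sh "$demo_dir/hello" world)
bye_out=$(sh "$demo_dir/bye" now)
echo "$hello_out"
echo "$bye_out"
```

Only `multicall` occupies data blocks; each additional name costs one link.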

mfld-fr commented 4 years ago

For ls and many other commands, the source file can be the same between busyelks ls, standalone ls and sash ls. One common source file, 3 links in the source tree, and the entry point to change to main or cmd_ls, depending on the actual container.

tkchia commented 4 years ago

Hello @ghaerr,

I am wondering whether adding binutils-ia16 into ELKS might not be a better idea to improve all commands quickly. @tkchia, I notice you maintain a version of them, thoughts?

I guess you are referring to coreutils, as in GNU's version of basic programs like ls, head, etc.? (binutils-ia16 is actually the GNU assembler and linker and related stuff.)

It might be an interesting exercise to try porting coreutils to ELKS --- if someone else has not already done so --- though I suspect that some of the utilities would be a bit too bloated to fit onto an ELKS disk.

Thank you!

ghaerr commented 4 years ago

For ls and many other commands, the source file can be the same between busyelks ls, standalone ls and sash ls. One common source file, 3 links in the source tree, and the entry point to change to main or cmd_ls, depending on the actual container.

Good idea on using symlinks, but this won't work for sash ls or any other sash builtin, as that source is for a shell builtin and not compatible with standalone ls.

For busyelks, I agree, since it copied the standalone ls source and then ported it into a shared structure. We should have considered this problem before accepting busyelks, as the problem now is that there's considerable work to be done to back-port each of the busyelks applications into the source from which they were directly copied.

ghaerr commented 4 years ago

It might be an interesting exercise to try porting coreutils to ELKS --- if someone else has not already done so --- though I suspect that some of the utilities would be a bit too bloated to fit onto an ELKS disk.

Yes, I was talking about coreutils. Agreed that likely some of them might be too bloated, but porting individual apps from that source, starting with say, its ls and other commands, might be far simpler than trying to hand-craft each enhancement from scratch. (If someone wants to hand-enhance commands, I'm for it, it's just a slow-going process to get ELKS more modernized).

jbruchon commented 4 years ago

One of the objectives with busyelks was to also combine frequently used code fragments that are not covered by libc to further save space. Anything which does that will eliminate the ability to use the code as standalone executables, but I have an idea. The executables are already kind of sloppy. Why not rewrite the structure of busyelks such that the space-saving code sharing is done in a common set of files a la libbb and the necessary "library" fragments are linked to the standalone versions, and the same build system produces both standalone and combined executables, using some ifdef magic to convert cmd_xx to main as needed? That would consolidate all tools into busyelks, eliminate the duplication, and still allow production of standalone executables as needed.

ghaerr commented 4 years ago

Another idea would be to just delete the original versions of code now moved into busyelks, and have busyelks either produce separate executables (as is required for FAT anyways), or linked files. This solves the whole problem without symlinks or libraries. It does change the structure of elkscmd/* a bit, by moving old code out into a single directory.

With this suggestion, then, a further Config change would be needed, and selecting BusyELKS would build all commands currently in BusyELKS, and selecting DISK_UTILS or FILE_UTILS etc would only select those commands that are NOT included in busyELKS, since they would no longer exist there.

ghaerr commented 4 years ago

With the above idea, the Config option 'busyELKS' could be changed to say, CORE_APPLICATIONS, thus not being confusing with 'busyELKS or not'. If CORE_APPLICATIONS were selected, then on MINIX filesystems, a single binary with many symlinks would be created, thus accomplishing what busyELKS is intended for.

Thus, essentially the busyELKS contribution just becomes a better way to prepare a very-small-footprint ELKS disk with essential core applications. There's really no need to have it called busyELKS, nor to have options that include other applications of the same name.

Mellvik commented 4 years ago

Just a warning thrown into an interesting and useful discussion:

It may not be wise to let the limitations of the FAT file system be our guidance here: Assuming the capabilities of a 'real' filesystem is a reasonable foundation for a good discussion - and then do whatever is necessary to handle 'the oddball'.

With all due respect to the efforts and creativity that now make ELKS bootable from FAT, I do have a hard time finding it useful. That said, the increasingly capable and stable FAT fs support (as a mounted fs) is very important. IMHO.

--Mellvik

ghaerr commented 4 years ago

It may not be wise to let the limitations of the FAT file system be our guidance here: Assuming the capabilities of a 'real' filesystem is a reasonable foundation for a good discussion - and then do whatever is necessary to handle 'the oddball'.

Completely agree. What I was trying to say in the above comments was that removing the original (now extra) copies of the duplicated source code and having them all in busyELKS (renamed core_utils) could solve our problems if the following was also done: 1) By default, on MINIX filesystems, build the single master application binary and install it into elks/template, along with symlinks for each of the replaced applications, which would use a very small amount of disk space, but contain all the same functionality as before; and 2) For FAT filesystems, or MINIX where separate binaries are wanted, use a core_utils build option that installs each binary separately, without symlinks. @marcin-laszewski, what are your thoughts on this?

I don't see a good reason to keep the same applications (or source code) in multiple locations, as it introduces a maintenance headache. Building them in separate locations introduces maintenance as well.

With all due respect to the efforts and creativity that now make ELKS bootable from FAT, I do have a hard time finding it useful.

There is still the case of distributing ELKS applications on FAT disks though, which was brought up previously as very desirable. The image doesn't have to be bootable, just the ability to copy and use applications on a widely supported filesystem.

That said, the increasingly capable and stable FAT fs support (as a mounted fs) is very important. IMHO.

Agreed!

Mellvik commented 4 years ago

Thanks @ghaerr for a thorough clarification - I understand now that we're entirely on the same page.

While we're on this issue, is there a good reason for using symlinks instead of hardlinks for coreutils? It occurs to me that hard links would be more efficient (save inodes, diskblocks and faster startup - saving dir lookups and reads, particularly on slow floppies)?

--Mellvik

ghaerr commented 4 years ago

While we're on this issue, is there a good reason for using symlinks instead of hardlinks for coreutils? It occurs to me that hard links would be more efficient (save inodes, diskblocks and faster startup - saving dir lookups and reads, particularly on slow floppies)?

Good idea. I'll look into this - I think the previous install links were symlinks, and I just followed that convention when writing mfs. It will have to be updated to use hard links. I haven't looked to see what busyElks is using.

jbruchon commented 4 years ago

One reason for using symlinks instead of hard links is that if you replace a hard linked file without truncating and overwriting in place, you generate a new inode and the hard linked files are not updated. Symlinks don't suffer from this undesirable artifact. If you add hard linking in place of symlinks, I expect it to be optional and to have symlinks be the default.
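
The replacement behaviour described here can be reproduced with a short shell sketch. The file names are made up for the demonstration (`busyelks` is just a text file here), and `mv` stands in for any installer that writes a new file and renames it over the old one.

```shell
#!/bin/sh
d=$(mktemp -d)

echo "v1" > "$d/busyelks"
ln "$d/busyelks" "$d/ls_hard"      # hard link: a second name for the same inode
ln -s busyelks "$d/ls_sym"         # symlink: resolved by name each time it is opened

# Replace the file by writing a new one and renaming it into place.
echo "v2" > "$d/busyelks.new"
mv "$d/busyelks.new" "$d/busyelks"

hard_after=$(cat "$d/ls_hard")     # still the old inode
sym_after=$(cat "$d/ls_sym")       # follows the name to the new inode
echo "hard link sees: $hard_after, symlink sees: $sym_after"

# Truncating and overwriting in place keeps the inode, so a hard link follows.
echo "v1" > "$d/tool"
ln "$d/tool" "$d/tool_hard"
echo "v2" > "$d/tool"
inplace_after=$(cat "$d/tool_hard")
echo "after in-place overwrite, hard link sees: $inplace_after"
```

So hard links are only a problem when the update path creates a new inode; an in-place overwrite updates every name at once.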

tkchia commented 4 years ago

Hello all,

Well, I am not seeing any downsides to switching to busyELKS all the way.

2) For FAT filesystems, or MINIX where separate binaries are wanted, use a core_utils build option that installs each binary separately, without symlinks.

On FAT, I think one other possibility is to do what DJGPP does: create small stub programs such as /bin/ls, /bin/cat, etc. that simply hand over to the busyELKS binary. (When the ELKS kernel supports some form of symlink emulation for FAT, we can then drop the stubs and "really" 😉 use symlinks.)

This will mean we can build the core utilities in much the same way, for both the Minix and FAT filesystems.

Thank you!

ghaerr commented 4 years ago

Hello @tkchia,

Well, I am not seeing any downsides to switching to busyELKS all the way.

I agree that it seems that busyELKS has big advantages, but there is a small downside/tradeoff, which could be a bigger deal on slow floppy systems:

Each invocation of a core utility would run a (sym or hard linked) much larger executable (that is, the size of all of them combined). Thus, whereas say, ls used to be quite small, it is now much larger. So the real tradeoff of a busyELKS approach is less disk space used but higher runtime access costs. Memory usage with busyELKS would likely improve when commands are piped or several run at once, because of the shared text segment.

Switching fully would allow the original version of each command included in busyELKS to be removed, thus cleanly solving our application duplicate source code maintenance problem. When core_utils/busyELKS is selected in config, all the core utilities would be installed in a very minimal space, and if the other categories were selected, they would just add the additional utilities from each selected category.

2) For FAT filesystems, or MINIX where separate binaries are wanted, use a core_utils build option that installs each binary separately, without symlinks.

On FAT, I think one other possibility is to do what DJGPP does: create small stub programs such as /bin/ls, /bin/cat, etc. that simply hand over to the busyELKS binary. (When the ELKS kernel supports some form of symlink emulation for FAT, we can then drop the stubs and "really" 😉 use symlinks.)

Nice idea, but I fear that on older PCs with floppies, this will almost double the load time for every command by having to load and execute two binaries each time. Using symlinks instead of hard links will also increase the load time on MINIX filesystems.

This will mean we can build the core utilities in much the same way, for both the Minix and FAT filesystems.

If the core utilities are built the same way for FAT or any filesystem that does not support sym or hard links, and do not use your above stub method, then the space used on each floppy for each of the commands will be excessive. So we likely need to decide which tradeoff is preferred - using a stub as described above, or building smaller versions for each command. It might be beneficial to have the ability to build smaller versions as an option anyways for other uses. (This is what happens now, except that the source code is duplicated, which I think is a problem).

Mellvik commented 4 years ago

@ghaerr, This is an excellent summary of considerations re. Busyelks/coreutils vs individual commands. I'm in the middle of comparing the two command by command in order to have some empirical numbers to throw around (source size, code size ...), but the main conclusion is a given: The busybox concept works for flash/ssd only. Ok for ROM systems with flash storage, unusable for vintage HW with floppies, even hard disks. The busybox image is larger than the buffer cache, and the performance is unbearable.

So - if we want newbies to become interested in ELKS, don't give them a floppy image with busyelks - regardless of FS type. The rest of us, we can configure whatever we'd like. For emulators, nothing really matters, it's fast anyway. For my 286 klunkers and floppies, I just love the feeling when the second ls() executes immediately ...

Further - IMHO - this should end the discussion about links on FAT. And yes, let's get rid of the command duplicates (just hang on till I've completed the comparison), and have the same code base for busyelks and individual cmds.

--- Mellvik
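
A size comparison of this kind could be scripted roughly as follows. This is only a sketch: `total_bytes` is a hypothetical helper, and the three files are tiny made-up stand-ins, not real ELKS binaries, so the printed numbers say nothing about actual command sizes.

```shell
#!/bin/sh
# total_bytes FILE...: print the summed size in bytes of the given files.
total_bytes() {
    total=0
    for f in "$@"; do
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Demo with stand-in files of known length.
d=$(mktemp -d)
printf 'aaa'    > "$d/ls"
printf 'bbbbb'  > "$d/cat"
printf 'cccccc' > "$d/busyelks"

separate=$(total_bytes "$d/ls" "$d/cat")
combined=$(total_bytes "$d/busyelks")
echo "separate: $separate bytes, combined: $combined bytes"
```

On a real tree the arguments would be the installed standalone commands on one side and the single combined binary on the other.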

marcin-laszewski commented 4 years ago

symlinks and FAT:

  1. UMSDOS
  2. Short programs or scripts that call busyelks, e.g. /bin/cut:

    #!/bin/sh
    exec busyelks cut "$@"

or even:

    exec busyelks "$0" "$@"
    exec busyelks "`basename "$0"`" "$@"

etc.

ghaerr commented 4 years ago

but the main conclusion is a given: The busybox concept works for flash/ssd only. Ok for ROM systems with flash storage, unusable for vintage HW with floppies, even hard disks. The busybox image is larger than the buffer cache, and the performance is unbearable.

Thanks for your comments, and I want to see your analysis of the command sizes, but hold on a second! There still is definitely utility in having a busyELKS for systems with small disks on MINIX - that allows the entire coreutils set of applications to be placed in a single (~48k) executable, and with hard links for all the other commands, no additional disk usage. I guess you're saying that the application file size of 48k is still just too large for older floppy systems, even though it allows you to distribute a very small disk image?

Are you testing it now on your old hardware? I have not been able to get it to build a proper image yet on OSX, but @marcin-laszewski is helping me.

ghaerr commented 4 years ago

  1. UMSDOS

We're running an early implementation of UMSDOS as our FAT filesystem. Do later versions implement sym link support? That would be interesting to look at.

  2. Short programs or scripts that call busyelks, e.g. /bin/cut:

    #!/bin/sh
    exec busyelks cut "$@"

or even:

    exec busyelks "$0" "$@"
    exec busyelks "`basename "$0"`" "$@"

Nice idea on fast systems, but running yet another executable (cut or basename) just to get the arguments for another invocation of busyelks will be very slow on floppies. ELKS kernel would skip reading the code segment again if already loaded, but this wouldn't likely be the case with just the shell running.
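
On the cost of running basename: a shell that supports POSIX parameter expansion can strip the directory part itself, without loading another program. Whether the ELKS shells support `${0##*/}` is an assumption here, and `applet_name` is a hypothetical helper used only to show the expansion. The shell still has to be loaded to interpret the stub, so this removes one of the extra executions, not all of them.

```shell
#!/bin/sh
# applet_name PATH: print PATH with everything up to the last slash removed,
# like basename, but using parameter expansion so no extra program is run.
applet_name() {
    echo "${1##*/}"
}

# A stub installed as /bin/cut could then consist of the single line:
#     exec busyelks "${0##*/}" "$@"

applet_name /bin/cut
applet_name cut
```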

jbruchon commented 4 years ago

As far as I am aware, there is no implementation of FAT that supports symlinks, unless you call Windows shortcuts and OS X aliases "symlinks."

As for busyelks having a huge binary size, how about implementing a block cache in the kernel? For most scripting operations, the loading cost would drop massively. It would complicate memory allocation a bit, of course. There is also the option of adding extended memory support and storing stuff like a block cache in that memory to keep the bottom 640K free.

ghaerr commented 4 years ago

As for busyelks having a huge binary size, how about implementing a block cache in the kernel? For most scripting operations, the loading cost would drop massively.

There already is one - the L1 buffer cache. Hence @Mellvik's comment that the busyelks executable being larger than the entire buffer cache is a problem. In addition, reading each symlink evicts another 1k cache buffer, whose contents then have to be read from floppy again.

It would complicate memory allocation a bit, of course. There is also the option of adding extended memory support and storing stuff like a block cache in that memory to keep the bottom 640K free.

All this just for busyELKS? There is also an external buffer system, but I think @Mellvik's point is that the increased size is larger than the non-busyELKS command size by a lot, and larger than available buffers, thus slowing down overall ELKS speed of use, and all this is before any other data that gets run through the system buffers.

ghaerr commented 4 years ago

Hello @mfld-fr, (from your comment on #404):

I am wondering if such caching would be able to handle the 'big' BusyELKS executable...

I have been looking at our elkscmd/ utilities extensively this weekend, and have come to the conclusion that ELKS commands do have something valuable - very small-footprint versions of useful commands. I am beginning to think that busyELKS might want to stay experimental, just for those that want it, and that we should instead enhance and debug the existing utilities. They will be very fast for floppy users and work well with the disk cache. The large executable combined with the need for symlinks, and then the additional problems not yet solved with FAT symlinks, lead me to this conclusion. Many of the current command bugs and complaints, including this issue from @Mellvik, are actually quickly fixed. Also, moving to later versions of GNU coreutils will surely increase the file sizes of each command.

That said, there have been some nice enhancements @marcin-laszewski has made in busyELKS source for commands, which might want to be kept in the original versions. I need to do a size study to really know. These include moving from write(2,.. to fprintf(stderr,... (I would like to create a special 'extremely-small' version of 'vfprintf' that doesn't include every format specifier to keep things quite small for these small commands). There are other items of interest as well in his work. His libc changes already decrease the size of the original commands.

Mellvik commented 4 years ago

Great update, @ghaerr, and FWIW I completely agree. Unless there is a rush, I'm going to do a complete comparison of the binary sizes in about a week.

-- Mellvik

ghaerr commented 4 years ago

Hello all,

I thought of yet another approach to the issue of 'busyELKS' vs 'individual small commands' this morning. Since @marcin-laszewski has shown that all of 'coreutils' can be compiled (including the shell) into a single binary and fit into ELKS 64k/64k max program size, why not just have all of these commands be shell internal commands? That solves all of the mentioned problems and doesn't require buffers, symlinks or hard links and works on MINIX and FAT identically. Since the shell is already loaded and running, when the user types a command, no disk access at all and it runs! If a builtin 'lsbin' were included, then users could see the commands available that were missing from /bin. The shell would be larger, but if loaded again to run a shell script, the entire code segment is already skipped by ELKS exec so loading would be as fast as today.
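
The lookup order proposed here (built-in first, disk second) can be sketched in shell. Everything in it is hypothetical: the applet table, the `applet_*` functions and `run_cmd` are illustration only, and in a real shell the applets would be C functions compiled in rather than shell functions.

```shell
#!/bin/sh
# Hypothetical table of commands compiled into the shell.
BUILTIN_APPLETS="true basename lsbin"

applet_true()     { return 0; }
applet_basename() { echo "${1##*/}"; }
# lsbin: list the commands that are available without touching /bin.
applet_lsbin()    { for a in $BUILTIN_APPLETS; do echo "$a"; done; }

# run_cmd NAME ARGS...: use a built-in applet if there is one,
# otherwise fall back to the normal lookup on disk.
run_cmd() {
    name=$1
    shift
    for a in $BUILTIN_APPLETS; do
        if [ "$a" = "$name" ]; then
            "applet_$name" "$@"     # no disk access: code is already loaded
            return
        fi
    done
    command "$name" "$@"            # not built in: search PATH and exec
}

run_cmd lsbin
run_cmd basename /bin/ls
```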

Should this idea get any following, I have the idea of 'source applets' that would take each of these 'coreutil' commands and allow them to be processed before compilation, such that the same 'applet' could be included in busyELKS, for those that want that, could be compiled into standalone utilities, or included in the shell, all with the same source and capabilities. There would be just one source version of the applet.

ghaerr commented 4 years ago

Further inspection on the business of shells available and commands shows the following:

Config Option           Directory           Comments
----------------------  --------------      ------------------------------------------------------------
CONFIG_APP_ASH          ash/                Default shell, installs as /bin/sh. BASH compatible.
                                            Std builtins: cd, chdir, '.', source, echo, eval, exec, exit,
                                                export, readonly, getopts, hash, jobid, jobs, read,
                                                set, setvar, shift, trap, ':', true, false, umask, unset, wait
                                            Scripting builtins: break, continue, return
                                            No non-std builtins.

CONFIG_APP_SASH         sash/               Not selectable when ash selected. Installs as /bin/[sh,sash]. 
                                            This is not a bash shell and does not perform std scripting, except
                                            that it has some std and large number of other builtins.
                                            Std builtins: alias, unalias, cd, echo, exec, exit, source, setenv,
                                                printenv, prompt, pwd, umask
                                            Builtins: help, mkdir, rmdir, mknod, sync, rm, rmdir, chown, chgrp,
                                                cp, touch, mv, ln, mount, umount, more, kill, quit,
                                                dd, ed, grep, ls, tar

Thus, ELKS already has what I proposed in my last comment - sash has the ability to run a decent set of commands without having to access disk, in a small amount of space. However, sash isn't a BASH shell and won't run standard shell scripts. Note that sash has builtins for most of the commands we're discussing as needed in 'coreutils'.

Thus, there are actually three different source 'applets' for a possible coreutils within ELKS at the moment - all different. I will be producing a more lengthy report showing all the commands in busyELKS/coreutils, and each of the commands in each of the many subdirectories in elkscmd/, so that the details will be apparent for those interested.

A possible good solution for MINIX and FAT, keeping disk usage to a minimum as well as not using kernel buffers would be: Have config by default install both shells - install sash by default as /bin/sash, install ash as /bin/sh, and modify init/login/inittab to run /bin/sash if present, otherwise /bin/sh. If only one shell were selected in config, then it would be installed as /bin/sh by default. sash could be modified to exec /bin/sh (ash) for shell scripts, achieving BASH shell script compatibility but having a large number of useful builtins for the default shell when ELKS boots. We could then work on making the 'applets' compatible so that any of the three 'coreutils' applets would produce the same result, regardless of configuration.

[EDIT: Actually, installing sash as /bin/sash and leaving ash installed as /bin/sh, was meant, corrected above for this.]
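
The fallback rule for init/login (run /bin/sash if present, otherwise /bin/sh) amounts to a single test. A sketch, assuming a hypothetical helper `pick_shell` that takes the root directory as an argument so it can be tried against throwaway directories instead of a live system:

```shell
#!/bin/sh
# pick_shell ROOT: print the shell that should be started for a system
# rooted at ROOT: /bin/sash when it is installed, /bin/sh otherwise.
pick_shell() {
    root=$1
    if [ -f "$root/bin/sash" ]; then
        echo /bin/sash
    else
        echo /bin/sh
    fi
}

# Demo against two throwaway root directories.
with_sash=$(mktemp -d)
mkdir "$with_sash/bin"
: > "$with_sash/bin/sash"

without_sash=$(mktemp -d)
mkdir "$without_sash/bin"

pick_shell "$with_sash"
pick_shell "$without_sash"
```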