ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086
Other
1k stars 108 forks source link

basename() and dirname() in libc? #587

Closed cjsthompson closed 4 years ago

cjsthompson commented 4 years ago

While working on 'more' today I had to use basename(). Linux has both basename and dirname in glibc (#include ). A quick grep in ELKS source code shows a bunch of tools all have their own basename. Maybe this could be in ELKS' libc too?

ghaerr commented 4 years ago

These can be easily added. Do you have some known working code for each that can be used?

cjsthompson commented 4 years ago

For now I copied basename() from the sh_utils/basename.c utility in it and so far it works for my use case. There is also a function strip_trailing_slashes that is identical in both basename.c and dirname.c. This does what POSIX requires according to basename(3) in Linux. The basename and dirname commands in ELKS with these functions seem to work correctly except for 'basename /' returning nothing instead of /. When I'm done with 'more' I can give it a shot to fix this and add it to libc.

Screenshot_2020-04-26_02-01-27 Screenshot_2020-04-26_02-02-45

ghaerr commented 4 years ago

The basename and dirname commands in ELKS with these functions seem to work correctly except for 'basename /' returning nothing instead of /.

I'm not sure whether basename(1) behaves exactly the same as basename(3).

There could be new issues with functions named the same operating differently, but at least for the existing ELKS applications with their own copy, the application copy will override any function added to libc.

When I'm done with 'more' I can give it a shot to fix this and add it to libc.

Ok. Because of the issue(s) you've brought up above, the PR should just add the functions to libc, along with libgen.h, and just leave the existing functions in their respective applications, unless you have tested them. And go ahead and prototype them using ANSI C, as the plan is to slowly convert all older '_P()' etc. function definitions in header files to ANSI C.

cjsthompson commented 4 years ago

And go ahead and prototype them using ANSI C, as the plan is to slowly convert all older '_P()' etc. function definitions in header files to ANSI C.

Good to know. I was wondering about that.

cjsthompson commented 4 years ago

After further investigation, it turns out there are two sets of dirname/basename functions in Linux.

One set is POSIX, modifies the path and only works with null terminated strings or crashes. This set is included with libgen.h

The other set is the GNU versions. They are included by string.h with #define _GNU_SOURCE. These do the strdup() for you and don't touch the strings you pass to them.

Do we want the POSIX ones? The GNU ones or both?

ghaerr commented 4 years ago

Do we want the POSIX ones? The GNU ones or both?

Definitely not both :)

This is primarily for your use, but I would venture a guess that the simpler POSIX versions without requiring strdup/memory allocations would be fine for ELKS.

cjsthompson commented 4 years ago

I'm done with the POSIX functions for ELKS.

For my use the POSIX ones are fine. I have made some small regression test programs to test my implementation of basename(3) and dirname(3) for ELKS' libc based on the already existing ELKS basename(1) and dirname(1) and I also tested the GNU basename(3) and the results it gives are in fact quite different from the POSIX version:

Screenshot_2020-04-26_17-42-19

If we want compability with Linux's basename(1) which comes with GNU coreutils — and therefore I suspect uses GNU's basename(3) — also implementing GNU's basename(3) may be useful.

If it doesn't matter, then I can either do a pull request with what I have now or if you would prefer I can upload a tarball somewhere of what I did including my regression test programs so people here can check it out before I go ahead and make a pull request.

ghaerr commented 4 years ago

I also tested the GNU basename(3) and the results it gives are in fact quite different from the POSIX version:

It appears that the GNU and POSIX tests aren't the same... I don't see any POSIX tests with trailing slashes. ?

With regards to basename/dirname(1) on Linux vs ELKS: I'm pretty sure that no one has used basename for anything important on ELKS (yet). So these commands should produce something that is useful for the ELKS ash shell environment where they may be used in shell scripts. I'm not sure the corner cases matter as much as compatibility with ELKS /bin/sh in writing shell scripts.

also implementing GNU's basename(3) may be useful.

Sadly, it seems its a bit of a mess with competing versions of basename(3) and dirname(3). The ELKS utilities that need them already have their own copies. We definitely don't want to get ELKS into worrying about two different versions in the same library, so I'm not really sure which is the way to go for libc at this point.

I can't see the results of POSIX basename(3) with trailing slashes, but if those produce reasonable (non-empty string results), and are used in the revised implementations of basename/dirname(1), I would say lets go POSIX. Then we would be producing libc (3) versions that match command(1) operation.

If it doesn't matter, then I can either do a pull request with what I have now

Check results of trailing slashes for reasonableness, and post what you've got so we can have a look at it.

cjsthompson commented 4 years ago

It appears that the GNU and POSIX tests aren't the same... I don't see any POSIX tests with trailing slashes. ?

They actually are. The difference is that GNU doesn't modify the argument while POSIX does. So when I printf("%s %s", path, basename(path)) GNU's aren't modified while POSIX ones are (trailing slashes removed by chucking a NULL in the argument) because I believe printf resolves it's arguments from the last one to the first. This behavior is documented in the Linux man page. So I made sure the ELKS version behaved exactly the same.

cjsthompson commented 4 years ago

I've uploaded the code with my test programs there so you can have a look at it:

https://github.com/cjsthompson/workinprogress/tree/master/libgen

ghaerr commented 4 years ago

Overall, nice code. A few comments, and then how to move the code into elkscmd and libc.

It appears that basename(3) is buggy, when returning the address of 'static char def'. While that is a 'char ', it can't be guaranteed that the returned string is NUL ('\0') terminated. Thus, the declaration should be changed to 'static char def = "."', and 'return def;' later.

libgen.c: move the functions into their own files, same name as function, and have each include 'libgen.h'. Lets put these in libc/misc/ for now, rather than creating a separate directory. Modify misc/Makefile for them.

gen.h: it appears that 'basename.c' is including the Linux libgen.h, not your gen.h. gen.h should be renamed libgen.h and put in libc/include. Remove the commented-out header protection.

dirname.c and pathname.c(1) commands: replace the existing programs entirely in elkscmd/.

When making changes to libc, recompilation is tricky. You must always do a "make clean" in the libc/ directory, then make, in order to have elkscmd/ (compiled later) use them. Of course, you can always do a "make clean; make" from elks root but that takes a while.

After these changes, submit a PR, and we should be good to go. Thanks!

cjsthompson commented 4 years ago

It appears that basename(3) is buggy, when returning the address of 'static char def'. While that is a 'char ', it can't be guaranteed that the returned string is NUL ('\0') terminated. Thus, the declaration should be changed to 'static char def = "."', and 'return def;' later.

I will fix it.

libgen.c: move the functions into their own files, same name as function, and have each include 'libgen.h'. Lets put these in libc/misc/ for now, rather than creating a separate directory. Modify misc/Makefile for them.

Ok.

gen.h: it appears that 'basename.c' is including the Linux libgen.h, not your gen.h. gen.h should be renamed libgen.h and put in libc/include. Remove the commented-out header protection.

Yeah I did that on purpose because I wanted it to work and be compiled identically on Linux or ELKS. Because it's easier for coding than regenerating the ELKS image and rebooting it in the emulator all the time. 'gen.h' was a temporary thing to be sure it's not including Linux's own libgen.h. Not sure how <> and "" for includes behaves so I did that as quick fix to be 100% sure.

And I'm going to test my commit in ELKS too of course.

dirname.c and pathname.c(1) commands: replace the existing programs entirely in elkscmd/.

That's what I was planning to do. And also I noticed the basename and dirname in busyelks are basically copies of these, so it's safe to update them too if there are any plans to use it.

When making changes to libc, recompilation is tricky. You must always do a "make clean" in the libc/ directory, then make, in order to have elkscmd/ (compiled later) use them. Of course, you can always do a "make clean; make" from elks root but that takes a while.

Thanks for the tip. I'd rather not recompile everything all the time. Even on my 8 core bulldozer AMD, ELKS even without the cross tools is not an instant compile.

After these changes, submit a PR, and we should be good to go. Thanks!

Thanks for reviewing. Nice catch with the bug in basename. That's what I was most worried about.

ghaerr commented 4 years ago

Not sure how <> and "" for includes behaves so I did that as quick fix to be 100% sure.

Completely agree. I've found one cannot depend on the differences <> and "" , in general.

And also I noticed the basename and dirname in busyelks are basically copies

The busyelks copies of everything are a growing maintenance problem, for sure. There are now three copies of certain commands in file_utils/sash/busyelks. In general, a contribution happens with duplicated source code, then code rot sets in with updates occurring elsewhere, but not in duplicate copies. I've sorted out the file_utils/sash duplications and was ready to deal with them in the 'core_utils' project, but there are disk size increases if they are combined. You're welcome to change the busyelks/ source as well for your PR, but the busyelks source is way behind at this point, and I am unsure of its continued maintenance status.

I'd rather not recompile everything all the time.

I build little shell scripts per project that assist in automatic proper compilation to keep progress at a maximum.

cjsthompson commented 4 years ago

The busyelks copies of everything are a growing maintenance problem, for sure. There are now three copies of certain commands in file_utils/sash/busyelks. In general, a contribution happens with duplicated source code, then code rot sets in with updates occurring elsewhere, but not in duplicate copies. I've sorted out the file_utils/sash duplications and was ready to deal with them in the 'core_utils' project, but there are disk size increases if they are combined. You're welcome to change the busyelks/ source as well for your PR, but the busyelks source is way behind at this point, and I am unsure of its continued maintenance status.

Maybe some scheme like this where busyelks and sash would use the source code of the independent commands directly would fix this maintenance burden? Some tools that are using exit() all over the place are going to have to be fixed for sure.

ifdef BUSYELKS || SASH

int cmd_basename(int argc, char *argv[])

else

int main(int argc, char *argv[])

endif

{ ... }

If you'd like I can take that up as my next mini project once I'm done with 'more'. I do understand that this has to work on a 360k disk.

ghaerr commented 4 years ago

Maybe some scheme like this where busyelks and sash would use the source code of the independent commands directly

There are better ways to handle this than #ifdefing code, an outside tool would be a better approach, IMO.

My approach was moving towards an 'applet' source definition that would allow completely non-#ifdef'd source code to be loadable as applets in sash or executable "standalone".

If you'd like I can take that up as my next mini project

I have analyzed this extensively, and the bigger problem is that any approach will increase the size of basic commands on disk, by quite a bit if 'printf' is used instead of 'write'. Combine this with sash already quite functional requiring no disk I/O for the file_utils commands, and the disk images for standalone commands being as small as possible, and I've come to the conclusion that the project is just not worth it.

So I would suggest we find you a more useful project as your next one. There are many! :)

cjsthompson commented 4 years ago

There are better ways to handle this than #ifdefing code, an outside tool would be a better approach, IMO.

My approach was moving towards an 'applet' source definition that would allow completely non-#ifdef'd source code to be loadable as applets in sash or executable "standalone".

That sounds even better.

I have analyzed this extensively, and the bigger problem is that any approach will increase the size of basic commands on disk, by quite a bit if 'printf' is used instead of 'write'. Combine this with sash already quite functional requiring no disk I/O for the file_utils commands, and the disk images for standalone commands being as small as possible, and I've come to the conclusion that the project is just not worth it.

Another approach I've been thinking about is do two versions of the 360k disk. One with a sash with the bare minimum commands that are required for installing ELKS to a hard disk (mkfs, fdisk, some tool to update the mbr). It would clone itself to the hard drive like DOS, and when you reboot from the hard drive it then asks you for more disks to complete the install. The other 360k disk would be more oriented rescue disk/small system.

DOS-like OSes could get around the limitation of 360k because you can shuffle floppy disks and DOS doesn't care. Not sure how a UNIX would like that. Though I think Slackware could do that, but I could be wrong.

Minix 1 loads the kernel off one floppy and asks you for the root floppy when the kernel is loaded. Slackware back when it could be installed from floppy disks did that too to save space.

So I would suggest we find you a more useful project as your next one. There are many! :)

For sure!

ghaerr commented 4 years ago

Another approach I've been thinking about is do two versions of the 360k disk.

I'm all for possible disk images that might get more people using ELKS. ELKS needs a larger community. The first issue is that disk images need to be downloadable from either the (unmaintained) ELKS website, and/or certainly from Github. I'm not certain what the best distribution method would be, using CI, or just posting binaries in the ELKS tree for download off the main page.

One with a sash with the bare minimum commands that are required for installing ELKS to a hard disk (mkfs, fdisk, some tool to update the mbr).

You might try playing with mkfs. I enhanced mfs to use for ELKS on the host, but have never tested the 16-bit version. @pawosm-arm tested fsck in #464 and found it doesn't even work.

fdisk was recently rewritten and works. It could be used with command line additions to update an MBR.

The problem still remains of how to insert more floppies in a running kernel. There is some old code in elks/fs/super.c that allows one to "Insert root floppy to continue" when initially mounting root... this could be changed to work on any umount/mount, rather than the boot root mount. Goo d luck testing this in a emulator though!

Different disk images are easily created by passing a different package file than 'elkscmd/Applications' in a slightly modified elkscmd/Make.install. It might be easier to start with getting the existing extra images posted to the website or GitHub, then getting a few different disk images created that would interest more potential evaluation of what we're doing in ELKS.

cjsthompson commented 4 years ago

The first issue is that disk images need to be downloadable from either the (unmaintained) ELKS website

Does anyone still active with the project have access to it?

The problem still remains of how to insert more floppies in a running kernel. There is some old code in elks/fs/super.c that allows one to "Insert root floppy to continue" when initially mounting root... this could be changed to work on any umount/mount, rather than the boot root mount. Goo d luck testing this in a emulator though!

Now that I think about it, the answer to how Slackware was doing it is obvious. They must have been loading the root disk into a RAM disk and then chroot, which ELKS can't do because of the maximum 640k RAM (and often less).

Different disk images are easily created by passing a different package file than 'elkscmd/Applications' in a slightly modified elkscmd/Make.install. It might be easier to start with getting the existing extra images posted to the website or GitHub, then getting a few different disk images created that would interest more potential evaluation of what we're doing in ELKS.

Some means to download ready made images (releases and nightlies) would do a lot of good indeed.

ghaerr commented 4 years ago

Does anyone still active with the project have access to it?

Yes, the owner of this project does, and maintained it for a while. He'd likely be happy to see it updated.

Some means to download ready made images (releases and nightlies) would do a lot of good

I don't know enough about how CI works to post nightlies, but could certainly post binaries in a top level directory on GitHub. Its about time for a 0.4.0 release anyways.

ghaerr commented 4 years ago

Hello @cjsthompson,

While working on 'more' today I had to use basename()

Are you still working on 'more'? I was using more the other day and was thinking how nice it would be to have one that backs up... :)

cjsthompson commented 4 years ago

I have stalled at the moment on this because I need to set up FreeBSD to move forward with both the VMIN/VTIME problem and I'd also like to add optional termcap support (which no longer works as it should on Linux either). There is also the issue that my Linux distro upgraded to gcc 10.1, and with recent changes that require a cross tools rebuild I am no longer able to build binutils-ia16. So I'm doing something else until this problem is solved. I've filed a bug for binutils-ia16.

If you wish I can upload what I have so far to my github account. It already works better than the current more.

One that backs up more than one page seems tricky to do without malloc() because some characters may print nothing to the screen. Even util-linux's more can't do that. One that backs up would be 'less' and no longer 'more'.

ghaerr commented 4 years ago

I need to set up FreeBSD to move forward with both the VMIN/VTIME problem

Is that because you're writing/testing more on FreeBSD? I'm pretty sure we have VMIN/VTIME working properly for reading ESC [ ... keys; let me know if you need help.

with recent changes that require a cross tools rebuild I am no longer able to build binutils-ia16 I've filed a bug for binutils-ia16.

Which repository do you have that bug report in on? I don't see it on @tkchia's page, he's maintaining the ELKS toolchain.

One that backs up more than one page seems tricky to do without malloc()

Probably true for piped input. But when reading a file, it would see that the file offset could be recorded per page or line, and used for going backwards... ?

cjsthompson commented 4 years ago

Is that because you're writing/testing more on FreeBSD? I'm pretty sure we have VMIN/VTIME working properly for reading ESC [ ... keys; let me know if you need help.

I did it on Linux while grepping through ELKS source to make sure I was not using anything (functions, definitions...) that isn't available in ELKS because it's just much easier to iterate there than regenerating and running again an ELKS image after every change.

Which repository do you have that bug report in on? I don't see it on @tkchia's page, he's maintaining the ELKS toolchain.

https://github.com/tkchia/binutils-ia16/issues/7

Probably true for piped input. But when reading a file, it would see that the file offset could be recorded per page or line, and used for going backwards... ?

Yes, I thought about that and it's easy to just save how much bytes were read to display one paging of text the last time and rewind the stream to that to go back one screenfull of text. That was one thing I was planning to add because it's pretty easy to do and useful. It's also straightforward to rewind to the beginning of the stream.

Going to the last page seems tricky because of that problem of 1 character != 1 displayed character in addition to control characters that could be in the text file and be 2 characters == 1 displayed character or 0.

But as far as saving the file offset of each screenfull of text for a file of arbitrary length, I can't see how that can be done without allocating some memory to save the file offsets for the screenfuls of text because of that same problem of 1 character != 1 displayed character and so on. If 1 character == 1 displayed character, it would be real easy as you'd just get the file length and divide that in screenfulls of text to display and you can calculate the offsets.

Mellvik commented 4 years ago

@ghaerr, you're in good company missing reverse movement in more.

I don't have a fix,but when the need was really bad a few weeks back, I added a rewind function (command '1G' as in vi) and renamed more to less locally to keep them apart. Just an lseek() and a few lines of code. Far from the real thing but really helped me. If deemed useful, I'll put it in a PR.

--M

  1. jun. 2020 kl. 21:48 skrev Christoph notifications@github.com:  I have stalled at the moment on this because I need to set up FreeBSD to move forward with both the VMIN/VTIME problem and I'd also like to add optional termcap support (which no longer works as it should on Linux either). There is also the issue that my Linux distro upgraded to gcc 10.1, and with recent changes that require a cross tools rebuild I am no longer able to build binutils-ia16. So I'm doing something else until this problem is solved. I've filed a bug for binutils-ia16.

If you wish I can upload what I have so far to my github account. It already works better than the current more.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.