Projeto-Pindorama / heirloom-ng

A collection of standard Unix utilities that is intended to provide maximum compatibility with traditional Unix while incorporating additional features necessary today.
http://heirloom-ng.pindorama.dob.jp

Idea/suggestion: Import future patches/features from Illumos/BSDs? #27

Open mamccollum opened 1 year ago

mamccollum commented 1 year ago

Hey there, I had an idea I wanted to share. I understand that future contributions are supposed to be under the Zlib License, and I know that Illumos and the BSDs are NOT under the Zlib License. However, I was wondering whether, for new commands or for commands already licensed under the CDDL or other compatible licenses, we could import commands, features, and more from Illumos and/or the BSDs, such as OpenBSD.

Does anyone else believe this has potential? I could work on ensuring it uses libcommon, etc. I understand that we already have some work to do with OpenBSD compatibility, but I believe this could help push some innovation and expansion of the project, albeit at the cost of maintaining more.

Thoughts? (PS: if we do go through with this, should we try to import the git history for the specific files from Illumos, etc.?)

takusuman commented 1 year ago

With the exception of importing directly from Illumos, I think it could be a good idea to implement/port missing commands. I mean, my idea is to try to make this package not-so-CDDL'd, since it could cause annoyance later on to folks trying to use this in an embedded environment. I'm on my way to port write(1) from UNIX v7, so it will pair up with mesg(1), but I haven't done a lot of work on it yet. Don't misunderstand me, please, but I think it would be more helpful (from a user's point of view) to have default utilities with the new POSIX 2008 standard implemented. We started doing that when we fixed that bug in the default rm. But, anyway, what tools do you propose to port? Maybe there's something that I'm missing that could make a difference.

takusuman commented 1 year ago

By the way, about porting from OpenBSD, I had an idea of creating an "ucblib" package that could be used as a libbsd alternative. Not inside Heirloom NG, though.

mamccollum commented 1 year ago

> (...) my idea is to try to make this package not-so-CDDL'd, since it could cause annoyance later on to folks trying to use this in an embedded environment. Don't misunderstand me, please, but I think it would be more helpful (...) to have default utilities with the new POSIX 2008 standard implemented.

Alright, I understand! I'm not offended at all, and I understand the licensing issues and the need for updating standards. Are there any explicit tasks that should be done on that side? I know there are docs for the 2008 and 2017 editions, and I could compare the behavior the standard expects against Heirloom's results.

> But, anyway, what tools do you propose to port?

I was double-checking against the tools in GNU coreutils (not suggesting at all that we port from there, but I was looking at compatibility with it), and I could have sworn there were more that I wanted to port, but all I could find that was actually missing that wasn't 100% a GNUism was seq and rmdir, which are relatively easy to develop.

Also on the topic of standards like POSIX -- is there any test suite I could/should use to see how compatible Heirloom NG's utils are in comparison to other stuff? I know there's the Open POSIX test suite, but it looks like it hasn't been updated in at least a decade, if not closer to fifteen years. I've also heard of the GNU coreutils test suite that the folks over there use to test their software.

takusuman commented 1 year ago

> Are there any explicit tasks that should be done on that side?

Nothing that I can say at a first glance, besides the rm fix that I mentioned before.

> I know there are docs for the 2008 and 2017 editions, and I could compare the behavior the standard expects against Heirloom's results.

These will do the trick. 😄

> that was actually missing that wasn't 100% a GNUism was seq and rmdir, which are relatively easy to develop.

Well, I could argue that one may use {1..X} or a C-style for loop instead of seq 1 X for counting, but that only applies to Korn Shell 93, GNU Bourne-Again Shell and Z-Shell, and there are many people who still write shell scripts for POSIX-only environments and depend on seq.

seq itself, except for some GNU extensions, is just a matter of taking argv[1] and argv[2] and counting between them. rmdir should be just a system call-kiosk command (like readlink, which I've implemented), so it also shouldn't be hard to implement.
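The counting idea can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it handles integer operands only, ignores increments and separators, and the names `parse_bounds` and `seq_count` are made up here; it is not the code that later landed in Heirloom NG.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read "seq LAST" or "seq FIRST LAST" into
 * *first and *last. Returns 0 on success, -1 on a usage or parse error. */
int parse_bounds(int argc, char *argv[], long *first, long *last)
{
    char *end;

    if (argc != 2 && argc != 3)
        return -1;
    *first = 1; /* default lower bound when only LAST is given */
    if (argc == 3) {
        *first = strtol(argv[1], &end, 10);
        if (end == argv[1] || *end != '\0')
            return -1;
    }
    *last = strtol(argv[argc - 1], &end, 10);
    if (end == argv[argc - 1] || *end != '\0')
        return -1;
    return 0;
}

/* Hypothetical helper: print first..last, one number per line.
 * Returns how many numbers were printed (0 if first > last). */
long seq_count(long first, long last, FILE *out)
{
    long n = 0;

    if (first > last)
        return 0;
    /* Break on equality instead of testing i <= last, so that
     * i never has to be incremented past last. */
    for (long i = first; ; i++) {
        fprintf(out, "%ld\n", i);
        n++;
        if (i == last)
            break;
    }
    return n;
}
```

A `main()` would call `parse_bounds(argc, argv, &first, &last)` and, on success, `seq_count(first, last, stdout)`. Floating-point operands, an increment, and a custom separator are where most of the real work would go.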

Definitely on the list. 👍🏽

Another two commands that I was thinking about implementing were watch and timeout, but my code hasn't worked so well and I haven't tried to go further since then.

takusuman commented 1 year ago

> Also on the topic of standards like POSIX -- is there any test suite I could/should use to see how compatible Heirloom NG's utils are in comparison to other stuff?

I was thinking about that recently and... no, there's nothing to test all the utilities at once. However, sed has its own set of tests and grep has some tests described in the NOTES file. Maybe a sane option would be to port the tests from toybox, since it's public domain and it won't have any licence problems with this current source tree. I was also thinking about testing in GitHub's VMs if it builds on other flavors of BSD and MacOS; it would be useful to have this information both here as a table and as a badge on the website.

mamccollum commented 1 year ago

> I was also thinking about testing in GitHub's VMs if it builds on other flavors of BSD and MacOS (...)

vmactions has workflows for building on the BSDs and Solaris even though they're not officially supported (I believe it uses the macOS VM with a VM inside of that to do the trick).

P.S. I once tried building on macOS and there were... several big issues. From a lot of missing headers (can be fixed with macbrew, but it's quite annoying) to similar issues with deprecated functions that OpenBSD had, and even things as basic as APFS not being case-sensitive (unless you specifically configured your Mac's FS otherwise in a rescue CD). I didn't want to create an issue at the time because it's a massive series of issues and I think it's better that we focus on primarily Linux & secondarily the BSDs for now. (Though I'm not the leader, so just take this as friendly advice)

takusuman commented 1 year ago

> vmactions has workflows for building on the BSDs and Solaris even though they're not officially supported (I believe it uses the macOS VM with a VM inside of that to do the trick).

Yeah, I've been thinking about it! Although it is needed, maybe it's not our priority for now compared to implementing those new tools.

> P.S. I once tried building on macOS and there were... several big issues.

I imagined as much...

> I didn't want to create an issue at the time because it's a massive series of issues and I think it's better that we focus on primarily Linux & secondarily the BSDs for now.

Sure, I agree with you.

mamccollum commented 1 year ago

Also, I made a commit in my fork of the repository that changes the readlink makefile so that it cleans up the UCB binaries, as make mrproper was leaving them behind. Should I just leave that commit there and make a PR later when I make more changes to my fork?

takusuman commented 1 year ago

> Also, I made a commit in my fork of the repository that changes the readlink makefile so that it cleans up the UCB binaries, as make mrproper was leaving them behind.

Damn it, how could I have missed that?

> Should I just leave that commit there and make a PR later when I make more changes to my fork?

That seems O.k. to me.

mamccollum commented 1 year ago

Understood. I'll look into the test suites from toybox and get back to you. 😄

takusuman commented 1 year ago

> Understood. I'll look into the test suites from toybox and get back to you. 😄

Good! And I'll be implementing the seq command. Merci![^1]

[^1]: I'm not sure if it's still being used in English, but I've learnt that "Merci" could be used as a thank-you in English.

mamccollum commented 1 year ago

One more thing -- chgrp is also missing. Forgot to mention that, sorry.

takusuman commented 1 year ago

> One more thing -- chgrp is also missing. Forgot to mention that, sorry.

Actually, it's not. Many commands in Heirloom are supplied by symbolic/hard links that change argv[0]. For instance, chgrp is a link to chown, dfspace is a link to df, etc. Take a look at the manual pages on the website. The commands that don't have a description and/or their own directory are supplied via symbolic/hard links.
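For readers unfamiliar with the trick, here is a minimal C sketch of how one binary can behave differently depending on the name it was invoked under. The function names `progname_of` and `invoked_as_chgrp` are hypothetical and chosen for illustration; this is not taken from Heirloom's chown source.

```c
#include <string.h>

/* Hypothetical helper: return the last path component of argv[0]
 * without modifying the string (e.g. "/usr/5bin/chgrp" -> "chgrp"). */
const char *progname_of(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');

    return slash != NULL ? slash + 1 : argv0;
}

/* Hypothetical helper: 1 if the binary was started through a link
 * named "chgrp", 0 for any other name (such as "chown"). */
int invoked_as_chgrp(const char *argv0)
{
    return strcmp(progname_of(argv0), "chgrp") == 0;
}
```

A `main()` would typically evaluate `invoked_as_chgrp(argv[0])` once at startup and branch on the result, so a single executable installed under two link names serves both commands.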

mamccollum commented 1 year ago

Oh wow, I didn't notice that. Thanks for informing me!

takusuman commented 1 year ago

O.k., the initial implementation wasn't standard, so I implemented seq according to the standard and got some new funky bugs. I don't really know where to go now. https://github.com/Projeto-Pindorama/heirloom-ng/tree/seq-impl

EDIT: Kind of fixed at #29

takusuman commented 1 year ago

Hi, Molly (@mamccollum), good evening. How are you? Just an update: now seq works just fine --- at least here; I'm not sure yet whether it breaks or misbehaves on other hosts --- and rmdir was already implemented by Gunnar Ritter back in July 2002, and it also has its own directory here in the source tree.

seq was harder than I thought, not because of the algorithm, but because of the implementation standard combined with my "just do it"/compact style of programming --- while many got it in more than 90 or 100 lines, I got it in 65 lines of code, counting spaces. There were some hours of debugging on the last day I worked on it, along with some segmentation faults.

takusuman commented 1 year ago

> Another two commands that I was thinking about implementing were watch and timeout, but my code hasn't worked so well and I haven't tried to go further since then.

watch(1) implementation first made at #30, improved at #31. On my way to timeout(1).

mamccollum commented 1 year ago

Hey, that's good to hear. I think a while back (probably around 3 weeks ago now) I was working on a feature here, but I sadly forgot what it was. My fork ended up just turning into a messy disaster and I had to re-start. Is there anything I should assist in working on?

takusuman commented 1 year ago

> Hey, that's good to hear. I think a while back (probably around 3 weeks ago now) I was working on a feature here, but I sadly forgot what it was. My fork ended up just turning into a messy disaster and I had to re-start. Is there anything I should assist in working on?

Well, just testing and completely porting to OpenBSD (although I think having contributions from more OpenBSD folks would help too). watch(1) works surprisingly well for its "100-liner" size --- I'd venture to say that it's even better, proportionally speaking, than procps-ng's watch(1) ---, but the title/information header being narrower than the terminal's maximum line width kind of annoys me a little. As for timeout(1), I couldn't rewrite it yet, mostly because it's complex. I'm afraid that, in the end, I'll end up sourcing it from OpenBSD's source tree and making some modifications so it fits into Heirloom NG --- like using SVR4-like error reporting via pfmt()/prerror() instead of err()-like functions, etc.

takusuman commented 1 year ago

Just realized that du doesn't return anything if called with just one file.

takusuman commented 1 year ago

> Just realized that du doesn't return anything if called with just one file.

I was too busy to correct myself earlier, but never mind: it's part of the standard. If I'm not mistaken, only /usr/5bin/posix/du prints individual files by default without -h or -s. Anyone who wants to write portable scripts should always use du -s or even du -hs.

takusuman commented 10 months ago

@mamccollum I was messing around with Heirloom tar (since it's on my PATH) and I think I've found a new glitch.

I usually copy folders using tar -cvf and tar -xvf in a pipeline, like Plan 9 does, and I've found out that it doesn't extract for some reason, printing "tar: 1 file(s) not extracted". (Screenshot attached: 2023-09-08-035358_680x239_scrot)

I'll take a deeper look at it later but, for now, I'll be using Schily's tar as always.

xplshn commented 5 months ago

chimera-utils has done a lot of the groundwork for using fBSD coreutils on Linux. We could take some of the patches from there. Also, what about the code from sbase and ubase by suckless.org? It's all MIT licensed.

takusuman commented 5 months ago

> chimera-utils has done a lot of the groundwork for using fBSD coreutils on Linux. We could take some of the patches from there. Also, what about the code from sbase and ubase by suckless.org? It's all MIT licensed.

First of all, sorry for the delay in responding. I think that taking a small part of chimera-utils may be useful for some utilities --- such as write (although it has to stay compliant with its UNIX v7 version to be on par with mesg), or as something to help finish the timeout implementation.

About sbase/ubase: I have already taken a look at some of the code, and I think that it would be equivalent to copying code examples. It may be useful as a reference, but I think that we can implement more complete utilities. But that's something to consider too.

xplshn commented 5 months ago

sbase is POSIX and minimalist. I thought that heirloom strived for that too. Is compatibility with GNU coreutils a wanted feature for the project?

takusuman commented 5 months ago

> sbase is POSIX and minimalist. I thought that heirloom strived for that too. Is compatibility with GNU coreutils a wanted feature for the project?

You got what I meant a little bit off. Heirloom is meant to be POSIX, if one wants to put it this way, but on the middle path betwixt simplicity and convenience --- like you would find in some of the UNIX-compatible systems cited in Heirloom's intro. Suckless' sbase is more like an example --- similar to what you could find at OpenGroup.org when looking at the POSIX specification --- of what a command should be instead of what it could be. I don't want Heirloom to be like GNU Coreutils in this fork, but I also don't want to call just a handful of lines a "complete utility" when it lacks features that are useful for the end user --- and, at the same time, not redundant. A good example of what I mean would be a comparison betwixt procps-ng's watch, Heirloom NG's and Suckless' ubase one. procps-ng's is overly complicated, Heirloom NG's does its job flawlessly in less than 1/4 of the L.O.C. that procps-ng's has, and Suckless' ubase one works about as well as a shell script hack done in two and a half minutes.

takusuman commented 5 months ago

In all honesty, I must say that Suckless' {u,s}base may work as well as the POSIX specification for lighting the way to go --- but I wouldn't just copy the code into Heirloom's source tree because it's under a compatible licence; at most, I'd fork it and enhance it.

takusuman commented 5 months ago

I'm quite busy lately, so I can't burn daylight working on #36 for now, even having references instead of writing completely from scratch as you suggested, nor discussing this matter further.

xplshn commented 5 months ago

Okay, I think I get it. Then what about taking inspiration/work/code from Toybox? It's 0BSD licensed, by the same guy that started Busybox, it's currently used in Android phones, it also works on MMU-less devices, and it's pretty lightweight yet convenient.

takusuman commented 5 months ago

> Then what about taking inspiration/work/code from Toybox? It's 0BSD licensed, by the same guy that started Busybox, it's currently used in Android phones, it also works on MMU-less devices, and it's pretty lightweight yet convenient.

Yeah, that is somewhat the goal: a sane and yet convenient environment. Toybox uses too many internal functions, so porting code directly from it is more difficult than doing a clean-room implementation. NetBSD/OpenBSD's userland is also an inspiration, but we avoid taking code directly from it and try to work out how to implement utilities by ourselves.

I have taken your idea of basing some utilities on OpenBSD code when fixing/"filling" Heirloom NG's timeout implementation.

takusuman commented 5 months ago

Released 240220 today, I would like feedback from some of you.

@arthurbacci pointed out that my method for converting the float back to a string is redundant and could lose precision; I have already noted this as something to fix in the next release. seq is still ridiculously incomplete: I think it could at least mimic the Research UNIX v8 implementation and also have its "separator" condition fixed for some cases, but I think I can get around with this.
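The precision concern can be illustrated in isolation. The sketch below is not the seq code under discussion (which is not shown in this thread); it only demonstrates, assuming IEEE 754 doubles and the default "C" locale, that formatting a double with too few significant digits and parsing it back can yield a different value. The name `roundtrips` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format x with the given number of significant
 * digits, parse the text back with strtod(), and report whether the
 * value survived unchanged. Returns 1 if it round-trips, 0 otherwise. */
int roundtrips(double x, int digits)
{
    char buf[64];

    snprintf(buf, sizeof buf, "%.*g", digits, x);
    return strtod(buf, NULL) == x;
}
```

With 6 significant digits (the default precision of `%g`), 1234567.891 is written as 1.23457e+06 and comes back as 1234570, whereas 17 significant digits are enough for an IEEE 754 double to survive the trip. Keeping the number as a double and formatting it only once, at output time, avoids the issue altogether.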

takusuman commented 3 months ago

> Released 240220 today, I would like feedback from some of you.

Many fixes are now being addressed at #41.

takusuman commented 3 months ago

I would like to add to this issue the fact that tar is somehow broken. For instance, while I was testing the Copacabana Linux build system, I noticed that Heirloom's tar isn't extracting tarballs passed to it through a pipeline, responding with this error:

tar: 2 file(s) not extracted

Maybe this can be fixed after some debugging; I'm just taking notes here in case someone gets to it before me.

arthurbacci commented 3 months ago

Please open an issue for tar

takusuman commented 3 months ago

> Please open an issue for tar

That's going to be fun.

takusuman commented 3 months ago

Opened a specific issue for tar at #44. cc.: @arthurbacci

takusuman commented 1 month ago

@arthurbacci I was thinking, could we borrow the "libutf-8" from Plan 9 for libcommon? Or some other implementation of UTF-8 for C.

I found this page which lists some implementations (including Plan 9's) and drawbacks: https://www.linuxdoc.org/HOWTO/Unicode-HOWTO-6.html

There's also utf8proc by the Julia Programming Language development team, it looks good, and it's also small. https://github.com/JuliaLang/utf8proc

This would be a grand improvement on Heirloom, since we could make other programs UTF-8 compliant too.

I'm saying this mostly because of #52, but we could apply this to ls, more/pg, everything (almost).
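To make concrete why UTF-8 awareness matters for tools such as ls or more/pg: the number of bytes in a string is not the number of characters, so anything that aligns columns or truncates lines by byte count goes wrong on non-ASCII text. The sketch below is a hand-rolled illustration only; `utf8_length` is a hypothetical name, it is not the API of Plan 9's UTF-8 library, utf8proc, or any other library mentioned here, and it assumes the input is already valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8
 * string by counting every byte that is NOT a continuation byte
 * (continuation bytes have the bit pattern 10xxxxxx).
 * Assumes valid UTF-8; no validation is performed. */
size_t utf8_length(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For example, the two-byte sequence C3 A7 ("ç") counts as one code point even though strlen() reports 2. Code points are still not the same thing as grapheme clusters or display columns, which is where a dedicated library earns its keep.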

arthurbacci commented 1 month ago

Maybe https://libs.suckless.org/libgrapheme/ ?

takusuman commented 1 month ago

> Maybe https://libs.suckless.org/libgrapheme/ ?

Any way to drop it into this project? I would have surrendered to wchar if I could implement it without getting cryptic memory faults when running fgetwc().

xplshn commented 1 month ago

The LibUTF from sbase is also really good and tidy: https://git.suckless.org/sbase/files.html But I feel like libgrapheme would be better in the long term; it's very maintainable.

takusuman commented 1 month ago

> The LibUTF from sbase is also really good and tidy: https://git.suckless.org/sbase/files.html But I feel like libgrapheme would be better in the long term; it's very maintainable.

That's a good suggestion too, but maybe we will just stick to libgrapheme. I hope it doesn't change much, so we could just embed it in the code and add some directions in the build system to link it.