replace TREE with PDTREE

mateuszviste commented 5 days ago

SvarDOS currently ships with TREE v3.7.2 from the FreeDOS project, which is itself 1995 commercial code by Dave Dunfield, GPL-ized sometime in the 2000s.

There is also pdTree: a public domain version written by @PerditionC https://web.archive.org/web/20011212151852/http://www.darklogic.org/fdos/projects/tree/ https://github.com/FDOS/tree

Glancing at the source code it appears to be a very clean and memory-efficient implementation. Perhaps SvarDOS could use it instead of the current GPL tree after some minor work:

"port" it from TCC to OpenWatcom, this should be a formality, code looks very nice and standard
replace CPP parts with ANSI C (did not investigate this, but main file is CPP so I guess there must be some C++ hidden somewhere)
create a proper makefile
drop all Windows-related stuff
maybe remove LFN support, if it leads to any significant simplification
replace CATGETS with SvarLANG
add a few translations (DE, FR, PL, TR, ...)
remove (/reimplement) all GPL-tainted code (DB.C, DB.H, GET_LINE.C ? perhaps these are only CATGETS dependencies)
drop libc, reimplement necessary (few) std functions and link with WMINCRT
add some documentation/notes that explains where this TREE comes from and to avoid any confusion with KJD's original

boeckmann commented 5 days ago

Well, that bullet list doesn't exactly look like "minor work". But if you think it's worth it, go for it. Maybe we should put a counter somewhere that counts the packages that are still GPL-licensed, and try to converge that towards zero :)

PerditionC commented 4 days ago

It should build with OW given that it builds with MSVC, but I will check it this week. Note that I have a PD implementation of cats that I plan on switching out the LGPL one with and releasing as FD tree 4.0. It's been a while; I'm guessing it builds with a batch file. Adding a make-based build for OW should be trivial, so I will check on it. I guess I need to make sure all translations are updated, since both versions use the same files for translations. There is no C++ other than maybe some variable scoping; I believe it was a naming conflict resolution. The API is based on Win32 FindFile, so removing LFN support likely won't reduce the size much.

I have to look; I may need to add a findfile using OW pragma aux.

(Off topic, but FD format is switching back to the small model, at the cost of a small penalty for copying strings from a far buffer to a near-accessible one.)

I will try to get some updates on GitHub later this week.

mateuszviste commented 4 days ago

Maybe we should put a counter somewhere that counts the packages that are still GPL-licensed, and try to converge that towards zero :)

I did think about it for a short time, but ultimately dropped the idea because I do not want to give the impression that "fighting" with GPL is one of SvarDOS goals.

I plan on switching out the LGPL one with and releasing as FD tree 4.0.

So pdTree will soon become the default FreeDOS tree, replacing the 3.7.2 version from Dave Dunfield? Cool. Is there any practical reason for it?

BTW the FreeDOS listing is somewhat confusing: it lists TREE 3.7.2 by Dave Dunfield, but provides a link to your PD version: https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.2/repos/pkg-html/tree.html

I have to look; I may need to add a findfile using OW pragma aux.

It's exactly what I was planning to do :) Great to hear that you are still interested in this 20-year-old code!

mateuszviste commented 3 days ago

I started doing some maintenance & trimming on pdTree here: http://svn.svardos.org/log.php?repname=SvarDOS&path=%2Ftree%2Ftrunk%2F&isdir=1

So far I have removed most of the Windows-specific stuff, UTF-8 output, CATGETS, and wide-character support. I have also replaced all C++-isms with ANSI C equivalents, so this fork of pdTree is 100% C now.

mateuszviste commented 1 day ago

After some more hours of hacking, slashing and reformatting, SvarDOS TREE is finally functional. It compiles with OpenWatcom with a proper makefile, loads its translations via SvarLANG and appears to work reliably as far as I can tell. I removed a LOT of code (no LFN, no Windows support, no Unicode...), simplified many parts of it and replaced some functions with OpenWatcom-specific calls.

UPXed, it is an 11K COM file now. Not as small as I want it to be, but there is still hope of getting a binary smaller than the GPL FreeDOS TREE (10K) once I get rid of printf. I will probably not go the WMINCRT route, though, since I rely heavily on OpenWatcom's libc now.
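As an illustration of the kind of helper that can stand in for printf's `%lu` formatting, here is a minimal sketch in C. The name `fmt_ulong` is hypothetical and nothing here is taken from the SvarDOS TREE source; the resulting string would still have to reach the screen through whatever low-level output call the program uses.

```c
#include <stddef.h>

/* Write v in decimal into buf and NUL-terminate it.
 * buf must have room for every digit plus the terminator
 * (21 bytes is enough even for a 64-bit unsigned long).
 * Returns the number of digits written. */
size_t fmt_ulong(unsigned long v, char *buf)
{
    char tmp[24];          /* digits, least significant first */
    size_t n = 0, i;

    do {
        tmp[n++] = (char)('0' + (v % 10));
        v /= 10;
    } while (v != 0);

    /* reverse into the caller's buffer */
    for (i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];
    }
    buf[n] = '\0';
    return n;
}
```

Whether dropping printf actually shrinks the binary depends on what else pulls in the formatted-output code from the C library, so this is only a starting point.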

Two things that I have yet to do:

write some (light) document that explains the history of this TREE
review the translation strings: there are at least some unused ones, and removing them will help decrease the footprint

boeckmann commented 1 day ago

Well done :) We could spare some bytes by not writing the default language to the .lng file. We could adapt tlumacz to at least optionally skip outputting the default language. For FDISK, for example, that would save over 10k bytes. And I am sure that I will recompile the program when I have to update the default language anyway.

mateuszviste commented 1 day ago

We could spare some bytes by not writing the default language to the .lng file.

Yes, but then it would not be possible to re-load EN from within the application (if the application supports such reloads). This is used by the SvarDOS installer.

mateuszviste commented 1 day ago

a win would be to have each language in the LNG file optionally compressed. It's human language, lots of redundancy. There must be some algorithm nowadays that is capable of in-place depacking with a reasonably small depacking code.

boeckmann commented 1 day ago

I took the opportunity to make the makefile compatible with Linux. Should still work fine under DOS (at least in DosBox it does).

ecm-pushbx commented 1 day ago

a win would be to have each language in the LNG file optionally compressed. It's human language, lots of redundancy. There must be some algorithm nowadays that is capable of in-place depacking with a reasonably small depacking code.

You may want to look at my use of heatshrink in my Extensions for lDebug, extpak.eld and list.eld. You need a buffer the size of the depack window. If you access the compressed file in a linear way then you can re-use the same depacker state across several calls into the depacker.

ecm-pushbx commented 1 day ago

Main depacker is in https://hg.pushbx.org/ecm/ldebug/file/a35f88de973a/source/eld/depack.asm

mateuszviste commented 1 day ago

Main depacker is in https://hg.pushbx.org/ecm/ldebug/file/a35f88de973a/source/eld/depack.asm

Seems a bit complicated. I was thinking about something much simpler. Maybe a custom algorithm that reads bytes from the data file and, on spotting a special marker (say, 0xFF O S), knows that it must now copy S bytes from offset -O in the stream seen so far. Probably not the most efficient approach, but it should be easy to implement in any language and might give good enough results for text strings. I might investigate this in the coming days, once I am done with SvarTREE.
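The marker idea above can be sketched as a decoder in C. This only illustrates the proposal and is not the scheme that later went into SvarLANG: the function name `backref_decode`, the single-byte O and S fields, the choice to measure offsets against the already-decoded output, and the `0xFF 0x00` escape for a literal 0xFF byte are all assumptions made here.

```c
#include <stddef.h>

/* Decode src[0..srclen) into dst (capacity dstcap bytes).
 * Returns the number of bytes written, or (size_t)-1 on malformed input.
 * Back-references read from dst itself, so no extra window buffer is needed. */
size_t backref_decode(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap)
{
    size_t in = 0, out = 0;

    while (in < srclen) {
        unsigned char b = src[in++];
        size_t off, len;

        if (b != 0xFF) {                 /* ordinary literal byte */
            if (out >= dstcap) return (size_t)-1;
            dst[out++] = b;
            continue;
        }

        if (in >= srclen) return (size_t)-1;   /* marker cut short */
        off = src[in++];

        if (off == 0) {                  /* ASSUMPTION: 0xFF 0x00 = literal 0xFF */
            if (out >= dstcap) return (size_t)-1;
            dst[out++] = 0xFF;
            continue;
        }

        if (in >= srclen) return (size_t)-1;   /* missing length byte */
        len = src[in++];

        /* the reference must point inside what was already decoded */
        if (off > out || len > dstcap - out) return (size_t)-1;

        /* byte-by-byte copy, so a run may overlap its own output */
        while (len--) {
            dst[out] = dst[out - off];
            out++;
        }
    }
    return out;
}
```

With this encoding, the input `'a' 'b' 'c' 0xFF 3 6` decodes to `abcabcabc`: the six-byte copy overlaps the bytes it is producing, which is why the loop copies one byte at a time instead of calling memcpy.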

mateuszviste commented 17 hours ago

I committed an experimental change to SvarLANG: TLUMACZ now outputs two versions of the LNG file, one as usual (OUT.LNG) and one compressed (OUTC.LNG). SvarLANG is able to read both. Older versions of SvarLANG will not crash on it; they just ignore the compressed languages.

The compression scheme is very simple: I invented it during a coffee break and implemented it within an evening hour. It is not meant to be state of the art - just somewhat better than uncompressed text and as simple as possible to decompress with no extra memory buffer. It assumes high redundancy in the data being compressed and should work well with text, but attempting to compress binary data is likely to produce "compressed" output twice as large as the original.

It works on a simple example. I will proceed with further tests tomorrow or next week. There are also one or two tweaks I'd like to check.

mateuszviste commented 6 hours ago

Some preliminary results:

compressing TREE.LNG yields no benefit, the size of the strings is almost the same (8.7K uncompressed, 8.8K compressed).
compressing FDISK.LNG makes the string resources over twice smaller (72K uncompressed, 35K compressed).

TREE still works after compression. FDISK I could not check because for some reason I am not able to compile it (unrelated to SvarLANG).

mateuszviste commented 6 hours ago

SVARCOM.LNG compresses from 59K to 35K. Works properly.

This looks quite good. I think I will publish a new SvarLANG version today or tomorrow, and then migrate SVARCOM to it. I just need to test it on some real & ancient (286) hardware first to make sure it is not too slow.

boeckmann commented 4 hours ago

compressing FDISK.LNG makes the string resources over twice smaller (72K uncompressed, 35K compressed).

Awesome :) Time then for a new FDISK version.

FDISK I could not check because for some reason I am not able to compile it (unrelated to SvarLANG).

Can you give some details about the build environment? Then I can try to reproduce.

boeckmann commented 4 hours ago

Btw, the current FDISK.LNG (version 1.3.16) is 104k. Could it be that you are working with an older FDISK source?

boeckmann commented 4 hours ago

Or is it only the strings you are counting?

mateuszviste commented 4 hours ago

Could it be that you are working with an older FDISK source?

Yes, I took some old source tree that was lying on my HDD. I did not have enough courage to study git again to find out how to check out the latest code. I will download the latest source as a zip file and try again.

mateuszviste commented 4 hours ago

Tested with the latest FDISK source tree. With SvarLANG's MVCOMP the strings are: uncompressed = 102K, compressed = 49.7K.

FDISK works fine (help screen and main screen display correctly).

boeckmann commented 3 hours ago

Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in case of FDISK...

mateuszviste commented 3 hours ago

Perfect! Do you mind if I add a /x switch to tlumacz to exclude the default lang? Would save another 6-8k in case of FDISK...

Be my guest. :)

I think I am done with SvarLANG for now, MVCOMP is working unexpectedly well for such a quick hack.

mateuszviste commented 51 minutes ago

I have moved mvcomp compression of lang blocks to a dedicated switch: TLUMACZ /comp. When compression is enabled, a lang block is actually compressed only if that is beneficial (saves at least one byte), so using /comp will never be worse than an uncompressed LNG (in the worst case the block is left uncompressed).

mateuszviste commented 45 minutes ago

Having the compression flag on a per-language basis makes it possible to have some languages in the LNG compressed and others not. I was not sure whether doing it like that was useful, but now I see it was a good decision. In the case of TREE, some languages compress slightly while others do not, so TLUMACZ applies compression only where it makes sense:

lang EN mvcomp-ressed (949 bytes -> 942 bytes)
lang DE mvcomp-ressed (1001 bytes -> 990 bytes)
lang ES mvcomp-ressed (1091 bytes -> 1032 bytes)
lang FI left UNCOMPRESSED (uncomp=943 bytes ; mvcomp=956 bytes)
lang LV left UNCOMPRESSED (uncomp=948 bytes ; mvcomp=1034 bytes)
lang PT mvcomp-ressed (1019 bytes -> 978 bytes)
lang RU left UNCOMPRESSED (uncomp=959 bytes ; mvcomp=1082 bytes)
lang TR left UNCOMPRESSED (uncomp=985 bytes ; mvcomp=1038 bytes)
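The per-language decision amounts to keeping whichever form of each block is smaller. Below is a minimal sketch in C that replays it on the sizes from the TLUMACZ output above. The struct and function names are hypothetical, not taken from TLUMACZ, and any per-block header overhead in the LNG file is ignored.

```c
#include <stddef.h>

/* Sizes of one language block, in bytes (hypothetical structure). */
struct lang_sizes {
    const char *id;   /* language code, e.g. "EN" */
    size_t raw;       /* uncompressed size */
    size_t packed;    /* size after mvcomp compression */
};

/* Figures copied from the TLUMACZ log for TREE. */
const struct lang_sizes tree_langs[] = {
    { "EN",  949,  942 },
    { "DE", 1001,  990 },
    { "ES", 1091, 1032 },
    { "FI",  943,  956 },
    { "LV",  948, 1034 },
    { "PT", 1019,  978 },
    { "RU",  959, 1082 },
    { "TR",  985, 1038 }
};
const size_t tree_lang_count = sizeof tree_langs / sizeof tree_langs[0];

/* Nonzero when the compressed form saves at least one byte. */
int use_compressed(const struct lang_sizes *l)
{
    return l->packed < l->raw;
}

/* Total payload size when every block keeps its smaller form. */
size_t stored_total(const struct lang_sizes *langs, size_t count)
{
    size_t i, total = 0;

    for (i = 0; i < count; i++) {
        total += use_compressed(&langs[i]) ? langs[i].packed : langs[i].raw;
    }
    return total;
}
```

For these eight blocks the per-language choice stores 7777 bytes of strings, against 7895 bytes with nothing compressed and 8052 bytes if every block were compressed unconditionally.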

SvarDOS / core

replace TREE with PDTREE #127