Porting 8086-toolchain to ELKS

ghaerr commented 5 days ago

This is a continuation of the discussion in https://github.com/ghaerr/elks/issues/1443#issuecomment-2489091235, regarding issues getting what is hopefully the latest version of a C86 compiler and @rafael2k's port of its included (older) NASM assembler running on ELKS.

At the moment, there is some consideration of using Dev86's CPP C preprocessor, producing Dev86-compatible AS86-format object files from NASM, and possibly using Dev86's LD linker, as both CPP and LD are (hopefully) likely to be easily ported to the ELKS 8086-only environment.

I'm not sure where the best current sources are for Dev86 - it used to be that @jbruchon hosted them on Github, and that version's upstream is quite old, but still present: https://github.com/lkundrak/dev86. It seems that jbruchon has moved his version to Codeberg at https://codeberg.org/jbruchon/dev86. During the last four years, I am aware of a number of bug fixes posted to his repo when it used to be on Github. I would recommend starting with jbruchon's Dev86 unless another more updated version is found on Github.

ELKS shares quite a history with Dev86: just five years ago the entire kernel and C library were compiled using its BCC->AS86->LD toolchain. The ELKS C library had originally been in dev86/libc but had been moved prior to that.

While it could make sense to use Dev86's CPP and LD in order to get C86 running more quickly on ELKS, unfortunately the BCC compiler is K&R only, and doesn't support ANSI C at all.

@rafael2k, which repo are you using for your CPP and future LD ports? I would assume that if you can get them running, both will be moved into your https://github.com/rafael2k/8086-toolchain repo.

jbruchon commented 5 days ago

I see my name exists. Let me know if you want any help. I'm happy to take patches for whatever I've moved to Codeberg. I'm not dead yet!

bocke commented 5 days ago

One obvious thing lacking is an "ar" implementation. For static libraries. It might be worth looking into the Dev86 implementation, too.

ghaerr commented 5 days ago

I have temporarily been using the upstream 8086-toolchain to quickly compile up macOS-hosted versions of C86 and NASM, since @rafael2k's version is currently an ELKS-only build.

I have been playing with its C86 and NASM to get more information on how it works and its continued suitability for an ELKS-hosted 8086-only C compiler toolchain.

Build script:

bin/c86 -g file.c file.asm
bin/nasm -f obj -l file.lst  file.asm

Here are the current results:

There is no C preprocessor, so I'm testing with code which doesn't require any external files (#includes).
The compiler appears to compile regular C code quite well, but when any error is found, no .asm file is produced at all. I have not yet gotten into any issues of ANSI compliance, just looking at workflow for the time being.
When a function is not included in the C source (i.e. declared "extern"), the compiler does not spit out an EXTERN directive in the ASM file. This causes NASM to fail the assembly step because of an unknown symbol.
The "unknown symbol" is a real problem - this means that separate .o/.obj files cannot be produced for later linking w/o manual intervention of the .asm file, or modifying the compiler.

Looking at the class slides showing the intended workflow for the toolchain, it appears C86 was not built for, and knows nothing about, multiple input or output files. The class workflow looks somewhat like the following:

cpp file.c > tmp.i
c86 tmp.i > file.asm
cat file.asm clib.asm > tmp.asm  # entire C library assembled with C86 output
nasm tmp.asm -o file.com  # produce binary .com output
emu86 file.com  # super simple load and run .com file

So we could face an uphill battle for certain unhandled situations (e.g. extern functions) and have to modify C86 in order to get NASM to output .o or .obj output that will communicate properly with the linker what to do with various constructs.

Examples of problems could be .comm data (e.g. int a; in two files gets combined by the linker with .comm directive, or int a = 3 goes into .data section rather than .bss, etc).

This is not all bad - C86 itself seems to handle the C source I've thrown at it - but creating a "toolchain" out of it might take some work, unless we want to generally produce smaller programs and/or compile and assemble everything at once. I'll help with whatever needs to be done.

ghaerr commented 5 days ago

One obvious thing lacking is an "ar" implementation. For static libraries.

Yes, Dev86 has ar, and it would be well suited if Dev86 LD is used.

Given my report above, we're ahead of ourselves since C86 doesn't ever produce a GLOBAL or EXTERN directive in its .asm output, so there'd be no symbols to manage, and NASM won't produce a .o or .obj file with undefined externals.

So, for now, we're really talking about just getting CPP, C86 and NASM to compile, assemble, and produce a .bin (.com) binary output file with just those three tools. In order to do anything actually useful, we'll use @rafael2k's poor-man's a.out header (likely created with an included .asm file) and the nasm -P option to automatically include it. Something like:

cpp file.c > tmp.i
c86 tmp.i file.asm
rm tmp.i
nasm -Pheader.asm file.asm -o file.aout

After that, in order to do anything useful (like call an ELKS system call to display something), we can easily produce a syscalls.asm file that can be added to the NASM assembly step, assembling header.asm, file.asm and syscalls.asm into a single a.out-compatible file that will load and run on ELKS.

The NCC Project, oriented towards x86-64, also uses NASM and takes a similar approach for syscalls, where a Linux-compatible list of system calls is linked to provide all system calls. While each system call should ideally be in a separate library function, the NCC approach will work very well to provide a full set of system calls in a single .asm file, just by renumbering the system calls to match ELKS' system call list.

After that, a CC wrapper program could be built which automatically performs much of this workflow and hides it from the user. Even though programs might be quite a bit larger in the beginning than necessary (because of the inclusion of all system calls or even perhaps a full mini-libc in a single source file), it could all be made to work. I like C86, but the big disadvantage is that we're not starting with a toolchain, instead we're having to build one. Lots of work, but fun.
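
As a rough sketch of the first half of such a CC wrapper (host-side C; build_pipeline, CMD_MAX and NUM_STEPS are hypothetical names, and the three commands are simply the ones from the workflow above), the wrapper could build one command line per step before running them:

```c
#include <stdio.h>

#define CMD_MAX   128   /* hypothetical limit on one command line */
#define NUM_STEPS 3     /* cpp, c86, nasm */

/*
 * Fill cmds[] with the shell commands for one source file.
 * base is the file name without extension (e.g. "file").
 * tmp.i and header.asm are the names used in the workflow above.
 * Returns 0 on success, -1 if a command would not fit in CMD_MAX bytes.
 */
int build_pipeline(const char *base, char cmds[NUM_STEPS][CMD_MAX])
{
    int n;

    /* step 1: preprocess */
    n = snprintf(cmds[0], CMD_MAX, "cpp %s.c > tmp.i", base);
    if (n < 0 || n >= CMD_MAX)
        return -1;

    /* step 2: compile to assembly */
    n = snprintf(cmds[1], CMD_MAX, "c86 tmp.i %s.asm", base);
    if (n < 0 || n >= CMD_MAX)
        return -1;

    /* step 3: assemble with the pre-included a.out header */
    n = snprintf(cmds[2], CMD_MAX, "nasm -Pheader.asm %s.asm -o %s.aout",
                 base, base);
    if (n < 0 || n >= CMD_MAX)
        return -1;

    return 0;
}
```

A real wrapper would then pass each string to system(), stop at the first failing step, and remove tmp.i afterwards. Whether snprintf and system() are available in the ELKS-hosted libc is an assumption that would need checking.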

ghaerr commented 5 days ago

More news: on the upstream 8086-toolchain class online resources page, there's a link to known tool problems which discusses some known problems with C86.

While most are seemingly OK, like having to declare all local variables at the start of a function rather than anywhere, there are a few issues that could be very problematic for porting any larger piece of code. The compiler apparently has problems with multiple C statements on a single line separated by semicolons, and has register allocation problems when more than one C operator is used in a single expression (which means porting the ELKS library code will be problematic). The killer problem is that long/unsigned long (any 32-bit arithmetic) does not work well.
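
To make those restrictions concrete, here is a small sketch of C written conservatively around them: locals declared at the top, one statement per line, one operator per expression, and int only (no long). The function names are made up for illustration, and the exact boundaries of each limitation are my reading of the summary above, not something verified against the compiler.

```c
/* Average of two ints, split into single-operator steps. */
int average(int a, int b)
{
    int sum;        /* all locals declared at the start of the function */
    int half;

    sum = a + b;    /* one operator per expression */
    half = sum / 2;
    return half;
}

/* Count how many times ch occurs in the string s. */
int count_char(char *s, char ch)
{
    int count;
    int i;
    char c;

    count = 0;      /* one statement per line, no "count = 0; i = 0;" */
    i = 0;
    c = s[i];       /* index first, compare later, instead of s[i] != '\0' */
    while (c != '\0') {
        if (c == ch) {
            count = count + 1;
        }
        i = i + 1;
        c = s[i];
    }
    return count;
}
```

Code like `return (a + b) / 2;` or `while (s[i++] != '\0')` is the style that would have to be rewritten when porting existing library sources.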

I've asked the 8086-toolchain maintainer for a copy of the C86 Manual, which is currently a dead link. Hopefully it can be found and we can read more about what C86 does and doesn't do.

bocke commented 5 days ago

One obvious thing lacking is an "ar" implementation. For static libraries.

Yes, Dev86 has ar, and it would be well suited if Dev86 LD is used.

Given my report above, we're ahead of ourselves since C86 doesn't ever produce a GLOBAL or EXTERN directive in its .asm output, so there'd be no symbols to manage, and NASM won't produce a .o or .obj file with undefined externals.

Oh... Reading this and your following answers, it seems c86 has some problems that make it problematic for use for anything more than small and fast projects. At least without any more thorough changes to the compiler. Still, it might be useful in some cases.

Btw, ar might potentially be used directly with nasm and ld86. That of course requires the project to be written entirely in assembly.

rafael2k commented 5 days ago

Hi all. I got the cpp and ld from https://github.com/lkundrak/dev86. Thanks for pointing out the official upstream at https://codeberg.org/jbruchon/dev86

hintron commented 5 days ago

More news: on the upstream 8086-toolchain class online resources page, there's a link to known tool problems which discusses some known problems with C86.

Yes, I meant to mention this when I had the chance, but you beat me to it! c86 has some limitations that may not make it suitable as a general-purpose 8086 C compiler. It was only used to compile a toy Real-time Operating System for the 8086 emulator. But y'all seem like you know what you're talking about with compilers, so perhaps you might be able to fix those limitations. I myself have not done much compiler work, so I probably won't be of much help here.

hintron commented 5 days ago

As far as the license and history goes for the project, here is what my professor, James Archibald, said to me via email (on 2024-11-20):

So, here’s what I recall about the toolchain. Wade Fife was instrumental in creating the emulator (possibly starting with some existing 8086 emulator that we got somewhere, extended to mimic the functionality of an earlier emulator that I had created for another ISA) and getting the compiler to work for our purposes. I’m virtually certain that all the code we used as a starting point (including the c86 compiler) was in the public domain at the time. There was definitely no licensing involved. I think it was buggy and Wade fixed a lot of stuff, but certainly not everything. I think there were always a few (complicated) C constructs that it never handled correctly, so we basically just told people not to use those C constructs. I couldn’t tell you the extent of modifications made on our end to the compiler, but I think they were fairly extensive initially, with a few more added over time (possibly some by myself, and probably some by ambitious TAs).

I have no problem with you preserving [the class] content online. It's actually kind of flattering to think that anyone finds it useful. And I certainly have no problem with anyone modifying and using our toolchain.

So I believe you are free to extend and use as you see fit.

bocke commented 5 days ago

C86 has some info on licensing in cmain.c (main.c in earlier versions)

 Copyright 1989, 1990, 1991 Christoph van Wuellen.
 Credits to Matthew Brandt.
 All commercial rights reserved.

 This compiler may be redistributed as long there is no
 commercial interest. The compiler must not be redistributed
 without its full sources. This notice must stay intact.

rafael2k commented 5 days ago

An oral authorization is enough in my opinion, as this implies no one will try to sue us in the future. I like C86, even with those limitations, as it is pretty small and relatively simple to understand. As an educational tool it is perfect, and instead of a toy OS, we can use it in the powerful ELKS. : )

bocke commented 5 days ago

Unrelated to the question of licensing (not a lawyer, anyway), there is some documentation about c86 here:

http://retro.co.za/68000/CC68K/QDOSC68K/c68.txt

Although, this is related to a version maintained by the Walkers (Keith and Dave). I wonder if ECEn 425 version is based on this or on an earlier version? Also, I wonder if there is a newer release by Keith and Dave?

Other parts of toolchain on that page are for QDOS (Sinclair QL). So, not really interesting for us here.

bocke commented 5 days ago

There are two versions of C86 sources here (one from 1998, one from 1999): https://morloch.hd.free.fr/smsq/#C68

bocke commented 5 days ago

Found more docs (for QL version): https://dilwyn.qlforum.co.uk/docs/ebooks/C68%20Documentation%20-%20Dave%20Walker.pdf

bocke commented 5 days ago

I found a copy of the compiler's homepage: https://gopherproxy.meulie.net/gopher.xepb.org/h/mirrors/homepage.ntlworld.com/itimpi/compsrc.htm

Also available on Internet Archive: https://web.archive.org/web/20150908032106/http://homepage.ntlworld.com/itimpi/compsrc.htm

rafael2k commented 5 days ago

@ghaerr sent this message:

https://ladsoft.tripod.com/
https://ladsoft.tripod.com/cc386_compiler.html

Looks like it was maintained for quite a while into 2017 and then replaced by Orange C. There are apparently Win32 and DOS 16-bit versions.

bocke commented 5 days ago

That's a different project, as far as I know. It was in development since the early nineties, maybe even before. As far as I know, it was written from scratch and it has always supported 386 only. It never had support for 8086.

bocke commented 5 days ago

Oh. I see I am wrong. I was sure it was written from scratch.

bocke commented 5 days ago

One thing I was right about, it only emits 386 code:

These compilers emit 32-bit code.  Together all the tools generate DPMI
programs that run under DOS.

And, it has been in development for so long, it's probably significantly changed compared to the original. It might as well have been rewritten a couple of times. :)

rafael2k commented 5 days ago

Here is the manual of the compiler version I'm using at 8086-toolchain: https://hintron.github.io/8086-toolchain/stable/labs/c86manual.txt

The issue in 8086-toolchain upstream repo: https://github.com/hintron/8086-toolchain/issues/13

bocke commented 5 days ago

I found Matthew Brandt's version: http://cd.textfiles.com/crawlycrypt2/program/c/c68k_src/

Original copyright notice:

/*
 *  68000 C compiler
 *
 *  Copyright 1984, 1985, 1986 Matthew Brandt.
 *  all commercial rights reserved.
 *
 *  This compiler is intended as an instructive tool for personal use. Any
 *  use for profit without the written consent of the author is prohibited.
 *
 *  This compiler may be distributed freely for non-commercial use as long
 *  as this notice stays intact. Please forward any enhancements or questions
 *  to:
 *
 *      Matthew Brandt
 *      Box 920337
 *      Norcross, Ga 30092
 */

bocke commented 5 days ago

It was in development since early nineties, maybe even before.

I might be wrong about this too. I can vaguely place it in the late 90s, although the earliest version I found is this one from 2000 (ccdl*.zip): http://cd.textfiles.com/simtel/simtel0101/simtel/c/00_index.htm

rafael2k commented 5 days ago

I found a copy of the compiler's homepage: https://gopherproxy.meulie.net/gopher.xepb.org/h/mirrors/homepage.ntlworld.com/itimpi/compsrc.htm

Also available on Internet Archive: https://web.archive.org/web/20150908032106/http://homepage.ntlworld.com/itimpi/compsrc.htm

We need to compare to understand how different it is from the one in the 8086-toolchain. No problem in changing the source, if it makes sense.

bocke commented 5 days ago

It was in development since early nineties, maybe even before.

I might be wrong about this too. I can vaguely place it in the late 90s, although the earliest version I found is this one from 2000 (ccdl*.zip): http://cd.textfiles.com/simtel/simtel0101/simtel/c/00_index.htm

Found an early version from 1996 (ccdl122.zip). This version also supports only 68k and 386.

http://cd.textfiles.com/simtel9703/disk2/DISC2/C/

bocke commented 5 days ago

not a lawyer, anyway

But, what I can tell you is, it likely doesn't fit public domain definition. It's also not OSI compliant.

That doesn't mean it's not hackable, changeable or usable. It just means it's not really open source in the OSI sense of the word and has limited usage and distribution permissions in comparison to GPL, BSD and MIT licensed stuff.

rafael2k commented 5 days ago

Btw, @ghaerr, can you advise on adapting ld86 from elks aout v0 to v1?

ghaerr commented 5 days ago

adapting ld86 from elks aout v0 to v1?

The a.out .version field was changed from 0 to 1 to indicate that the upper 16 bits of the previous 32-bit .chmem field were split off into a new 16-bit .minstack field occupying the same space. This allows the developer to specify a separate heap value in .chmem and a stack value in .minstack. Nothing else was changed.
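
A minimal sketch of that reinterpretation in C. The struct and function names are hypothetical (this is not the ELKS header definition), and it assumes the low 16 bits keep their meaning as .chmem:

```c
#include <stdint.h>

/* The two 16-bit values a v1 header carries in place of the 32-bit v0 field. */
struct heap_stack {
    uint16_t chmem;     /* heap size: low 16 bits of the old field */
    uint16_t minstack;  /* minimum stack: upper 16 bits of the old field */
};

/* Reinterpret a v0 32-bit chmem value as the v1 chmem/minstack pair. */
struct heap_stack split_v0_chmem(uint32_t chmem32)
{
    struct heap_stack hs;

    hs.chmem = (uint16_t)(chmem32 & 0xFFFFu);
    hs.minstack = (uint16_t)(chmem32 >> 16);
    return hs;
}
```

With a zero field, both halves come out zero, which is why a header with nothing set there reads the same under either version.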

ELKS can load both V0 and V1 executables, so no immediate change is necessary. At some point ld86 could be enhanced to output V1 executables by adding a min stack size command line argument and writing it in the revised header, along with version = 1. For now, for the small executables likely to be built using either your poor-man's header or ld86, specifying either v0 or v1 will work the same, as both the chmem and minstack fields are zero anyway.

I can help more with this when ld86 is actually needed by c86, as it seems for now there won't be any need for ld86 or ar until c86 is enhanced to output GLOBAL and EXTERN directives for function and data symbols.

What this really means is that, for now, a poor-man's a.out header can be fairly easily implemented with no c86 modifications using a pre- and post- .asm file around the NASM-assembled c86 output, with NASM then creating the a.out file directly using its -f bin option. The pre- header will list some internal symbols for start of text and data along with the a.out structure itself at address 0, and the post- header will calculate the length of each for inclusion in the a.out header fields.

ghaerr commented 4 days ago

Thanks @bocke for digging up all the documentation and history you've found. I haven't had time to read all of it yet, but from what I can tell, C86 was originally written by Matthew Brandt for 68000 CPUs.

After that, Christoph van Wuellen got involved, possibly with Keith and Dave Walker, and another branch emerged, where support for a number of other processors was added, including 8086, and then 80386.

https://ladsoft.tripod.com/cc386_compiler.html

It appears that David Lindauer of LadSoft took an early version of Brandt's compiler, prior to Wuellen and the Walkers, and produced CC386. David mentions here that he added FAR pointers (which could be very interesting) as well as built a version for a professor for teaching, although I think that was for a different class than 8086-toolchain/. This branch had lots of enhancements and apparently became too large for real mode and eventually required DPMI to run it.

Our 8086-toolchain/ branch seems to have come after the work of Wuellen and Walker, when James Archibald and Wade Fife taught a class using it, and the class notes and resources are in the upstream repo. Wade Fife added code for NASM support and a number of other requirements, and all his enhancements are noted with WFS in the source.

So there's quite a bit of history with C86, having several names and multiple processor backends, all for a compiler I had never heard of. Should @rafael2k get this all working and end up needing more features, we can look at the previous variants and diff sources to determine whether a feature has already been written or not - that'll be very helpful.

Here is the manual of the compiler version I'm using at 8086-toolchain

Great, looks like @hintron found the C86 manual. That information along with the published Known Issues should help us learn how to best use (and not use) C86 without having to read and understand the entire source.

rafael2k commented 4 days ago

Linker working on ELKS!

ps: I updated cpp and ld86 code to latest upstream.

ghaerr commented 4 days ago

Linker working on ELKS!

Wow, you're really going to town on this!!! I must say, quite a screenshot showing nasm and now ld86 actually running on ELKS. Very cool indeed. :)

I can see you're steps away from having an actual C toolchain on ELKS. That will be a first, for sure!

With nasm producing AS86 .o files, and LD86 running, this will eliminate the problem of having to "cat" .asm files together as previously required by 8086-toolchain. It seems that all that would be required is for C86 to emit EXTERN and GLOBAL statements for each symbol at the end of each compilation. Are you planning on doing that?

Are all the tools being compiled using OWC large model? I was thinking of adding fmemalloc/fmemfree/fmemrealloc to the OWC libc so that they don't have to be manually added for all of your ports. Will that be OK, or are you running different versions for each?

ghaerr commented 4 days ago

[Oops - accidentally closed this issue, reopening and editing last message]

rafael2k commented 4 days ago

Will get some beers today, definitely. : )

All tools compiled with OWC large model.

ps: lemme know when there exists a fmemrealloc.

rafael2k commented 4 days ago

About C86 to emit EXTERN and GLOBAL statements, I need to understand where exactly to output it.

bocke commented 4 days ago

@ghaerr I think that's pretty much it. With one small correction:

After that, Christoph van Wuellen got involved, possibly with Keith and Dave Walker, and another branch emerged, where support for a number of other processors was added, including 8086, and then 80386.

According to c86.txt from here (this should be the same file as the one hosted on retro.co.za, I think): https://web.archive.org/web/20150908032106/http://homepage.ntlworld.com/itimpi/compsrc.htm

  Versions prior to release 4.0:
                   Christoph van Wullen.

So, 4.0 and newer were done by Keith and Dave Walker. That link should be the latest known homepage of that particular port.

I think Christoph van Wullen did the original 386 port. Other ports likely came from Keith and Dave Walker. TMS320C30 port was contributed by Ivo Oesch.

Btw, this version (from ntlworld.com) has this in its header:

#define VERSION         "5.1 (beta)"
#define LAST_CHANGE_DATE    "25 Apr 2002"

Only two files were changed in 2002, but a lot of files were changed in 2001. So, this version is even newer than what it says on the site (27th June 1999).

ghaerr commented 4 days ago

lemme know when there exists a fmemrealloc.

I thought you'd written one? I took your idea and came up with the following, which is what I'm thinking of adding to the ELKS OWC library so it's always available:

void __far *fmemcpy(void __far *dest, const void __far *src, size_t count)
{
    char __far *d = dest;
    const char __far *s = src;

    while (count--)
        *d++ = *s++;
    return dest;
}

void __far *fmemrealloc(void __far *ptr, unsigned long size)
{
    void __far *new;

    if (!ptr)
        return fmemalloc(size);
    new = fmemalloc(size);
    if (!new)
        return NULL;            /* previous memory not freed */
    fmemcpy(new, ptr, size);    /* FIXME copies too much, will work on 8086 only */
    fmemfree(ptr);
    return new;
}

There won't likely be an actual fmemrealloc kernel implementation, as the kernel memory manager doesn't support that and doesn't need it. So this likely has the same semantics as what you wrote. I added an fmemcpy so that it works in other models. Note that since the old size isn't known, too many bytes are copied. This won't work in any protected mode, but it won't cause any problems for us in real mode: garbage bytes will simply be copied into the newly realloced (unused) memory.
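
For comparison only, here is a host-side sketch of how the over-copy could be avoided if the allocator remembered each block's size. This is not what the planned PR does: it uses a hidden size header in front of every block, with standard malloc/free standing in for fmemalloc/fmemfree, and all names (blk_hdr, sized_alloc, sized_free, sized_realloc) are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hidden header stored in front of each block. */
union blk_hdr {
    size_t size;        /* usable bytes that follow the header */
    max_align_t align;  /* keeps the user area suitably aligned */
};

void *sized_alloc(size_t size)
{
    union blk_hdr *h;

    if (size > (size_t)-1 - sizeof *h)
        return NULL;            /* header + size would overflow */
    h = malloc(sizeof *h + size);
    if (!h)
        return NULL;
    h->size = size;
    return h + 1;               /* user area starts after the header */
}

void sized_free(void *ptr)
{
    if (ptr)
        free((union blk_hdr *)ptr - 1);
}

void *sized_realloc(void *ptr, size_t size)
{
    union blk_hdr *old;
    void *new_blk;
    size_t ncopy;

    if (!ptr)
        return sized_alloc(size);
    new_blk = sized_alloc(size);
    if (!new_blk)
        return NULL;            /* previous memory not freed */
    old = (union blk_hdr *)ptr - 1;
    ncopy = old->size < size ? old->size : size;
    memcpy(new_blk, ptr, ncopy);    /* copy only what both blocks hold */
    sized_free(ptr);
    return new_blk;
}
```

The cost is one header per allocation, which may not be worth it on ELKS when the simpler version already behaves correctly in real mode.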

I will post a PR with this and fmemrealloc/fmemfree in it, unless it'll cause you problems.

ghaerr commented 4 days ago

About C86 to emit EXTERN and GLOBAL statements, I need to understand where exactly to output it.

I haven't looked at the compiler source much, so I don't know where exactly it needs to be added. As far as where these can be placed in the .asm file output, at the very end will work fine, as NASM is two-pass. You'll only run into this after you try actually running your full toolchain on a .c file that has an extern or global definition - NASM will complain with an error and refuse to produce a .o file. We can look at this more when you get to that point.
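
As a rough illustration of what needs to be emitted per symbol, here is a host-side sketch with hypothetical names (not taken from the C86 source) that formats one NASM directive line from a symbol's storage class:

```c
#include <stdio.h>

/* Hypothetical storage classes, standing in for whatever C86 uses internally. */
enum storage { SC_GLOBAL, SC_EXTERNAL, SC_OTHER };

/*
 * Write the NASM directive for one symbol into buf.
 * Returns the number of characters written, 0 if the symbol needs no
 * directive, or -1 if the line does not fit in len bytes.
 */
int format_reference(char *buf, size_t len, enum storage sc, const char *name)
{
    const char *directive;
    int n;

    switch (sc) {
    case SC_GLOBAL:
        directive = "global";   /* defined here, visible to the linker */
        break;
    case SC_EXTERNAL:
        directive = "extern";   /* defined in some other object file */
        break;
    default:
        if (len > 0)
            buf[0] = '\0';      /* static or local: nothing to emit */
        return 0;
    }
    n = snprintf(buf, len, "\t%s\t%s\n", directive, name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The compiler could call something like this once per referenced symbol after the last function has been written, so that all directives land at the end of the .asm file.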

ghaerr commented 4 days ago

About C86 to emit EXTERN and GLOBAL statements, I need to understand where exactly to output it.

Good news! I have the C86 compiler modified to output both EXTERN and GLOBAL directives, and things should now work using @rafael2k's planned approach of having NASM produce AS86-compatible output and passing that to LD86. I have also updated the C86 required compiler library clib.s mentioned on the resource page so that it also can be assembled into an object file and eventually added as part of a library, but otherwise always linked with LD86 applications using C86.

This also removes the big restriction of having to concatenate all the source files together, as is necessary with the BYU toolchain. It turns out that C86 had the capability of emitting global and extern directives but it was removed by Wade for preparation for the RTOS class which didn't use a linker. There are a number of other features that I'm looking at that might also be useful in the future that have been commented out.

For now, I'm using @hintron's repo, as I need a repo that can compile and run on macOS (and ELKS later). I can't yet easily produce commits for @rafael2k's repo until this is done. I would like to be able to fully track the changes if I'm going to be producing compiler enhancements.

For now, here's the diff that gets the compiler to work with NASM and produce AS86-compatible .o files. (BTW, objdump86 compiled in the ELKS tree will dump AS86 .o files for inspection, and omfdump will dump OWC-compatible OBJ files).

diff --git a/compiler/outx86_n.c b/compiler/outx86_n.c
index cb6e784..ed94236 100644
--- a/compiler/outx86_n.c
+++ b/compiler/outx86_n.c
@@ -1158,17 +1158,16 @@ PRIVATE void put_literals P0 (void)
 // we use a linker. -WSF
 PRIVATE void put_reference P1 (SYM *, sp)
 {
-       /*
     if (!is_symbol_output (sp)) {
        switch (storageof (sp)) {
        case sc_global:
-           put_noseg ();
-           oprintf ("\tpublic\t%s%s", outlate (nameof (sp)), newline);
+           //put_noseg ();
+           oprintf ("\tglobal\t%s%s", outlate (nameof (sp)), newline);
            break;
        case sc_external:
-           put_noseg ();
-           oprintf ("\textrn\t%s:", outlate (nameof (sp)));
-           puttype (typeof (sp));
+           //put_noseg ();
+           oprintf ("\textern\t%s", outlate (nameof (sp)));
+           //puttype (typeof (sp));
            oprintf ("%s", newline);
            break;
        default:
@@ -1176,7 +1175,6 @@ PRIVATE void put_reference P1 (SYM *, sp)
        }
        symbol_output (sp);
     }
-       */
 }

 /* align the following data */
@@ -1190,7 +1188,7 @@ static void put_align P1 (SIZE, al)
            break;
        case 2L:
        case 4L:
-           oprintf ("\tALIGN\t%d%s", (int) al, newline);
+           oprintf ("\talign\t%d%s", (int) al, newline);
            break;
        default:
            FATAL ((__FILE__, "put_align", "align == %ld", al));
@@ -1279,12 +1277,12 @@ PRIVATE void put_start P0 (void)

        // Add header info
        oprintf("\tCPU\t8086%s", newline);
-       oprintf("\tALIGN\t2%s", newline);
+       //oprintf("\tALIGN\t2%s", newline);
        //oprintf("\tORG\t100h%s%s", newline, newline);

        // Add instruction to jump to start of program
        //oprintf("\tsection .text%s", newline);
-       oprintf("\tjmp\t%smain\t; Jump to program start%s", external_prefix, newline);
+       //oprintf("\tjmp\t%smain\t; Jump to program start%s", external_prefix, newline);

        // Append file name to labels for unique names //
        // This will give labels the form: L_file_name_#

There are other changes required to the Makefile to compile on macOS, namely, removing -m32 (which may be required for Linux though):

diff --git a/assembler/Makefile b/assembler/Makefile
index 03fc92c..0678e95 100644
--- a/assembler/Makefile
+++ b/assembler/Makefile
@@ -30,7 +30,7 @@ mandir = $(prefix)/man
 .SUFFIXES: .c .o .h .mac .pl

 .c.o:
-       $(CC) -c $(CFLAGS) -m32 $<
+       $(CC) -c $(CFLAGS) $<

 NASM =         nasm.o nasmlib.o float.o insnsa.o assemble.o labels.o \
                parser.o outform.o output/outbin.o output/outaout.o \
@@ -50,10 +50,10 @@ NDISASM = ndisasm.o disasm.o sync.o nasmlib.o insnsd.o
 all: nasm ndisasm

 nasm: directory $(NASM)
-       $(CC) -m32 -o $(BIN_DIR)nasm $(NASMO)
+       $(CC) -o $(BIN_DIR)nasm $(NASMO)

 ndisasm: directory $(NDISASM)
-       $(CC) -m32 -o $(BIN_DIR)ndisasm $(NDISASM)
+       $(CC) -o $(BIN_DIR)ndisasm $(NDISASM)

bocke commented 4 days ago

It's not required on Linux either. Unless you are targeting Linux x86 (32-bit).

Just tested it, and it compiles (I applied your changes manually though, as I couldn't get the "patch" program to apply the diff).

rafael2k commented 4 days ago

Thanks. Applied.

rafael2k commented 4 days ago

Btw, I cannot get a running executable from the linker unless I add a section ".data" with something in it to the assembly created by c86. I'm investigating the issue.

ps: also, maybe c86 should output _main instead of main, as ld86 likes _main as the starting symbol.

rafael2k commented 4 days ago

I organized a bit the repository of the ELKS 8086-toolchain: https://github.com/rafael2k/8086-toolchain

Now with both ELKS tools (root directory) and a host-toolchain containing the original tools, for compilation in Linux or MacOS.

rafael2k commented 4 days ago

lemme know when there exists a fmemrealloc.

I will post a PR with this and fmemrealloc/fmemfree in it, unless it'll cause you problems.

This is good to have. Btw, that fmemcpy is just for fmemrealloc, right? Not that all memcpys with memory fmemalloc'ed needs to use it, right?

ghaerr commented 4 days ago

fmemcpy is just for fmemrealloc, right? Not that all memcpys with memory fmemalloc'ed needs to use it, right?

fmemcpy was added so that the fmem* routines could be used in non-large models, if wanted. The 8086 toolchain project is entirely in large model, so no, it's not needed. In the large model (your) case, it doesn't make any difference whether fmemcpy or memcpy is used, as all pointers are far.

In my PR, fmemcpy will be separated out so that it can be used independently of fmemalloc etc. In general, fmemcpy need only be explicitly called when a pointer is explicitly defined as __far, otherwise the C compiler memory model automatically selects the proper memcpy which matches sizeof(char *).

ghaerr commented 4 days ago

I applied your changes manually though, as I couldn't get the "patch" program to apply the diff).

Did you use patch -p1 (or -p2)? That should take care of it.

bocke commented 4 days ago

I used -p1, but for some obscure reason the patch got rejected. :/

ghaerr commented 4 days ago

I can not get running executable from the linker unless I add in the assembly created by c86 a section ".data" with something there.

I have that fixed, C86 now emits section .text and section .data directives. In the BYU toolchain, all symbols were placed in the same segment. For ELKS, three segments (.text, .data and .bss) are handled specially. The linker folds all .bss into the data section, but the .bss section is not stored on disk.
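For reference, a small sketch of how typical C objects usually map onto those three sections. The placement described in the comments is the conventional one for C toolchains and is an assumption here; it has not been checked against C86's actual output:

```c
/* Initialized global: conventionally placed in .data, so its bytes are
 * stored in the executable file. */
int counter = 42;

/* Uninitialized global: conventionally placed in .bss; only its size is
 * recorded, and it is zero-filled when the program is loaded. */
char buffer[1024];

/* Function code: placed in .text. */
int bump(void)
{
    return ++counter;
}
```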

I have also added .bss section support to C86, along with making local labels more readable, fixing -g's debug output to be a single line rather than multiple lines, separating code and data segments, removing the automatically emitted call to main, and adding a _ prefix to all symbols. Things are looking pretty good for proper LD86 input to produce ELKS executables from C86/NASM.

I have also sorted through clib.s and created a compatible c86lib.asm containing the compiler-emitted helper routines for 32-bit long support, etc. Ultimately, this will also have to contain the ELKS application startup code and possibly the jump to _main, depending on how the program entry symbol is passed to ld86.
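To give an idea of what such a helper does, here is a minimal C sketch of a 32-bit multiply built only from 16-bit halves, the way an 8086 routine has to do it (the MUL instruction multiplies two 16-bit values into a 32-bit result). mul32 is a hypothetical name and is not taken from clib.s or c86lib.asm:

```c
#include <stdint.h>

/* 32-bit multiply from 16-bit halves; the result is truncated to 32 bits. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* 16x16 -> 32 partial product of the low halves */
    uint32_t low = (uint32_t)a_lo * b_lo;

    /* The cross terms only affect the upper 16 bits of the result,
     * so keeping their low 16 bits is enough. */
    uint16_t cross = (uint16_t)((uint32_t)a_hi * b_lo + (uint32_t)a_lo * b_hi);

    /* a_hi * b_hi would land entirely above bit 31 and is dropped */
    return low + ((uint32_t)cross << 16);
}
```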

I reorganized the ELKS 8086-toolchain repository a bit: it now has both the ELKS tools (root directory) and a host-toolchain directory containing the original tools, for compilation on Linux or macOS.

Sorry, IMO this isn't a great way to do this. The problem is, you now have two copies of the compiler and assembler source, which means that two copies have to be updated, patched, merged, etc for any bug fix or new enhancement. What I think we want to accomplish is to (automatically if possible) build two versions of the toolchain from the exact same source - one for ELKS, and one for the host. This then allows very fast development using the host-based tools (this is what I've been doing to quickly get the compiler mods working), as well as then, afterwards, the ability to have the ELKS binaries also present and guaranteed from the same source.

In order to do this, IMO a better way is to keep the original top-level directories exactly as they are upstream. This then allows for pulling down or pushing up changes easily. Then, one way to do it would be to leave the original Makefiles alone and use them for the host build, and add new Makefile.elks files to build the ELKS binaries (these could be copies of Makefile that set CC=owcc, -DELKS=1, etc). The Makefile.elks files would then be called from a new top-level Makefile, using make -C compiler -f Makefile.elks as an example.

Top-level Makefile:

.PHONY: all host elks

all: host elks

elks:
	$(MAKE) -C compiler -f Makefile.elks
	$(MAKE) -C assembler -f Makefile.elks

host:
	$(MAKE) -C compiler
	$(MAKE) -C assembler
... etc

For host vs ELKS changes made to any of the tools (versus general changes for both), -DELKS=1 could be passed to the ELKS build and -DHOST=1 to the host build, with #if HOST or #if ELKS used in the source.
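A minimal sketch of what that could look like in the source. The macro names follow the -DELKS=1 and -DHOST=1 flags; the function and the strings it returns are made up for illustration:

```c
/* Build with -DELKS=1 or -DHOST=1. An undefined macro evaluates to 0
 * inside #if, so a build with neither flag takes the final branch. */
static const char *build_target(void)
{
#if ELKS
    return "elks";      /* ELKS-only code path */
#elif HOST
    return "host";      /* host-only code path */
#else
    return "generic";   /* neither flag was passed */
#endif
}
```

Using #if rather than #ifdef means a tool can also be built with -DELKS=0 to switch the ELKS path off explicitly.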

The developer workflow from the top level might be something like:

cd 8086-toolchain
make       # builds HOST and ELKS versions
-or-
make elks  # builds only ELKS version 
-or-
make host  # builds only host version

The binaries themselves could all go in bin, as bin/c86 (for host), bin/c86.os2 (for elks) etc, or have host/c86, elks/c86 etc. That part doesn't matter nearly as much. The *.os2 files could be copied and renamed somewhere as part of a make dist etc.

I can produce another patch but I'd like to keep my C86 changes tracked via git, so perhaps I should clone upstream and then apply changes, then have you pull them when desired? After this gets all straightened out, then PRs to your repo could be done directly.

What do you think?

ghaerr commented 4 days ago

I used -p1, but for some obscure reason the patch got rejected. :/

Post the results next time; perhaps you were in the compiler/ directory already and needed -p2. Or maybe it was a copy/paste selection error.

BTW thanks for your information on C86 history and finding the C86 manual! The manual in particular is very much helping me to figure out what C86 can do (e.g. the prefix problem above is solved by using c86 -prefix=_, from the manual).
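For anyone unsure which -p value to use: patch -pN strips the smallest prefix containing N leading slashes from each file name in the diff before looking for the file. A minimal C sketch of that rule; strip_components is a hypothetical helper, not code from patch, and the file name in the note after it is made up:

```c
#include <stddef.h>

/* Return the part of path left after removing n leading components,
 * or NULL if path has fewer than n slashes. */
static const char *strip_components(const char *path, int n)
{
    while (n-- > 0) {
        while (*path != '\0' && *path != '/')
            path++;
        if (*path == '\0')
            return NULL;
        while (*path == '/')    /* treat runs of slashes as one */
            path++;
    }
    return path;
}
```

With a diff header naming a/compiler/c86.c, stripping one component leaves compiler/c86.c (right when run from the top of the repo), while stripping two leaves c86.c (right when run from inside compiler/).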

bocke commented 4 days ago

I used -p1, but for some obscure reason the patch got rejected. :/

Post the results next time; perhaps you were in the compiler/ directory already and needed -p2. Or maybe it was a copy/paste selection error.

Likely a formatting mistake (caused by a faulty copy/paste). Maybe wrong line endings or something. Patch should have an option to ignore this, but as the change was small I didn't bother looking at the manual.

BTW thanks for your information on C86 history and finding the C86 manual! The manual in particular is very much helping me to figure out what C86 can do (e.g. the prefix problem above is solved by using c86 -prefix=_, from the manual).

No problem. :) I'm interested in stuff like this, so it doesn't bother me much. And I had a slow day at work, so I could procrastinate a bit.

I'm actually very impressed by what you are doing with ELKS currently. It feels like the development has become more systematic and organized in the last year or two.

rafael2k commented 3 days ago

I like it. : )

Btw, I found Alan Cox's tweaks to cpp, which implement swapping to disk to support 8-bit systems with low memory: https://github.com/EtchedPixels/FUZIX/tree/master/Applications/cpp Maybe we want to merge his changes?

ghaerr commented 3 days ago

I found Alan Cox's tweaks to cpp, which implement swapping to disk to support 8-bit systems with low memory:

Interesting. I wasn't aware Fuzix was using the BCC tools. It looks like there's also ld09 which is a version of LD86.

I don't think we'll run into memory problems with our fmemalloc that can handle 256k+ bytes, but you never know. Good to know there's a low-memory version out there if we need it. How did you manage to find this out, deep in the Fuzix tree?
