Open ghaerr opened 5 days ago
I already looked through the whole Fuzix tree for useful stuff for ELKS. : )
Btw, I re-organized the repo. @ghaerr, I'm very interested in the changes to the compiler.
ps: still some little work to do, like saving the binaries with different names.
The -m32 option is removed, check whether this builds under Linux (it is required to be removed for macOS).
I already responded about this yesterday (or so). Linux doesn't need this either.
Unless you are cross-compiling.
Sure, macOS doesn't support it. macOS dropped 32-bit support several years back. Even on Linux it's considered half-obsolete, with a lot of distros dropping 32-bit ports.
I guess "-m32" might have come from some specific course requirement (if you used the ECE??? repo as a base). Maybe they used 32-bit Linux in a virtual machine, as the VM would use less RAM that way? Not really needed, unless they used weak subnotebooks or Chromebooks for the course.
I'm pretty sure the -m32 came from the days before x86_64 was standard, and the origins of the first 8086-toolchain are 7 years old. It was usually added so that sizeof(int) == sizeof(char *) and stuff like that. I just fixed the emulator/ problem and now it doesn't require -m32 either. It doesn't seem that C86 or NASM need it, since now both are running on ELKS and macOS!! It's all real small or real big these days :)
Linux got AMD64 support pretty early. In 2001. Wikipedia says it was the first OS with the official support.
It's possible "-m32" existed, but not on x86. :) As your experience is in embedded dev, it's not impossible that you worked with a platform that had both 32-bit and 64-bit versions in the 90s. :) Maybe MIPS?
@rafael2k,
I re-organized the repo.
Thanks, I took a quick look and it'll be nice to be able to build both host and ELKS versions!
still some little work to do, like saving the binaries with different names.
Sounds good.
I'm very interested in the changes to the compiler.
I'm probably going to fork the upstream toolchain so I can track the C86 changes more closely. I may fork dev86 as well. That will allow me to see the changes being made so I know what's going on with the toolchain.
All this may take a bit, so here's all the C86 changes I've made so far, in a single diff. Hopefully it will patch right in, but it also contains the first C86 patch I posted above. Feel free to add it to your repo, we can sort out other changes later. c86.diff.zip
Some of the changes and suggestions for use are:
* Always run `c86 -O`, as without the optimizer, the code generated is not very good.
* Use `c86 -lang=c99` to get the latest enhancements, including the super-important `//` comment handling.
* Adding `c86 -g` produces better "debug" output in the ASM file - showing C source lines, pretty neat.
* The `-m32` option is removed, check whether this builds under Linux (it is required to be removed for macOS).
* These changes should allow you to run `nasm -f as86` and link with ld86 to get an ELKS executable :) Let me know of any problems. I have prepared a c86lib.asm for the C86 long handling helpers but that's not quite done yet, so C source with `long` might not link.
* I added a `c86 -v` that shows some debug info, but importantly the max memory used, which might be interesting to know when running on ELKS. Mine is saying 40k over here.
* All C symbols now are prefixed with _. Specially emitted calls to helper routines by C86 are not prefixed.
* _main should be the entry point, but you may have to specify that to LD86, as we've not set up the C startup asm file yet.
* Appropriate .text, .data and .bss sections should be emitted. There may be problems using `int a;` in two different source files, that may not work yet. It appears that these would be equivalent to `static int a` but placed in BSS, which is not what most people would expect.
* C86 version number is incremented to 5.2pre for the time being.
* NASM won't know about sizes of variables, in case of `int a = 0;` in one file and `char a = 0;` in another. It appears AS86 doesn't support that. We could use OBJ output and another linker which does.
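A small test file in the spirit of these notes might look like the following. This is only a hypothetical sketch (the names `count_char` and `clamp_int` are made up here, not taken from the patch): it uses `//` comments, which need `-lang=c99`, and sticks to `int` and `char` so it does not depend on the unfinished `long` helpers.

```c
// Hypothetical C86 smoke test: needs -lang=c99 for the // comments,
// and avoids `long` so no c86lib helper routines are required.

// Count how many times ch occurs in the NUL-terminated string s.
int count_char(const char *s, char ch)
{
    int n = 0;

    while (*s != '\0') {
        if (*s == ch)
            n++;
        s++;
    }
    return n;
}

// Clamp v into the range [lo, hi] using plain int arithmetic only.
int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```

Neither function touches a global, so the file also stays clear of the `int a;` question mentioned in the list.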
Have fun! :)
Uff, now it is explained why I'm having some weird behavior with the executable after linking the AS86 objects. So this means we need to rule out AS86 object for now (and LD86)?
@ghaerr
There were -m386 and -m486, though. I think support for these options was dropped in versions of GCC after 2.7.x.
You might've thought of that?
I'm having some weird behavior with the executable after linking the AS86 objects
What exactly is happening? I thought you showed that you had (initial) support for this work?
So this means we need to rule out AS86 object for now (and LD86)?
No, definitely not; I was just pointing out a somewhat obscure issue with what are known as C "common" variable declarations, which require special linker handling. I would suggest staying with NASM outputting AS86 .o format for now, and using LD86 to link into an ELKS executable.
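Until the linker side of "common" variables is sorted out, one way to sidestep the issue is to never write a bare `int a;` in more than one file: declare the variable `extern` everywhere and define it exactly once. A minimal sketch, folded into a single listing so it compiles on its own; the file names and the `counter` / `bump_counter` identifiers are hypothetical, and this is a general C practice rather than anything specific to the C86 patch.

```c
/* Sketch: three hypothetical files shown as one translation unit. */

/* shared.h: a declaration only, no storage is reserved here. */
extern int counter;

/* counter.c: the single definition, with an explicit initializer,
 * so no "common" symbol has to be merged at link time. */
int counter = 0;

/* user.c: every other file includes shared.h and just uses it. */
int bump_counter(void)
{
    counter = counter + 1;
    return counter;
}
```

With this layout each object file either defines the symbol or refers to it, which is the plain case that simple object formats and linkers handle.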
I haven't seen the source ASM file you have been using for testing. One screenshot showed a portion of it - using INT 80 to perform a `write` system call. Is that still working?
Hello @bocke,
The -m32 option is removed, check whether this builds under Linux (it is required to be removed for macOS).
I already responded about this yesterday (or so). Linux doesn't need this either.
It's possible "-m32" existed, but not on x86.
I have seen numerous desktop software packages, whose Makefiles used -m32, that came from the Linux environment and I was porting to macOS. I assumed the reason for -m32 was the program needed to run as a 32-bit executable, rather than 64-bit. This is usually the case when the program has unportable handling mixing ints and pointers. In macOS, the 32-bit libraries are no longer shipped, and later macOS versions won't even load a 32-bit executable.
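To make the "mixing ints and pointers" point concrete, here is a small sketch of the unportable pattern next to a portable one. It assumes a host compiler with C99's `<stdint.h>`; the function names are hypothetical.

```c
#include <stdint.h>

/* The unportable habit that -m32 papers over looks like this:
 *     unsigned int bits = (unsigned int)p;
 * It only round-trips when sizeof(int) == sizeof(void *). */

/* Portable alternative: uintptr_t is wide enough to hold a pointer. */
uintptr_t ptr_to_bits(const void *p)
{
    return (uintptr_t)p;
}

/* Convert the integer form back into the original pointer. */
const void *bits_to_ptr(uintptr_t bits)
{
    return (const void *)bits;
}

/* Returns nonzero when a pointer happens to fit in an int on this target. */
int ptr_fits_in_int(void)
{
    return sizeof(void *) <= sizeof(int);
}
```

Code written this way builds the same as a 32-bit or 64-bit executable, so the flag is no longer needed.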
I don't use Linux much now (preferring macOS), and had to remove -m32 from the C86 Makefiles in order to get it running on macOS. Luckily, that worked fine. After reading your comment about -m32 not being required on Linux anymore, I removed it permanently in my patch, rather than creating an ifdef LINUX, which was what I was previously planning. Good to know it's not required anymore, and it seems our C86 project doesn't need it :)
Thank you!
I'm probably doing something wrong, but I could not get the code generated by c86 to call my print(char *, int), which is written in assembly, after linking the two together.
I get as output of the binary: ./teste invalid argument
I'd need you to post both your C and ASM code (and build lines) in order for me to help. It is possible that the "invalid argument" is coming from the shell, rather than your binary, as well. You can use `chmem ./teste` inside ELKS to see what the ELKS a.out header looks like, or `disasm ./teste` to disassemble the ELKS executable.
I could not make the code generated from c86 call my print (char *, int) in assembly and linked together.
Also remember there is no C startup code at all, and main isn't automatically called, unless you specify that as the entry point to ld86. LD86 will otherwise likely set the entry point to CS:0, whatever that happens to be based on the LD86 command line .o link order.
If you can get back to having a working ASM-only file, perhaps then `call _main` from that after pushing argv then argc and then see what happens.
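Expressed in C rather than assembly, the missing startup step amounts to roughly the following. This is a sketch with hypothetical names (`entry_fn`, `demo_main`, `c_startup`), not the startup file planned for the toolchain; in the real stub the call would be made from ASM, with argv pushed before argc so that argc ends up as the first argument.

```c
/* Type of a C entry point such as main(). */
typedef int (*entry_fn)(int argc, char **argv);

/* Stand-in for the program's main(): returns the number of
 * arguments that follow the program name. */
int demo_main(int argc, char **argv)
{
    (void)argv;
    return argc - 1;
}

/* What a startup stub does, written in C: hand argc and argv to the
 * entry point and pass its return value back as the exit status.
 * A real stub would then make the exit system call with that status. */
int c_startup(entry_fn entry, int argc, char **argv)
{
    return entry(argc, argv);
}
```

For example, calling `c_startup(demo_main, 2, argv)` with `argv` holding a program name and one argument returns 1.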
Another very cool C compiler: https://github.com/alexfru/SmallerC
ld86 uses _main as the entry point. At least it always asks for this symbol, giving an error if not present.
I did not know about the entry point being always 0. Will try a bit more and later on I'll commit the file I'm using to provide the system calls.
ld86 uses _main as entry point. At least it always ask for this symbol
Ah, in that case it seems the entry point is forced to be _main, not 0. I haven't yet compiled up CPP and LD86 from your repo, did you finalize the process of allowing them to be compiled for both ELKS and host? I could then play with the process a bit more to help see the problems you're running into.
I'll commit the file I'm using to provide the system calls.
Nice - are you including all system calls, or generating each one specially? I have been working on a solution for that, it looks like this (untested):
section .text

callsys:                        ; common routine for ELKS system call
        push    bp
        mov     bp, sp
        push    si
        push    di
        mov     bx, [bp+4]      ; up to five arguments from the C caller's stack
        mov     cx, [bp+6]
        mov     dx, [bp+8]
        mov     di, [bp+10]
        mov     si, [bp+12]
        int     0x80
        cmp     ax, 0
        jge     L01             ; success (signed test: a negative AX is an error)
        neg     ax
        mov     [errno], ax
        mov     ax, -1
L01:
        pop     di
        pop     si
        pop     bp
        ret

        global  _exit
        global  __exit
_exit:                          ; C exit temp comes here
__exit:                         ; _exit
        mov     ax, 1
        jmp     callsys

        global  _fork
_fork:
        mov     ax, 2
        jmp     callsys

        global  _read
_read:
        mov     ax, 3
        jmp     callsys

        global  _write
_write:
        mov     ax, 4
        jmp     callsys
... etc
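The return convention this routine relies on can be modelled in C as below. It is only a sketch of the logic as I read it from the listing: `demo_errno` and `syscall_result` are hypothetical names, and the assumption is that the kernel returns a negative error number in AX on failure. Note that the test has to be a signed one for the error branch to be reachable at all.

```c
/* Hypothetical stand-in for the C library's errno. */
int demo_errno = 0;

/* raw_ax models the value the kernel leaves in AX after int 0x80.
 * Non-negative values are passed through as the result; a negative
 * value is taken to be -errno and is turned into the usual -1. */
int syscall_result(int raw_ax)
{
    if (raw_ax >= 0)
        return raw_ax;      /* success: byte count, pid, etc. */
    demo_errno = -raw_ax;   /* failure: remember the error number */
    return -1;
}
```

So a raw value of 5 comes back as 5, while a raw value of -2 comes back as -1 with `demo_errno` set to 2.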
Thanks, that is what I was starting to write.
And yes - the repo is (supposedly) ready to compile both ELKS and host (set the env vars for ELKS and watcom first). Lemme know any problem, and please advise on any way that could be better.
I just wrote a couple of syscalls for testing. I was thinking of porting parts of asmutils [1] for the system calls, but I still need to translate the stuff from 32-bit Linux to 16-bit ELKS. But I'm not sure it is the best option.
So, now yes, I got a working executable!
I committed this example here: https://github.com/rafael2k/8086-toolchain/tree/dev/examples
I also did a Makefile for ELKS, just to realize before typing make.. we don't have make yet (please correct me if I'm wrong). : )
Wow, @rafael2k. You've got it all working :) Fantastic, this is truly a first!!! I can imagine the smile on your face when you saw with your own eyes for the first time what no one has done before - get a C compiler, assembler and linker running on ELKS to produce a working binary. Thank you for all your effort on this, well done!!!
I am increasingly liking the architecture where separate, smaller tools are used for each part of the process. This allows for later replacement with other tools without affecting the entire chain (e.g. using Smaller C, for instance), etc. I have been studying C86 in depth, and actually I believe you've picked a highly capable compiler, even though it apparently has some known bugs. There's quite a bit it'll do that we haven't even used yet. Even so, since we're using the idea of a C compiler that doesn't require a C preprocessor and produces ASM output, there are lots of options should we want to insert a different compiler, even for testing.
No, we don't have a make on ELKS yet. You're pretty good at finding great candidates, perhaps you can find one that looks good for porting! I know that OWC's wmake is a possibility, but having just looked into their assembler, the OWC make system itself is very complicated. I would think there would be some very small make renditions out there, and this will definitely help until we get the `cc` compiler driver running.
Thanks for posting your examples/. I'll take a look at exactly how each tool is run and see if I can make any suggestions. Very cool!
I was thinking of porting parts of asmutils [1] for the system calls, but I still need to translate the stuff from 32-bit Linux to 16-bit ELKS. But I'm not sure it is the best option.
I just looked at this, and although it's a great resource with tons of assembly, it's all 32-bit. Things can bite pretty fast when a 32-bit instruction is used that's not present on the 8086, and it would be lots of work anyway.
I have completed a 'syscall.asm' for ELKS that has all ELKS system calls; I'll clean it up a bit and post it so you can look at it. It is a continuation of examples/clib.s. I'm glad to see that my (untested) syscall routine actually worked!
I have also put together a 'c86lib.asm' which implements all of the C86-called helper routines used for 32-bit arithmetic. This was ported from upstream clib.s, but I've also added `alloca` and stack-checking support. I'll clean that up and post that too for you. The combination of this and syscall.asm should allow quite a few C programs to be tested with your toolchain.
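For readers wondering why such helpers are needed at all: the 8086 only multiplies 16-bit values, so a 32-bit `long` multiply has to be assembled from 16-bit pieces. The sketch below shows the idea in C. It is not the c86lib.asm code, and `u32_pair` and `mul32` are hypothetical names; the `uint32_t` intermediate stands in for the DX:AX pair that a 16x16 MUL produces.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, as on the 8086. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} u32_pair;

/* Multiply two 32-bit values, keeping the low 32 bits of the product. */
u32_pair mul32(u32_pair a, u32_pair b)
{
    /* One 16x16 -> 32 multiply, like MUL leaving its result in DX:AX. */
    uint32_t low = (uint32_t)a.lo * b.lo;
    u32_pair r;

    r.lo = (uint16_t)(low & 0xFFFFu);
    /* The cross terms only affect the high word; anything that would
     * overflow past 32 bits is dropped by the 16-bit truncation. */
    r.hi = (uint16_t)((low >> 16)
                      + (uint32_t)a.lo * b.hi
                      + (uint32_t)a.hi * b.lo);
    return r;
}
```

For instance, 0x00012345 times 0x00000100 gives 0x01234500, i.e. `hi == 0x0123` and `lo == 0x4500`.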
On another note: I think it would be better to use .asm, rather than .s, for the extension for NASM source files. This is because later we can produce automated make rules that will invoke NASM (instead of GAS) for .asm. The .s and .S extensions are pretty much used only for gcc AS input, and would allow both to exist, should we want them. (Note that C86 has an option to produce GAS output!)
Also, for your request to support v0 a.out in blink16 (answering here for now, will also answer there): since v0 a.out is pretty much dead, I'm thinking of producing an LD86 patch that produces v1 a.out, how about that instead? That would seem to solve two problems at once.
Thank you!
Thanks for the kind words @ghaerr ! I'm learning a lot in the process, having last used assembly 2 decades ago when doing my CS bachelor, and the professor insisted we needed to learn memory segmentation in the 8086. : )
v1 a.out in LD86 is better, easy pick! This is useful for everybody running dev86 tools (including on Debian-based Linuxes, which ship dev86 packages).
I committed this example here: https://github.com/rafael2k/8086-toolchain/tree/dev/examples
I also did a Makefile for ELKS, just to realize before typing make.. we don't have make yet (please correct me if I'm wrong). : )
Here you can find some older C implementations of make (some of them are single file implementations that might compile with c68 directly): ftp://69.43.38.172/mirrors/ftp.coast.net/msdos/c/
I think this is the complete list: abmake14.zip, cmake100.zip, dmake38s.zip, gymake12.zip, hymake31.zip, maek.c, make-pd.zip, make.c(+make.h), smake155.zip.
Most of these (all?) are portable and should work on Unix-ish OSes with minimal changes.
It's possible something more modern can be ported with gcc-ia16. If it's small enough.
ACK make ported!
ps: used ia16-gcc indeed. Code here: https://github.com/rafael2k/8086-toolchain/tree/dev/make
Got from here: https://github.com/davidgiven/ack/tree/default/util/make
I reduced some block sizes in nasm, which I think is OK, but I did not investigate the side effects in depth, apart from the good memory savings: https://github.com/rafael2k/8086-toolchain/commit/897ed6ab9ddb3585c16b34e567e2d2447c223930. At least for small tests, all good. Now make can call system(nasm) without nasm running out of memory.
Well that was fast getting a make identified and running! :)
I have a couple patches which fix some errors in various Makefiles, I'll submit those in a separate PR.
I reduced some block sizes in nasm ... Now make can call system(nasm) without nasm running out of memory.
It seems from the screenshot above that the owcc link line is specifying a 32k heap size, which I am assuming is possibly the same in all the ELKS builds. You will probably have to consider lowering this on certain programs, especially make or any other program that runs concurrently with other tools, so that you don't run out of system memory. Some tools may want to run with a very large heap.
The ELKS `elks/tools/bin/ewlink` script has `--stack` and `--heap` options that allow you to easily specify a stack or heap size to owcc/wlink. Actually, I would recommend that you use ewcc and ewlink, rather than owcc directly, in the ELKS tool builds. This allows ELKS to set new options without having to change all these Makefile.elks files. If the ELKS path is current, then ewcc/ewlink are automatically in the path, otherwise the scripts (they are shell scripts) might have to be copied.
This isn't a big deal now, but just wanted to let you know about it. I haven't compared the many owcc/wlink options used in this project with the ones that ELKS wants, but I would think many are the same. Also, the ewcc/ewlink allows you to pass options directly to wcc or wlink by specifying them on the command line, so it is easy to add options you might specially need (although I'd like to hear about those, as part of getting ELKS OWC support up to speed for outside development). The details for this parameter passing are documented in ewcc/ewlink.
Indeed!
I tuned the heap size for each tool, until I got them running stable. I copied the compilation parameters from ewlink and ewcc. But due to make idiosyncrasies and how the scripts are written, I could not directly use ewlink and ewcc (I tried).
I submitted both the C86 compiler helper routines and a complete ELKS system call library in https://github.com/rafael2k/8086-toolchain/pull/2. Thank you.
Thanks! Already merged. I'll test and give feedback.
Added ar from dev86. It compiled fine (ia16-gcc), but I got one scary warning of cast of int and pointer of different sizes.
ar.c: In function ‘update_symdefs’:
ar.c:1877:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
(void*)(pos - old_strings_size));
^
I got one scary warning of cast of int and pointer of different sizes.
Good news: this is only in a fatal error message. It is trying to display the difference of two pointers as an integer using %u. You might try changing the last line to `(int)(pos - old_strings_size));`.
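As a general illustration of printing a pointer difference without the int-to-pointer cast, something like the following works on both 16-bit and 64-bit hosts. This is a hedged sketch, not a patch for ar.c: I have not checked the actual types of `pos` and `old_strings_size` there, `format_offset` is a hypothetical helper, and it assumes `snprintf` is available in the C library being used.

```c
#include <stdio.h>
#include <stddef.h>

/* Format the distance between two positions in the same buffer as an
 * unsigned decimal number. Returns the number of characters written. */
int format_offset(char *out, size_t outsz, const char *pos, const char *base)
{
    ptrdiff_t diff = pos - base;    /* pointer difference is an integer type */

    /* Cast to match %u exactly instead of casting the value to a pointer. */
    return snprintf(out, outsz, "%u", (unsigned int)diff);
}
```

The point is simply that the argument's type matches the format specifier, which is what the warning was really about.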
Trying to run on bare metal here to see how it goes.
I'm not following you... how exactly have you prepared your boot disk? Code '3' above means no system image (/linux) found.
I just tried to "dd" the floppy image and then a hd image to a USB pendrive and tried to boot on my Thinkpad. Just for fun.
I did not test with big inputs, but ar is working on ELKS:
ar is working on ELKS:
This is fantastic! Wonderful progress, we now have a full C toolchain able to run on ELKS, when two weeks ago there was nothing. Amazing and very cool :)
I'm sure there are a number of problems with memory usage, max size of programs, etc. And of course now one "needs" a C library to go along with it all. However, I would suggest that instead of jumping into the boatload of work getting all that done (and I have many ideas for that), now might be a good time to test how well the toolchain can compile, assemble and link various C source (and not worry about undefined symbols). There are a number of known issues with C86 which need to be seen, and any number of issues with the other tools could easily surface. Now would be the time to find them out, before getting further married to each specific tool.
The tools were chosen rather quickly, and IMO they look pretty good. But we don't really know the extent of how well they'll run with C source in the wild, so to speak. For now, the ELKS C library header files can be referenced for any include files, and trying to port any of the moderately sized, more portable programs from elkscmd/ or elsewhere will show us where we're at regarding usability.
I plan on testing these suggestions myself, as well as thinking about better ways of exporting the ELKS C library (both GCC and OWC versions).
Nice work @rafael2k!
Yay! I'm happy with the advances of the last two weeks. Thanks for all the support!
I agree with you, now time to test. I'll do a floppy image so those who want just to test the toolchain, can do it easily without having to compile.
This is a continuation of the discussion in https://github.com/ghaerr/elks/issues/1443#issuecomment-2489091235, regarding issues getting what is hopefully the latest version of a C86 compiler and @rafael's port of its included (older) NASM assembler running on ELKS.
At the moment, there is some consideration of using Dev86's CPP C preprocessor, producing Dev86-compatible AS86 format object file out from NASM, and possibly using Dev86's LD linker, as both CPP and LD are (hopefully) likely to be easily ported to the ELKS 8086-only environment.
I'm not sure where the best current sources are for Dev86 - it used to be that @jbruchon hosted them on Github, and that version's upstream is quite old, but still present: https://github.com/lkundrak/dev86. It seems that jbruchon has moved his version to Codeberg at https://codeberg.org/jbruchon/dev86. During the last four years, I am aware of a number of bug fixes posted to his repo when it used to be on Github. I would recommend starting with jbruchon's Dev86 unless another more updated version is found on Github.
ELKS shares quite a history with Dev86: just five years ago the entire kernel and C library were compiled using its BCC->AS86->LD toolchain. The ELKS C library had originally been in dev86/libc but had been moved prior to that.
While it could make sense to use Dev86's CPP and LD in order to get C86 running more quickly on ELKS, unfortunately the BCC compiler is K&R only, and doesn't support ANSI C at all.
@rafael2k, which repo are you using for your CPP and future LD ports? I would assume that if you can get them running, both will be moved into your https://github.com/rafael2k/8086-toolchain repo.