ghaerr / elks

Embeddable Linux Kernel Subset - Linux for 8086

Games development on ELKS #871

Open toncho11 opened 3 years ago

toncho11 commented 3 years ago

Let's take this example of Snake compiled by Turbo C:

https://github.com/ashmew2/snake-game/blob/master/Final.cpp

@ghaerr Maybe graphics.h can be mapped to Nano-X? But on 8086 computers Nano-X looks slow for games. For example

ghaerr commented 4 months ago

my poor 8088 struggles with this Mandelbrot set. It is on the level of the first slow sl train.

The reasons for the slowness might be different, as mand.c doesn't use any cursor positioning between output as sl does. I did notice that it uses a slow single-byte printf:

printf("%c", (iter >= maxIter) ? ' ' : shades[(int)(iter/gamma)]);

If this were converted to putchar that might help, but perhaps the slowness is the soft floating point in general... ?
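A minimal sketch of the putchar idea, assuming the shading expression is moved into a small helper. The names shade_char and put_pixel are hypothetical and not taken from mand.c, and the types of iter and gamma are guesses based on the cast in the quoted line. This only removes the per-character format parsing; it does nothing about the soft floating point cost.

```c
#include <stdio.h>

/* Hypothetical helper: choose the character for one pixel, mirroring the
 * expression inside the printf call quoted above. */
char shade_char(int iter, int maxIter, double gamma, const char *shades)
{
    return (iter >= maxIter) ? ' ' : shades[(int)(iter / gamma)];
}

/* Emit one pixel with putchar instead of printf("%c", ...). */
int put_pixel(int iter, int maxIter, double gamma, const char *shades)
{
    return putchar(shade_char(iter, maxIter, gamma, shades));
}
```

In the drawing loop, the printf line would then become a call such as put_pixel(iter, maxIter, gamma, shades).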

I played with the asm Mandelbrot and proper ELKS exit procedure. The most remarkable result so far is that this program has managed to crash martpc emulator.

That is kind of amazing crashing the emulator. If you want to post your exit code, I'll take a look at it. Could possibly be something to do with unmapped memory (see below).

I noticed it employs an unusual memory layout:

In general, you'll want at least SS = DS = same startup DS value, and not reset SS or DS into the code segment, as ELKS manages the text (CS) segment very differently than DOS. So, make sure you're not resetting SS, DS or SP as I've described above.

It uses two big tables which do not fit into one 64KB segment. The iters table sits at DS:0 (plus some other stuff), and squareTable of size 0x8000 is at ES:0. How can this be accommodated by ELKS ELF?

In general, the ELF .o format doesn't concern itself too much with individual DS or ES access to a 64k segment, but rather just groups together various "sections" and then puts their length into the output executable binary header. That header describes the min (and sometimes max) amount of space to allocate for the text (CS) and data/bss/heap (DS) segments. Whether ES points to something outside of that is not the linker's concern.

Having said that, ELKS will only allocate space for a 4K heap (directly above the end of data+bss) and then a 4K stack (directly above that) by default. What is very likely happening (I haven't seen your link/build instructions or Makefile) is that the executable is being loaded by allocating the correct amount for text/CS, but then only 8K max for heap/stack, which follows the end of data. I am guessing (I haven't changed your source) that the first (DS) section may be allocated by bss directives in the source, but the SS/SP and ES portions are not being allocated. This would have the effect of the program writing outside its allocated boundaries when scribbling with ES:offset.

The fix for the above is a little complicated, as ELKS will only allow for a max 64k data+bss+heap+stack to be allocated. If more is needed, a special fmemalloc system call will be required and then ES can use that return pointer for other memory.

IIRC your program asks DOS for all available memory right on start, then doesn't worry about anything. Depending on exactly where the program is loaded (run meminfo to check from another terminal), the program may or may not be trashing memory beyond the heap (likely into our stack, unless you haven't reset SS/SP). Another possible quick fix would be to use the -maout-heap=0xffff linker option to set a max data+bss+heap+stack of 64k, which gives a bit more working room but will still fail given the memory layout you're describing.

The next step to properly get your program working, other than the DS/SS/SP changes described above, would be to chase down the system call number for fmemalloc, and call that to get far memory for the ES section. More details are in libc/malloc/fmemalloc.c.

I hope this helps.... you kind of asked for trouble when you chose a game in assembly for your first ELKS programming project!!

Also I tried blink16 for debugging, however it seems to be more strict than the ELKS itself. It stops running the program at a stage of setting up DS, ES, SS registers. Did I misconfigure it somehow?

I'd probably have to see a screen shot or video to determine exactly why it's stopping... blink16 is pretty complicated and I don't remember right now all the reasons it might not run properly. One of them is that blink16 has two modes: one emulates IBM PC hardware, used to run the ELKS kernel, and the other runs ELKS executables. In that second mode it emulates the 8086 instruction set and ELKS system calls, but not the IBM PC. There is the possibility that an ELKS system call is not implemented, and that is why it is stopping. I didn't have time to add every system call as the focus was to get the kernel debugging working, which was completed.

toncho11 commented 4 months ago

@ghaerr, what about extending blink16 to a DOS emulator? Not simple, but the idea is to replace DOS system calls with ELKS system calls on the fly. We could start with a proof of concept using a small DOS application that makes only a few DOS system calls. It would be useful for both applications and games.

tyama501 commented 4 months ago

Hi @toncho11, I could build the 8086tiny emulator with the no-graphics option if I reduce memory, like from 1MByte to 4KB. The program is only 23KB or so, but I think the biggest problem is that we can't allocate full memory to run this kind of emulator if the host is in real mode. https://github.com/adriancable/8086tiny

ghaerr commented 4 months ago

@toncho11:

what about extending blink16 to a DOS emulator.

Blink16 runs only on 32- or 64-bit systems, it cannot be ported to ELKS; for instance it allocates a 1M static array from which it emulates a 1MB PC address space.
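To illustrate why that design needs a 32- or 64-bit host, here is a rough sketch of the general technique: a 1 MB static array standing in for the PC address space, indexed by the real-mode segment:offset calculation. This is not blink16's actual code; the names ram, linear_addr, read_byte and write_byte are made up for the example.

```c
#include <stdint.h>

/* A 1 MB array standing in for the 8086 physical address space. */
static uint8_t ram[1UL << 20];

/* Real-mode address translation: segment * 16 + offset, wrapped to 20 bits. */
uint32_t linear_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFUL;
}

/* Read one byte of emulated memory. */
uint8_t read_byte(uint16_t seg, uint16_t off)
{
    return ram[linear_addr(seg, off)];
}

/* Write one byte of emulated memory. */
void write_byte(uint16_t seg, uint16_t off, uint8_t val)
{
    ram[linear_addr(seg, off)] = val;
}
```

An array of that size cannot exist inside a single ELKS process, whose data segment tops out at 64K.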

I have another project, sim16 which will run non-graphical DOS programs by emulating DOS system calls and an 8086 CPU on a host system. No, it doesn't have the CGA graphics support that blink16 does, but that could be added by pulling over all the DOS system calls from sim86 into blink16. It would, however, be extremely slow in graphics emulation and would still only run on a Linux or macOS host using codepage 437 character output to simulate CGA. The primary advantage of blink16 is to show onscreen CPU step-by-step instructions being executed in a window, with a much smaller window for text or CGA output.

If you're wanting to run DOS programs on Linux, there are already many other emulators that do a great job, like DosBox-X.

If what you're talking about is emulating DOS system calls on ELKS itself, well, let's say that would be tantamount to porting FreeDOS to ELKS, which would be a huge job, as the 16-bit ELKS OS isn't easily set up to run 8086 VM or whole-memory applications along with other ELKS applications. All the DOS emulators that I know of run in 32- or 64-bit flat mode and allocate 1M - 16M of RAM from which they emulate a DOS system, all within a protected OS process.

ghaerr commented 4 months ago

I could build 8086tiny emulator with no-graphic option if I reduce memory like from 1MByte to 4KB. The program only has 23KB or so

8086tiny is a fantastic emulator - it does a great job in very little space emulating an 8086 CPU along with most IBM PC basic hardware. However, it assumes a 1M address space and has no protection from programs grabbing or writing whatever RAM they might want. In order to run DOS games on 8086tiny, one would first boot DOS on the emulator itself (which then emulates an 8086 PC) and then have DOS run the graphics game on top of that. That can't be made to work when an ELKS kernel is also taking control of the PC hardware. (Note also BTW that 8086tiny requires using its full special replacement BIOS ROM image, which is also loaded into the virtual address space, where special handling is done to trap BIOS console I/O and redirect it to Linux file descriptors, and tricky things like that. This allows the actual 8086 emulator to not concern itself with anything but PC hardware and have its special BIOS handle "outside" I/O.)

toncho11 commented 4 months ago

Yes, I got the wrong impression that blink16 runs on ELKS. My idea was to identify on the fly DOS specific API calls and redirect to ELKS system calls. Then I suppose one needs to put the result where the DOS program expect it (on the stack for example) and then continue reading the code and executing it until the next DOS system call. Some stuff such as direct BIOS calls can be ignored in the beginning or we can make a special ELKS system call that I once suggested to handle these.

It is a huge project, no doubt about it, if even possible. The advantage later will be that such a software that I suggest could be eventually adapted to create a new ELKS image from a DOS program that will be used forever and thus will speed up execution dramatically. It is like when a virtual machine makes a native image of a Java or .NET application.

So the DOS application is interpreted as a byte code that needs to be converted to a native (ELKS) code and executed.

When doing it on the fly and in the beginning some system calls that are related to writing to disk and sound can be ignored.

toncho11 commented 4 months ago

So the first operation is disassembling and following the code. And then memory will be allocated by this VM that I propose running on ELKS and then every access from the DOS application will be mapped to the memory allocated by the ELKS kernel.

So it will work only for an application that starts and sums two numbers and prints the result?

toncho11 commented 4 months ago

I am talking about DOS INT 21h services.

Vutshi commented 4 months ago

Hi @ghaerr,

I found the syscall number of fmemalloc to be 206 and that it takes 2 arguments. I guess the first argument is the number 206 and the second one is size. I checked how this is done in DOS. ChatGPT has kindly provided the following example:

.code
main proc
    mov ax, @data
    mov ds, ax

    ; Set up for DOS memory allocation
    mov ah, 48h            ; Function 48h: Allocate memory
    mov bx, AllocSize      ; BX = number of paragraphs to allocate (16 bytes each)
    int 21h                ; Call DOS interrupt

    ; Check for errors
    jc AllocationFailed    ; If CF (carry flag) is set, allocation failed

    ; Allocation succeeded
    mov AllocSegment, ax   ; AX contains the segment address of the allocated block
    ; Do something with the allocated memory

    ; Free the allocated memory before exiting
    mov ah, 49h            ; Function 49h: Free memory
    mov es, AllocSegment   ; ES = segment address of the block to free
    int 21h                ; Call DOS interrupt

    ; Exit program
    mov ax, 4C00h          ; Function 4Ch: Terminate process
    int 21h                ; Call DOS interrupt

AllocationFailed:
    ; Handle allocation failure
    ; For simplicity, just exit
    mov ax, 4C00h
    int 21h

main endp
end main

In ELKS the allocation size seems to be in bytes (correct?). Do I need to free the memory after use and take care of allocation errors?

Best

ghaerr commented 4 months ago

I found the syscall number of fmemalloc to be 206 and that it takes 2 arguments. I guess the first argument is the number 206 and the second one is size.

Almost. Search for fmemalloc in libc/malloc/fmemalloc.c and you'll see:

/* request paras from main memory, returns segment */
int _fmemalloc(int paras, unsigned short *pseg);

/* alloc from main memory */
void __far *fmemalloc(unsigned long size)
{
    unsigned short seg;
    unsigned int paras = (unsigned int)((size + 15) >> 4);

    if (_fmemalloc(paras, &seg))
        return 0;
    return _MK_FP(seg, 0);
}

So fmemalloc is a C wrapper around _fmemalloc which is the actual system call. That's because syscall.dat shows

fmemalloc   +206    2   *

where the * means

#   '*' = Needs libc code (Prefix _)

So the actual system call is named _fmemalloc.

Since system calls take their arguments in registers, in this order:

ELKS system calls requires registers set in order of parameters:
    BX, CX, DX, DI, SI

then AX=206, BX=paras and CX=pseg, where:

paras is the number of 16-byte paragraphs of memory wanted.

pseg is an address in your data segment where the segment value of the far memory allocated is returned.

After the system call, AX will be the negative error number or 0 if no error. So, to allocate 32k bytes in far memory, something like this should work:

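  mov ax,206        ; _fmemalloc system call number
  mov bx,32768/16   ; paras: number of 16-byte paragraphs
  mov cx,farmem     ; pseg: address in DS that receives the segment
  int 0x80          ; invoke the ELKS system call
  test ax,ax        ; AX = 0 on success, negative error number on failure
  jnz error
  ; memory is allocated, segment to use is in farmem
  mov ax,[farmem]
  mov es,ax
  ...
  section .data
farmem: dw 0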
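The paragraph count passed in BX can be checked on any host with the same rounding the libc wrapper shown earlier uses. This is only a sketch; bytes_to_paras is an illustrative name, not an ELKS function.

```c
/* Round a byte count up to whole 16-byte paragraphs, using the same
 * (size + 15) >> 4 expression as the fmemalloc wrapper quoted above. */
unsigned int bytes_to_paras(unsigned long size)
{
    return (unsigned int)((size + 15) >> 4);
}
```

For a 32k request this gives 0x800 paragraphs, the same value as 32768/16 in the assembly.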

I checked how this is done in DOS.

The way DOS does things is entirely different than ELKS/Linux.

Do I need to free the memory after use

No, it's automatically freed after program exit. You can check for errors if you like, but the chance of failure for a 32k allocation is fairly small.

Vutshi commented 4 months ago

Hi @ghaerr,

Mischief managed!

https://github.com/ghaerr/elks/assets/4971779/a83e7534-6dc6-4dd4-aa47-89b3876ca7e5

Here is the code and binary for testing: code&binary.zip I assemble it as follows:

nasm -o mand_elf_v2.o mandel_clean_elks_v2.asm -f elf
./ia16-elf-gcc -melks-libc -mcmodel=small -nostdlib mand_elf_v2.o -o mand_el2 -maout-heap=0xEFFF -maout-stack=0x0100

Next I want to rewrite it in GAS. Is there a good tool for automatic conversion? I tried the first result on Google, but it didn't really work well.

Best

Vutshi commented 4 months ago

That is kind of amazing crashing the emulator. If you want to post your exit code, I'll take a look at it. Could possibly be something to do with unmapped memory (see below).

You were right. It messed up memory outside of the allowed space. Actually, yesterday I managed not just to crash the emulator but halt the macOS twice. Even in 2024, reset button is still the user's best friend.

Best

Vutshi commented 4 months ago

Damn it! There's a drawing bug now—two stripes in the middle.

EDIT: QEMU works fine. MartyPC shows the stripes. I wonder what real hardware does. EDIT_2: QEMU also shows them randomly from time to time.

ghaerr commented 4 months ago

Hello @Vutshi,

Mischief managed!

I am impressed. Well done. Did you add the read(0,...) at the end or is it automatically exiting without waiting for a keystroke?

I haven't looked at your source yet, but would suggest that you use a stack of at least 1K, rather than 256 bytes. Of course, this would be in the case you have removed the older originally static location within the data segment for SS:SP.

There's a drawing bug now—two stripes in the middle.

I'm not seeing that in your video... is the video from QEMU? Initially I wondered if this was from a possible overlap of data and/or stack segment within section .data. Not sure. Or perhaps that's the way the original program works.

QEMU also shows them randomly from time to time.

Have you looked hard at the innards of the draw code? I'm wondering if there is a wait for vertical retrace in there that isn't working properly or emulated properly.

It messed up memory outside of the allowed space. Actually, yesterday I managed not just to crash the emulator but halt the macOS twice.

That is kind of amazing... MartyPC is written in Rust and that's supposed to be extremely memory safe. And how a process can bring down macOS is a bit confounding, unless the emulator is somehow managing to allocate extreme amounts of VM/physical memory, much beyond the mere 1MB being used for 8086 emulation.

Is there a good tool for automatic conversion?

I found one I used for a bit in the old days when ELKS was being converted from bcc to gcc. The entire set of ELKS .s files had to be converted. Let me see if I can find that one.

[EDIT: Looking at your code, I find some problems pointed out below:]

  ; Set up segments and stack
  ; ELKS sets up DS = SS = ES
  ; push ds
  mov ax,cs
  mov ds,ax // <--- can't do this, this sets DS == CS. DS must be left alone as it is set by ELKS for section .data
  ;add ax,(codeEndInitTemp + 15) >> 4
  ;mov es,ax // <--- not setting ES properly removes valid destination for rep movsw below, error.
  mov cx,18
  mov si,colourTableInit
  mov di,colourTable
  rep movsw             ; Want iters array to be at DS:0 so copy the colour table past it
  ;this is what it does rep movsw %ds:(%si),%es:(%di)
  ;mov ds,ax // <--- following code could be messed up. Suggest always using DS for static data and ES for fmemalloc data
  ;pop ds
  mov ax,es
  mov ds,ax             ; restore DS
;  add ax,(codeEnd + 15) >> 4
;  mov es,ax
;  mov [squareTableSegment],ax

  ; Allocate space for squareTable
  mov ax,206
  mov bx,0x800 ; allocate 0x800*0x10 = 32KB
  mov cx,squareTableSegment
  int 0x80
  test ax, ax
  ;jnz exit
  ; memory is allocated, segment to use is in squareTableSegment
  mov ax,[squareTableSegment]
  mov es,ax
  push ds
  mov ds,[squareTableSegment]
  mov cx,32
  mov [cs:savedSP],sp
  mov sp,0x1c00 // <--- can't do this, this moves SP into a fixed-address area within the DS/SS segment as written. Must use an area within the DS heap (use a label in the static .data section)

The problems above could very easily generate some improper graphical output if/when the DS/ES and SS:SP regions overlap. The problem of exactly why SP is being reset I don't understand yet from the source. It would be best to not reset SP at all and use a larger stack area if possible. In ELKS, DS = SS and SP is by default set high in the DS/SS segment, with data, bss and heap at low addresses.

Vutshi commented 4 months ago

The problem of exactly why SP is being reset I don't understand yet from the source

it is reset because reenigne uses every register for computation. If I remember correctly here SP is being used to check that x^2+y^2<4.
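For reference, the escape test mentioned here can be written as a plain floating-point sketch. The assembly version under discussion uses integer arithmetic and a table of squares instead, so this is not its algorithm as implemented; mandel_iters is an illustrative name.

```c
/* Count iterations of z = z*z + c until |z|^2 = x^2 + y^2 reaches 4
 * or max_iter is hit. Points that never escape return max_iter. */
int mandel_iters(double cr, double ci, int max_iter)
{
    double x = 0.0, y = 0.0;
    int n = 0;

    while (n < max_iter && x * x + y * y < 4.0) {
        double xt = x * x - y * y + cr;  /* real part of z*z + c */
        y = 2.0 * x * y + ci;            /* imaginary part of z*z + c */
        x = xt;
        n++;
    }
    return n;
}
```

Keeping x^2 and y^2 around for both the update and the escape check is what makes a table of squares attractive on a CPU without fast multiplication.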

Vutshi commented 4 months ago

mov ds,ax // <--- can't do this, this sets DS == CS. DS must be left alone as it is set by ELKS for section .data

Here the idea is to prepare data segment (or section?) in such a way that at DS:0 we have a large temporary uninitialized array used later for calculations. This is why reenigne copies the array colourTableInit (stored at the end of the code segment) to a position DS:xxxx.

ghaerr commented 4 months ago

it is reset because reenigne uses every register for computation.

In that case, SP could be reset to a hard number like 0x1c00 as long as you have either made space for it in the .data section by looking at the map file (as ORG statements won't work correctly) or are sure that that address is within the heap area below the 42K data area being used. Still, DS should not be reset to the .text/CS segment, so that DS == SS holds for the duration of the execution.

The following code in elks/arch/i86/kernel/irqtab.S is used to check that SS == DS during hardware interrupts or syscalls. This is currently only checked when CONFIG_TRACE (default OFF) is on:

utask:
        mov     current,%si
#ifdef CHECK_SS
//
//      We were in user mode, first confirm
//
        mov     %ss,%di
        cmp     TASK_USER_SS(%si),%di // entry SS = current->t_regs.ss?
        je      utask1          // User using the right stack
//
//      System got crazy
//
        mov     $pmsg,%ax
        push    %ax
        call    panic
utask1:
#endif

Here the idea is to prepare data segment (or section?) in such a way that at DS:0 we have a large temporary uninitialized array used later for calculations.

Yes that should work - the kernel only cares that the .data (and stack) segment are that which is set on program entry in DS and that the program doesn't use memory outside its allocated segment, since there is no HW memory management.

Vutshi commented 4 months ago

Does ELKS prepare all three registers like this SS = DS = ES = data segment? At least this is what I see happening inside martypc debugger.

ghaerr commented 4 months ago

Almost. SS = DS = data segment. The C startup and/or kernel may be setting ES as well, but that isn't actually necessary, ES is a free segment register and can be changed/used for what you'd like. It is required to be used as a destination register for various 8086 string instructions, for example. The C compiler requires that ES be saved between function calls, but in ASM you can do what you want.

ghaerr commented 4 months ago

You can run chmem on your program to see what the default values are that the kernel uses to calculate the sizes of the .text and .data (+heap+stack) segments. Zero values displayed for heap or stack will default to 4K each.
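As a rough model of what this thread describes (not the kernel's actual loader code), the data segment request can be thought of as data + bss + heap + stack, with a zero heap or stack value replaced by 4K and the total limited to 64K. The function name and constants below are illustrative assumptions, and the special -maout-heap=0xFFFF marker is deliberately not modeled.

```c
#define DEFAULT_HEAP  4096UL   /* assumed fallback when the heap value is 0 */
#define DEFAULT_STACK 4096UL   /* assumed fallback when the stack value is 0 */
#define MAX_DATA_SEG  65536UL  /* one 64K segment for data+bss+heap+stack */

/* Simplified model: returns the total bytes requested for the data segment,
 * or 0 if the request would not fit in a single 64K segment. */
unsigned long data_seg_bytes(unsigned long data, unsigned long bss,
                             unsigned long heap, unsigned long stack)
{
    unsigned long total;

    if (heap == 0)
        heap = DEFAULT_HEAP;
    if (stack == 0)
        stack = DEFAULT_STACK;

    total = data + bss + heap + stack;
    return (total > MAX_DATA_SEG) ? 0 : total;
}
```

Under this model a program with 1000 bytes of data and 500 bytes of bss gets 9692 bytes by default, which shows why a table of tens of kilobytes placed "after" the data cannot rely on the default heap.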

Vutshi commented 4 months ago

In that case, SP could be reset to a hard number like 0x1c00 as long as you have either made space for it in the .data section

This is very illuminating. Thank you. There was a magic constant 0x1c40 used to initialize the stack, which I didn't understand. Now it becomes clear why a tiny stack of size 0x80 is defined with such a large shift.

Vutshi commented 4 months ago

Still, DS should not be reset to the .text/CS segment

Is there a method to copy a piece of the .text/CS segment (array colourTableInit) to the .data/DS segment without temporarily resetting DS to CS?

ghaerr commented 4 months ago

Is there a method to copy a piece of the .text/CS segment (array colourTableInit) to the .data/DS segment without temporarily resetting DS to CS?

Not easily, but now that I understand what is happening, it's OK to temporarily reset DS. This is done by certain far C library functions where the source and destination are in different segments. DS is also reloaded by the C compiler when a __far pointer is used. And the kernel only checks that SS is proper, not DS, so you should be OK, as long as the program is keeping track of what the original DS is/was (usually by pushing it on the stack, then popping afterwards).

Vutshi commented 4 months ago

You can run chmem on your program to see what the default values are that the kernel uses to calculate the sizes of the .text and .data (+heap+stack) segments.

I did this and realized that default 4K+4K is too small. So now I set it manually: ./ia16-elf-gcc -melks-libc -mcmodel=small -nostdlib mand_elf_v2.o -o mand_el2 -maout-heap=0xEFFF -maout-stack=0x0100

ghaerr commented 4 months ago

I did this and realized that default 4K+4K is too small. So now I set it manually:

Unfortunately, that won't work well. -maout-heap=0xFFFF is a special marker which says allocate all you've got, versus a fixed amount, which, like 0xEFFF won't run the program if not available. The stack doesn't need to be specified separately as it is part of the max data area you're asking for with 0xFFFF. Plus you reset SP anyways. I advise using -maout-heap=0xFFFF and leave out any stack setting.

Now that I think of it, there are some other kernel checks (e.g. when calling brk or sbrk, and/or when kernel stack checking is turned on): these will complain when SP is lower than the max SP size allocated at the top of the data segment. So when you set SP very low, this happens. Let's not worry about that for now, as I don't think that is happening, although you won't be able to see it since the console is in graphics mode! :)

Vutshi commented 4 months ago

MartyPC is written in Rust and that's supposed to be extremely memory safe. And how a process can bring down macOS is a bit confounding.

I am an (un)lucky owner of the last Intel based MacBook Pro. On the one hand, it is very interesting from the computer archeology point of view. On the other hand, Apple seems to have totally lost interest in fixing bugs for my hardware. It just crashes once in a while.

Vutshi commented 4 months ago

I found one I used for a bit in the old days when ELKS was being converted from bcc to gcc. The entire set of ELKS .s files had to be converted. Let me see if I can find that one.

That would be awesome.

Vutshi commented 4 months ago

Unfortunately, that won't work well. -maout-heap=0xFFFF is a special marker which says allocate all you've got, versus a fixed amount, which, like 0xEFFF won't run the program if not available. The stack doesn't need to be specified separately as it is part of the max data area you're asking for with 0xFFFF. Plus you reset SP anyways. I advise using -maout-heap=0xFFFF and leave out any stack setting.

I hope that after conversion to GAS assembler the compiler will be able to figure out the size required on its own. Everything is precisely and explicitly defined in the data section after all.

ghaerr commented 4 months ago

Everything is precisely and explicitly defined in the data section after all.

If the 42K data area and stack are already defined statically in the .data segment, then no extra heap is required. The stack could be set small as you had since apparently SP is reset anyways. We'll still have the same issues with certain kernel checks and still ignore them for now.

I hope that after conversion to GAS assembler the compiler will be able to figure out the size required on its own

GAS conversion won't do anything different; a perfect conversion would output an almost identical .o file, with the exception that section names might be different. And even if a C compiler were used, our ia16-elf-gcc doesn't do any stack requirement calculations. Currently, the only ways to actually learn stack and heap requirements are either manually through source code inspection or by running the new CONFIG_TRACE with strace set, which reports stack usage on each system call. Since your program doesn't really make any system calls after init, that probably doesn't do much good either.

ghaerr commented 4 months ago

I found one I used for a bit in the old days when ELKS was being converted from bcc to gcc. The entire set of ELKS .s files had to be converted. Let me see if I can find that one.

That would be awesome.

I don't know where the script I used went, but have you looked into these two? They seem like they could work well or have a larger following:

Ubuntu Intel2Gas https://manpages.ubuntu.com/manpages/jammy/man1/intel2gas.1.html https://launchpad.net/ubuntu/+source/intel2gas/1.3.3-19

Ta2as: https://github.com/mefistotelis/ta2as

Let me know how/if these work for you.

Vutshi commented 4 months ago

Btw, this is how ELKS + misbehaving Mandelbrot break MartyPC https://github.com/dbalsom/martypc/issues/116#issuecomment-2125503017

ghaerr commented 4 months ago

@Vutshi - I tried the ta2as conversion program above, and once I got it to compile on macOS (strlwr is missing) it seemed to convert your mandelbrot source semi-kind-of-well, for the instructions it will convert. The major problem(s) you might run into is that NASM is using macros and those aren't converted or aren't available.

FYI here's the patch I used to get ta2as to compile in case you're interested:

diff --git a/make.sh b/make.sh
old mode 100644
new mode 100755
diff --git a/src/main.c b/src/main.c
index 7410ca2..30b91d8 100644
--- a/src/main.c
+++ b/src/main.c
@@ -83,3 +83,17 @@ int main(int argc, char *argv[])
        fclose(out);
        return 0;
 }
+
+#include <ctype.h>
+
+char *strlwr(char *str)
+{
+  unsigned char *p = (unsigned char *)str;
+
+  while (*p) {
+     *p = tolower((unsigned char)*p);
+      p++;
+  }
+
+  return str;
+}
diff --git a/src/ta2as.c b/src/ta2as.c
index 9f78fd5..6dfe20f 100644
--- a/src/ta2as.c
+++ b/src/ta2as.c
@@ -18,6 +18,8 @@
 #include <string.h>
 #include <ctype.h>

+extern char *strlwr(char *str);
+
 typedef void (*modfunc)(AsmLine *ln);

 typedef struct {

Vutshi commented 4 months ago

Thank you @ghaerr.

First I will try the newly available GPT-4o. If it doesn't work, I will switch to more conventional tools :)

Vutshi commented 4 months ago

ChatGPT doesn't want to just do a plain conversion; it tries to optimize :)

shr di,1
shr di,1

converted to:

shrw $2, %di

Vutshi commented 4 months ago

Well, it kind of works...

https://github.com/ghaerr/elks/assets/4971779/e2173ff1-99c6-47cc-8f5b-a73c87353fd2

here is the GAS code mandel_elks.s.zip

ghaerr commented 4 months ago

What did you use for conversion, ta2as? I'm guessing GPT didn't do it for you...

Something to consider during Intel to AT&T conversion: input NASM syntax like

   mov ax,foo

could mean

  mov ax,[foo] ; normal access of memory contents

or instead the less likely

mov ax, foo ; foo is a constant and not a memory address

I think NASM does not require the [foo] syntax for the memory move if foo is a label, instead of an EQU.

The translator would not know the difference between them, while an assembler would. This might be an issue to check into.

Vutshi commented 4 months ago

What did you use for conversion, ta2as? I'm guessing GPT didn't do it for you...

Actually, I mainly used ChatGPT because it understands macros, although it has a mind of its own and tends to reorder functions to its liking. I cleaned up the code to match the objdump of the NASM-generated object file. Now I have a one-to-one match between the NASM and GAS code, except for the .data section.

Essentially, elf2elks forces me to introduce a non-empty .data section, otherwise it complains:

elf2elks: error: data and BSS sections overlap!

so instead of this in NASM:

section .data
absolute 0
iters:
  resb itersX*itersY
  alignb 2
aTable:
  resw itersX
bTable:
  resw itersY
yTableLower:
  resw 102 ;itersY
yTableUpper:
  resw 102 ;itersY
itersXTable:
  resw itersY
colourTable:
  resb 35
squareTableSegment:
  resw 1
video_mode:
  resb 1
stackStart:
  resb 128

I do this in GAS:

.data
.byte 0x01, 0x02

.bss
.local iters, aTable, bTable, yTableLower, yTableUpper, itersXTable, colourTable, squareTableSegment, video_mode, stackStart
.comm iters, itersX*itersY, 2
.comm aTable, itersX * 2, 2
.comm bTable, itersY * 2, 2
.comm yTableLower, 102 * 2, 2
.comm yTableUpper, 102 * 2, 2
.comm itersXTable, itersY * 2, 2
.comm colourTable, 35, 1
.comm squareTableSegment, 2, 1
.comm video_mode, 1, 1
.comm stackStart, 128, 1

and compile as follows:

./ia16-elf-gcc -melks-libc -mcmodel=small -nostdlib mandel_elks.s -o mand_els -maout-heap=0xffff

It works, but the .data section reserves 16 bytes and everything is shifted, because the program assumes that .bss is at DS:0. How can I get rid of the .data section completely?
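To illustrate the shift, here is a small host-side C sketch that walks the NASM reservation list and reports where each table lands for a given starting offset. It assumes itersX = 321 and itersY = 129 (the values that appear later in this thread) and mirrors the NASM layout only; the offsets GAS actually assigns to the .comm symbols may differ. The names tables and offset_of are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Host-side sketch only: sizes exceed 16-bit int, so build with a native compiler. */
enum { ITERS_X = 321, ITERS_Y = 129 };

struct entry { const char *name; unsigned size; unsigned align; };

/* Same order and sizes as the NASM reservation list; "end" marks the total. */
static const struct entry tables[] = {
    { "iters",              ITERS_X * ITERS_Y, 1 },
    { "aTable",             ITERS_X * 2,       2 },  /* follows "alignb 2" */
    { "bTable",             ITERS_Y * 2,       1 },
    { "yTableLower",        102 * 2,           1 },
    { "yTableUpper",        102 * 2,           1 },
    { "itersXTable",        ITERS_Y * 2,       1 },
    { "colourTable",        35,                1 },
    { "squareTableSegment", 2,                 1 },
    { "video_mode",         1,                 1 },
    { "stackStart",         128,               1 },
    { "end",                0,                 1 },
};

/* Offset of a table within the data segment when the first one starts at base.
 * Returns ~0u for an unknown name. */
static unsigned offset_of(const char *name, unsigned base)
{
    unsigned off = base;
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof tables / sizeof tables[0]; i++) {
        unsigned a = tables[i].align;
        off = (off + a - 1) / a * a;      /* round up to the entry's alignment */
        if (strcmp(tables[i].name, name) == 0)
            return off;
        off += tables[i].size;
    }
    return ~0u;
}
```

With base 0, aTable sits at offset 41410 and the tables end at 43142; with base 16 every table moves up by 16 bytes, which is what breaks code that hard-codes DS:0.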

ghaerr commented 4 months ago

@Vutshi: You are indeed making great progress traveling fairly deep into an interesting linking rabbit hole :)

How to get rid of the .data section completely?

I'm thinking of two possibilities to consider: 1) remove the check for .data and .bss section overlap in elf2elks, and allow the link to proceed with a null .data segment, which should cause no problem, or 2) modify the default elks/elks-small.ld linker script to remove the requirement for the .data section.

The first option will be easiest at first, and if it works I could come up with allowing an overlap iff the size of .data == 0. It appears that elf2elks has never been used with an ELF conversion (from C) that has a null .data segment. This is likely because the C startup crt0.S is always included and sets aside DS:0 to protect against NULL data pointer dereferences:

    .data

// Zero data for null pointers (near & far)
// Will be linked as first section in data segment

    .section .nildata

    .word 0
    .word 0

(Section .nildata is linked first in the linker script into the data segment).

To try the elf2elks mod, edit elks/tools/elf2elks/elf2elks.c as follows:

  check_scn_overlap (text_sh, "text", ftext_sh, "far text");
  check_scn_overlap (text_sh, "text", data_sh, "data");
  check_scn_overlap (text_sh, "text", bss_sh, "BSS");
  check_scn_overlap (ftext_sh, "far text", data_sh, "data");
  check_scn_overlap (ftext_sh, "far text", bss_sh, "BSS");
  check_scn_overlap (data_sh, "data", bss_sh, "BSS"); // <--- comment out this line

Run make in the ELKS root and a new elf2elks will be created.
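For the longer-term variant (allowing the overlap only when one section is empty), a relaxed check might look roughly like the C sketch below. This is not the actual elf2elks code: struct scn and scn_overlaps are hypothetical names, and the real tool works on ELF section headers rather than this simplified struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a section header: start address and size in bytes. */
struct scn { unsigned long addr; unsigned long size; };

/* True if the two address ranges share at least one byte.
 * A zero-sized section never overlaps anything. */
static bool scn_overlaps(const struct scn *a, const struct scn *b)
{
    assert(a != NULL && b != NULL);
    if (a->size == 0 || b->size == 0)
        return false;
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}
```

An empty .data and a .bss that both start at 0x30000 would then pass, while a 16-byte .data at the same address as .bss would still be rejected.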

The possibly better way might be to modify the elks-small.ld linker script, but linker scripts are a bit of black magic. Here's the portion that may need to be renamed and/or deleted:

    .data 0x30000 : { // <--- rename this section to .olddata
        /* IA-16 segment start markers. */
        *(".nildata!*" ".nildata.*!")
        *(".rodata!*" ".rodata.*!")
...
    .bss : { // <--- change this to .bss 0x30000
        *(.bss .bss$* ".bss.*[^&]")
        *(COMMON)
...
// in the asserts that follow, change .data to .bss
        ASSERT (. + 0x100 - ADDR (.data) <= 0xfff0,
            "Error: too large for a small-model ELKS a.out file.");
        /* Sanity check any -maout-total= and -maout-chmem= values */
        PROVIDE (_total = 0);
        PROVIDE (_chmem = 0);
        ASSERT (_total <= 0xfff0
            && . - ADDR (.data) + _chmem <= 0xfff0,
            "Error: total data segment size too large.");
        ASSERT ((_total == 0 || _total > . - ADDR (.data))
            && _chmem >= 0,
            "Error: total data segment size too small.");
    }

The changed linker script can then be passed to the linker as ia16-elf-gcc -T <linker_script> ... in your link command.

I mainly used ChatGPT because it understands macros. Although it has a mind of its own and tends to reorganize functions ordering according to its liking

Wow, impressive!!

Now, I have a one-to-one matching of NASM and GAS codes except for the .data section. It works but the .data section reserves 16 bytes and everything is shifted because program assumes that .bss is at DS:0.

I see, very nice. It sounds like you're getting quite close to having this fully worked out!

Vutshi commented 4 months ago

It's done.

https://github.com/ghaerr/elks/assets/4971779/4b86637f-e921-4240-86fb-cd6a5ed24d62

The bug with stripes was due to SP being reset to 0x1c00. One should indeed not touch SP in a multitasking OS. I now just use cmpw $0x1c00, %ax, which costs 19 bytes more than cmpw %sp, %ax.

@FrenkelS thanks for the trick, now I use it as well ;)

There are a couple of weird quirks left. 1) I tried to jump to exit if far memory is not allocated successfully:

  38:   cd 80                   int    $0x80
  3a:   85 c0                   test   %ax,%ax
  3c:   0f 85 fd 00             jne    13d <exit>

But the CPU (or MartyPC) sees it differently:

(screenshot: weird_jump)

2) GAS doesn't like this code:

.equ MULTIPLIER, 0x600
.equ maxX, 320
.equ maxY, 101
.equ initialShift, 5
.equ initialGrid, (1 << initialShift)
.equ    itersX, ((maxX + initialGrid - 1)/initialGrid)*initialGrid + 1
.equ    itersY, ((maxY + initialGrid - 1)/initialGrid)*initialGrid + 1

It complains about the last two lines as follows: Error: found ' ', expected: ')

Best

code&binary_v2.zip

Vutshi commented 4 months ago

@ghaerr, I forgot to say that I used the first option to fix the .data section problem:

// check_scn_overlap (data_sh, "data", bss_sh, "BSS"); // <--- comment out this line

ghaerr commented 4 months ago

This is great, well done!!

The bug with stripes was due to SP reset to 0x1c00. One should not touch SP in a multitasking OS indeed. I just use cmpw $0x1c00, %ax

I see, was this previously using SP to determine the calculation depth of the fractal?

I haven't studied your source code yet. Does this final version leave SS & SP alone, or is SP still being reset to within the lower part of the data segment?

@FrenkelS thanks for the trick, now I use it as well ;)

What trick is that, I didn't see it.

0f 85 fd 00 jne 13d But the CPU (or MartyPC) sees it differently:

In the 8086, 0x0F is an invalid opcode that in early versions of the 8086 was a "POP CS". Since that instruction, which changes the code segment but not the IP at the same time, is useless, the same byte was reused in later CPUs as the beginning of new multi-byte opcodes. To fix this, either use jne short label or add .arch i8086, nojumps to your .S file. The opcode that is being generated in your case is a "long conditional" jump, which won't actually run on real 8086 hardware.
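For illustration, here is a small C sketch of how the same jump can be encoded using only 8086 opcodes: a short jne (0x75) when the target is within rel8 range, otherwise an inverted je (0x74) that skips over a 3-byte near jmp (0xE9). The function name encode_jne_8086 is made up, and the sketch ignores 64K segment wraparound for short jumps.

```c
#include <assert.h>
#include <stddef.h>

/* Encode "jne target" at address ip using only 8086 opcodes.
 * Returns the number of bytes written to out (2 or 5). */
static size_t encode_jne_8086(unsigned ip, unsigned target, unsigned char out[5])
{
    long disp8 = (long)target - (long)(ip + 2);   /* relative to end of the 2-byte jcc */
    unsigned rel16;

    assert(ip <= 0xffff && target <= 0xffff);
    if (disp8 >= -128 && disp8 <= 127) {
        out[0] = 0x75;                            /* jne rel8 */
        out[1] = (unsigned char)(disp8 & 0xff);
        return 2;
    }
    /* Out of rel8 range: je over a 3-byte near jmp to the target */
    rel16 = (target - (ip + 5)) & 0xffff;         /* relative to end of the jmp */
    out[0] = 0x74;                                /* je +3 */
    out[1] = 0x03;
    out[2] = 0xe9;                                /* jmp rel16 */
    out[3] = (unsigned char)(rel16 & 0xff);
    out[4] = (unsigned char)(rel16 >> 8);
    return 5;
}
```

For the jump above (ip 0x3c, target 0x13d) this yields 74 03 e9 fc 00 instead of 0f 85 fd 00, one byte longer but made only of instructions the 8086 has.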

I used the first option to fix .data section problem

I'll make a note to enhance elf2elks to allow .bss without data when .data size is 0.

it complains about the last two lines as follows Error: found ' ', expected: ')

I'm not sure what that is, perhaps rearrange the lines or add a few blank lines or spaces in each line to see whether that changes the error message.

Vutshi commented 4 months ago

I see, was this previously using SP to determine the calculation depth of the fractal?

Yes. The reason is probably to just save one byte. At least, I don't see any speed decrease.

does this final version leave SS & SP alone

Yes.

@FrenkelS thanks for the trick, now I use it as well ;)

What trick is that, I didn't see it.

The black and white Mandelbrot turns green in the end.

Thank you.

ghaerr commented 4 months ago

.equ MULTIPLIER, 0x600
.equ maxX, 320
.equ maxY, 101
.equ initialShift, 5
.equ initialGrid, (1 << initialShift)
.equ    itersX, ((maxX + initialGrid - 1)/initialGrid)*initialGrid + 1
.equ    itersY, ((maxY + initialGrid - 1)/initialGrid)*initialGrid + 1
#.equ tempX, (maxX + initialGrid - 1)//initialGrid
#.equ itersX, (tempX*initialGrid + 1)
#.equ tempY, (maxY + initialGrid - 1)//initialGrid
#.equ itersY, (tempY*initialGrid + 1)
#.equ itersX, 321
#.equ itersY, 129

Were you saying that the top 7 lines work, and the lower 7 do not, with GAS? Certainly GAS will not like the C++ // comments apparently used for division for NASM?

Vutshi commented 4 months ago

Were you saying that the top 7 lines work, and the lower 7 do not, with GAS? Certainly GAS will not like the C++ // comments apparently used for division for NASM?

No, here lines 6 and 7 are the problem for GAS. NASM likes them. Commented lines are my attempts to fix it with GAS.
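As a cross-check of what lines 6 and 7 should evaluate to, the same expressions can be written in C, assuming truncating integer division. The enum and function names below are made up for illustration.

```c
#include <assert.h>

enum {
    MAX_X = 320,
    MAX_Y = 101,
    INITIAL_SHIFT = 5,
    INITIAL_GRID = 1 << INITIAL_SHIFT      /* 32 */
};

/* Round n up to a multiple of grid, then add one. */
static int iters_for(int n, int grid)
{
    assert(grid > 0);
    return ((n + grid - 1) / grid) * grid + 1;
}
```

iters_for(MAX_X, INITIAL_GRID) gives 321 and iters_for(MAX_Y, INITIAL_GRID) gives 129, which matches the hard-coded values in the last two commented lines.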

Vutshi commented 4 months ago

Hi @ghaerr, I broke something:

./ia16-elf-gcc -melks-libc -mcmodel=small -nostdlib mandel_elks.s -o mand_els88
elf2elks: error: ia16-elf-gcc: internal compiler error: Segmentation fault: 11 (program elf2elks)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

It happens after addition of the following line:

.data
.comm evil, 0x8000, 1

mandel_elks 2.s.zip

Best

ghaerr commented 4 months ago

I broke something

I would guess that by adding 32K bytes to .bss with the .comm directive you've overflowed .bss and/or the data section, but the modified elf2elks isn't bounds/range checking anymore and an internal copy crashed it. A final version of elf2elks will need to do more than just turn off overlap checking when sizeof .data == 0.
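A rough C sketch of the kind of range check meant here, modelled on the small-model ASSERT in the linker script (section sizes plus 0x100 must not exceed 0xfff0). This is only an illustration: fits_small_model is a hypothetical name, and the real elf2elks would take the sizes from the ELF section headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SEG_LIMIT = 0xfff0, RESERVE = 0x100 };  /* limits taken from the linker script ASSERT */

/* True if the given data-segment section sizes (e.g. .data and .bss)
 * plus the reserve fit into one small-model data segment. */
static bool fits_small_model(const unsigned long *sizes, size_t n)
{
    unsigned long total = RESERVE;
    size_t i;

    assert(sizes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (sizes[i] > SEG_LIMIT - total)   /* checked before adding, so no overflow */
            return false;
        total += sizes[i];
    }
    return true;
}
```

With roughly 43 KB of tables already in .bss, an additional 0x8000 bytes pushes the total past the limit, so a check like this would report an error instead of crashing.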

toncho11 commented 3 months ago

@ghaerr is advancing the Watcom support for ELKS in https://github.com/ghaerr/elks/pull/1924, so this might allow this stripped-down version of Doom: https://github.com/FrenkelS/doomtd3 to be compiled on ELKS.

I see the Watcom makefile for Doom: https://github.com/FrenkelS/doomtd3/blob/main/makefile.w16 The memory model is medium ("mm") and it is set to 286 instructions, I think, but this can be set to 0, which means 8088. I think the assembler is taken care of by @FrenkelS in this version.

tyama501 commented 3 months ago

It seems that at least i_ibm.c needs to be modified for PC-98.

FrenkelS commented 3 months ago

I see the watcom makefile for Doom: https://github.com/FrenkelS/doomtd3/blob/main/makefile.w16 The memory model is medium "mm" and set to 286 instructions I think, but can be set to 0 which means 8088. I think the assembler is taken care by @FrenkelS in this version.

The Watcom build only uses C, no assembly. I didn't want to spend time on figuring out how to support assembly in both gcc-ia16 and Watcom at the same time. For djdoom I did figure out how to support multiple compilers and assembly. So I guess something similar could be done for doomtd3.

It seems that at least i_ibm.c need to be modified for pc-98.

That's correct. The platform specific code is mostly in one file (i_amiga.c, i_hp95lx.c, i_ibm.c, i_mac.c).

ghaerr commented 3 months ago

so this might allow this stripped down version of Doom: https://github.com/FrenkelS/doomtd3 to be compiled on ELKS.

The Watcom build only uses C, no assembly.

Using the Watcom C-only build (but using ewcc and ewlink) should allow compilation of the required source files with very little to no modification, with the exceptions noted below. The resultant OS/2-format binary image should then load and run when #1924 is completed.

It seems that at least i_ibm.c need to be modified for pc-98.

For ELKS, all DOS-oriented routines (timer, keyboard, interrupt handlers, OS calls, but not EGA graphics) will need to be replaced by Doom source files from Linux Doom, which should then also work for both IBM PC and PC-98. After that, an alternative screen-drawing routine for PC-98 will be required, as PC-98 does not support EGA hardware graphics.