Serentty / rusty-dos

A Rust skeleton for an MS-DOS program for IBM compatibles and the PC-98, including some PC-98-specific functionality

djgpp ? #3

Open stuaxo opened 4 years ago

stuaxo commented 4 years ago

DJGPP has been packaged for Linux (https://launchpad.net/~stsp-0/+archive/ubuntu/djgpp/+packages). Could that help, or is the aim just to have 16-bit code?

I'm pretty fuzzy on how this works (though I managed to compile the example and run it in DOSBox).

I read your post - https://www.reddit.com/r/rust/comments/ask2v5/dos_the_final_frontier/

As I'm new to Rust, I'm not sure where to extend things to play with this.

BlogOS has some VGA text mode routines, which look like an interesting place to start playing, though at the moment I'm not even sure how to build another file apart from dos.com.

Serentty commented 4 years ago

I'm happy that people are still finding that post!

So, right now it's not actually 16-bit code, despite what the compiler flags make it look like. For context, when Intel extended x86 from 16-bit to 32-bit, they added two ways to make use of these new 32-bit instructions. You could either switch to the new “protected mode” where instructions were 32-bit by default, and you could use an escape code to make instructions 16-bit, or stay in the old “real mode” where instructions are 16-bit by default. The code that the compiler is generating here is for real mode, but it still includes 32-bit instructions. I actually had a horrible bug earlier where the 16-bit and 32-bit instructions were swapped because for some reason Intel decided to make the escape codes for “switch to 16-bit” and “switch to 32-bit” the same and context-dependent on the current mode, and I had forgotten that compiler flag.
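To make the "same escape code, opposite meaning" point concrete, here is a minimal sketch of how the operand-size override prefix (byte 0x66) flips the operand width relative to whatever the current default is. The type and function names are made up for illustration, and only the `mov (e)ax, imm` opcode 0xB8 is handled.

```rust
/// Default operand size of the code the CPU is currently executing.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum DefaultSize {
    Bits16, // real mode, or a 16-bit protected-mode code segment
    Bits32, // a 32-bit protected-mode code segment
}

/// Effective operand width in bits, given whether the instruction
/// carries the 0x66 operand-size override prefix.
pub fn operand_bits(default: DefaultSize, has_66_prefix: bool) -> u32 {
    match (default, has_66_prefix) {
        (DefaultSize::Bits16, false) | (DefaultSize::Bits32, true) => 16,
        (DefaultSize::Bits16, true) | (DefaultSize::Bits32, false) => 32,
    }
}

/// Length in bytes of `mov (e)ax, imm` (opcode 0xB8) starting at `code[0]`,
/// including an optional 0x66 prefix. Returns None for any other opcode.
pub fn mov_ax_imm_len(default: DefaultSize, code: &[u8]) -> Option<usize> {
    let has_prefix = code.first() == Some(&0x66);
    let opcode_at = if has_prefix { 1 } else { 0 };
    if code.get(opcode_at) != Some(&0xB8) {
        return None;
    }
    let imm_bytes = (operand_bits(default, has_prefix) / 8) as usize;
    Some(opcode_at + 1 + imm_bytes)
}
```

The same bytes `66 B8 ...` therefore carry a 4-byte immediate when decoded as 16-bit code and a 2-byte immediate when decoded as 32-bit code, which is why assembling for the wrong default scrambles everything that follows.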

There are some unfortunate caveats about the setup here, including that in order to prevent LLVM from crashing, I have to tell it to use 32-bit pointers even though only the low 16 bits can actually be used in real mode without using a segment selector. So pointers are essentially twice the size that they need to be.
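As a rough illustration of why only the low 16 bits matter, this sketch shows the real-mode address calculation (segment shifted left by four, plus a 16-bit offset) and what is left of a 32-bit pointer value when it is used as a near offset. The function names are illustrative only.

```rust
/// Linear (physical) address selected by a real-mode segment:offset pair.
pub fn linear_address(segment: u16, offset: u16) -> u32 {
    ((segment as u32) << 4) + offset as u32
}

/// The part of a 32-bit pointer value that is usable as a near
/// (offset-only) pointer: the low 16 bits. The upper half is wasted space.
pub fn near_offset(pointer: u32) -> u16 {
    (pointer & 0xFFFF) as u16
}
```

For example, `linear_address(0xB800, 0)` gives 0xB8000, the usual location of the colour text buffer.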

DJGPP is different from what I've done here so far in that it switches the CPU to protected mode. This is nice because it means that you can use 32-bit pointers and get a flat(-ish) address space, which is what Rust expects. If you actually wanted to develop software for DOS, this would probably be a good idea. I just wanted to start with real mode because it's definitively DOS, as opposed to protected mode, which seeps into the time of Windows. Real mode is just more retro.

I don't see how I could use DJGPP to help with this since it's based around GCC, and unfortunately Rust does not have any sort of GCC-based backend available at the moment. That said, DJGPP is fine if you want to write in C.

If you're interested in learning how to extend this, prod the hardware, play with VGA text mode, and so on, then I would recommend joining the Rust community Discord server, and I can walk you through the MS-DOS API, the Rust programming language, inline assembly, and all sorts of fun stuff like that. Here's a link.

https://discord.gg/aVESxV8

Serentty commented 4 years ago

Wait a minute, you might be onto something with this whole DJGPP idea! I might be able to write some sort of stub in C using DJGPP that loads a module of 32-bit Rust code!

Serentty commented 4 years ago

@stuaxo Oh, by the way, when you get on the Discord server, my username there is “Kiong-luē Liân-huâ”.

stuaxo commented 4 years ago

Back when I was playing with DOS, my practical knowledge stopped at the different memory models of real mode (Turbo Pascal and Turbo C let you choose these in the IDE). Protected mode always seemed out of reach, as there were scary-looking bits of assembly you could download, including the (then new) flat real mode.

Can you write a bit about how com.ld and startup.s work?

Is generating .exe rather than .com harder because they have a particular layout?

I found this old guide on using nasm to create exes, not sure if it helps? http://www.fifi.org/doc/nasm/html/nasmdoc7.html

I'll try to jump on the Discord server at some point, though free time is a bit fragmented these days.

stuaxo commented 4 years ago

Logged onto discord long enough to work out that I'd set my username to 64kb.

Serentty commented 4 years ago

It's funny, because to me protected mode is “normal” (since it's what modern software all runs in) and real mode is the thing to be learning.

The com.ld file is just a linker script meant to generate COM files, since although neither the Rust compiler nor LLVM has ever heard of that format, it's so ridiculously simple that this script is enough to describe it. Now, I needed some help to write this, but I'll do my best to describe it anyway. It gives everything an offset of 0x100. This is because that's where in memory MS-DOS loads COM files, so any addresses in the executable need to be offset by that amount. Then it just pastes in all of the different executable sections one after another, as the COM format has no such concept of sections; it just copies the entire file into a contiguous region in memory.
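A small host-side sketch of what that layout means in practice: the file is copied verbatim to offset 0x100 of a 64 KiB segment, so every address baked into the file has to be 0x100 higher than its file offset. This only emulates the copy; the names are invented here, and the first 0x100 bytes (which DOS fills in itself) are simply left zeroed.

```rust
/// Offset within the program's segment at which DOS places the first
/// byte of a .COM file.
pub const COM_LOAD_OFFSET: u16 = 0x100;

/// Run-time offset (within the segment) of the byte at `file_offset`,
/// or None if it would fall outside the 64 KiB segment.
pub fn runtime_offset(file_offset: u16) -> Option<u16> {
    file_offset.checked_add(COM_LOAD_OFFSET)
}

/// Emulates the loader: copy the file, unchanged and with no notion of
/// sections, to offset 0x100 of a zeroed 64 KiB segment.
pub fn load_com(file: &[u8]) -> Option<Vec<u8>> {
    let mut segment = vec![0u8; 0x1_0000];
    let start = COM_LOAD_OFFSET as usize;
    let end = start.checked_add(file.len())?;
    // `get_mut` returns None if the file does not fit in the segment.
    segment.get_mut(start..end)?.copy_from_slice(file);
    Some(segment)
}
```

Loading the four illustrative bytes `B4 4C CD 21` (`mov ah, 0x4C; int 0x21`) puts the first of them at segment offset 0x100.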

Luckily, startup.s is a lot simpler. All it does is look for a function named “start” to call, and then the next two lines ask MS-DOS to end the program.

Yes, the EXE format is harder because it's a lot more complicated, and includes information about how to use multiple segments of memory. I've actually considered, instead of figuring out how to get the Rust compiler to generate EXEs, simply finding an ELF loader for MS-DOS. That might also have the benefit of letting me switch to protected mode.
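For a sense of the extra structure involved, here is a minimal sketch that reads two fields from the start of a DOS "MZ" EXE header: the number of segment relocation entries and the header size in 16-byte paragraphs. It is only a reader for illustration, not something the project uses, and the struct and function names are made up.

```rust
/// The few MZ (DOS EXE) header fields this sketch reads.
#[derive(Debug, PartialEq)]
pub struct MzSummary {
    pub relocation_count: u16,  // entries in the segment relocation table
    pub header_paragraphs: u16, // header size in 16-byte paragraphs
}

/// Reads a little-endian u16 at byte offset `at`, if present.
fn read_u16_le(bytes: &[u8], at: usize) -> Option<u16> {
    let pair = bytes.get(at..at + 2)?;
    Some(u16::from_le_bytes([pair[0], pair[1]]))
}

/// Returns None for anything that does not start with the "MZ" signature
/// (a .COM file, for instance, has no header at all) or is too short.
pub fn parse_mz(file: &[u8]) -> Option<MzSummary> {
    if file.get(0..2)? != b"MZ" {
        return None;
    }
    Some(MzSummary {
        relocation_count: read_u16_le(file, 0x06)?,
        header_paragraphs: read_u16_le(file, 0x08)?,
    })
}
```

A linker script that merely concatenates sections has no way to express the relocation table, which is part of why COM is so much easier to emit.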

Enet4 commented 4 years ago

I just stumbled upon this issue. Can't help but chime in.

I am also an MS-DOS enthusiast, and I made some attempts to target DOS via DJGPP some months ago. Alas, creating a target descriptor may not be enough, as the linker is likely expecting a different intermediate compilation outcome. I would have an .EXE file, but it would crash immediately once run due to a memory access violation (although depending on how it's compiled, I could also get a SIGFPE signal due to a division by zero). I wouldn't be surprised if it was related to wrong symbol names or something like that.

For posterity, this is roughly what I tried for the target triple, built then with xargo on a barebones no_std project. Perhaps someone else more familiar with the subject can continue building on top of this or provide any feedback on parts which are clearly incorrect. Still, it might be true that bootstrapping a C project that runs Rust modules might be more feasible.

{
  "abi-return-struct-as-int": true,
  "allows-weak-linkage": false,
  "arch": "x86",
  "cpu": "i686",
  "custom-unwind-resume": true,
  "data-layout": "e-m:x-p:32:32-i32:32-f64:32-n8:16:32-a:0:32-S128",
  "dynamic-linking": false,
  "eliminate-frame-pointer": false,
  "emit-debug-gdb-scripts": false,
  "env": "djgpp",
  "exe-suffix": ".exe",
  "executables": true,
  "function-sections": false,
  "late-link-args": {
    "gcc": [
      "-Wl,--end-group"
    ]
  },
  "linker": "i686-pc-msdosdjgpp-gcc",
  "ar": "i686-pc-msdosdjgpp-ar",
  "linker-flavor": "gcc",
  "llvm-target": "i686-pc-windows-gnu",
  "position-independent-executables": false,
  "disable-redzone": true,
  "os": "msdos",
  "post-link-objects": [
  ],
  "pre-link-args": {
    "gcc": [
      "-m32",
      "-march=i686",
      "-fno-pie",
      "-fno-use-linker-plugin",
      "-nostdlib",
      "-Wl,--as-needed",
      "-Wl,--gc-sections",
      "-Wl,--start-group"
    ]
  },
  "pre-link-objects-exe": [
    "/usr/i686-pc-msdosdjgpp/lib/crt0.o",
    "/usr/i686-pc-msdosdjgpp/lib/libc.a"
  ],
  "requires-uwtable": true,
  "staticlib-prefix": "",
  "staticlib-suffix": ".a",
  "target-c-int-width": "32",
  "target-endian": "little",
  "target-family": "windows",
  "target-pointer-width": "32",
  "vendor": "pc"
}

#![feature(start, lang_items)]
#![no_std]
#![no_main]
use core::panic::PanicInfo;
use libc::{c_char, c_int};

extern "C" {
    fn exit(c: c_int);
}

#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> isize {
    0
}

#[panic_handler]
fn handle_panic(_info: &PanicInfo) -> ! {
    // exit using libc
    unsafe {
        exit(-1);
        core::hint::unreachable_unchecked()
    }
}

Serentty commented 4 years ago

@Enet4 Thanks for this! I think I'll probably want to come back to this soon, so this is something I'll consider as well.

fschulze commented 4 years ago

I looked into this and also couldn't get it to work yet. But I found some things which might help:

The i686-pc-windows-gnu in the above comment seems to be on the right track. I used the releases from https://github.com/andrewwutw/build-djgpp for my tests. A helpful tool is using i686-pc-msdosdjgpp-objdump -d [exefile]. With that one gets the disassembly of the actual protected mode code. One can see the start symbol at the beginning and the crt1_startup and at the end the important _main. What I noticed is that the Rust generated main does a call somewhere and the C one doesn't.

#include <stdio.h>

int main(int argc, char **argv) {
    return 0;
}

results in:

00001f10 <_main>:
    1f10:   55                      push   %ebp
    1f11:   89 e5                   mov    %esp,%ebp
    1f13:   b8 00 00 00 00          mov    $0x0,%eax
    1f18:   5d                      pop    %ebp
    1f19:   c3                      ret    
    1f1a:   90                      nop
    1f1b:   90                      nop
    1f1c:   90                      nop
    1f1d:   90                      nop
    1f1e:   90                      nop
    1f1f:   90                      nop

where this:

#![feature(start)]
#![no_main]
#![no_std]
use core::panic::PanicInfo;

#[start]
#[no_mangle]
pub extern "C" fn main() -> i32 {
    0
}

#[panic_handler]
fn handle_panic(_info: &PanicInfo) -> ! {
    loop {}
}

results in:

00001df0 <_main>:
    1df0:   55                      push   %ebp
    1df1:   89 e5                   mov    %esp,%ebp
    1df3:   83 ec 08                sub    $0x8,%esp
    1df6:   8b 45 0c                mov    0xc(%ebp),%eax
    1df9:   8b 4d 08                mov    0x8(%ebp),%ecx
    1dfc:   89 45 fc                mov    %eax,-0x4(%ebp)
    1dff:   89 4d f8                mov    %ecx,-0x8(%ebp)
    1e02:   e8 30 00 00 00          call   1e37 <___main+0x17>
    1e07:   31 c0                   xor    %eax,%eax
    1e09:   83 c4 08                add    $0x8,%esp
    1e0c:   5d                      pop    %ebp
    1e0d:   c3                      ret    
    1e0e:   90                      nop
    1e0f:   90                      nop

That call to ___main seems really weird, because it seems to call back into the function main was called from. My guess is, that the resulting SIGSEGV is due to recursion, but I'm not sure about that.

The sourcecode of djgpp from djlsr205.zip contains all the startup code which helps to follow along, see __crt1_startup in crt1.c.

I hope this helps someone to figure this out. I'm giving up for now.

stuaxo commented 4 years ago

They might help on the DJGPP mailing list, I'm fairly sure the devs respond there http://www.delorie.com/djgpp/mailing-lists/subscribe.html

I'd ask, but don't have quite enough x86 asm experience to know what to ask.

fschulze commented 4 years ago

I got further. It seems to be a linker problem. The actual rust object file looks like this:

00000000 <_main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   e8 00 00 00 00          call   8 <_main+0x8>
   8:   b8 05 00 00 00          mov    $0x5,%eax
   d:   5d                      pop    %ebp
   e:   c3                      ret    
   f:   90                      nop

00000010 <_rust_begin_unwind>:
  10:   55                      push   %ebp
  11:   89 e5                   mov    %esp,%ebp
  13:   8b 45 08                mov    0x8(%ebp),%eax
  16:   eb fe                   jmp    16 <_rust_begin_unwind+0x6>

But the call 8 <_main+0x8> is changed to call 1e37 <___main+0x17> after linking, which causes the error when running. I created a small C wrapper:

#include <stdio.h>

extern int rustmain();

int main(int argc, char **argv) {
    return rustmain();
}

and called the main Rust function rustmain. That way it seems to work. I didn't get much further yet. I don't know how to see the exit code in DOS to check whether I can actually change it. I also tried printf, but the libc crate doesn't expose it (I guess because it thinks we are compiling for Windows, where it isn't exposed), and declaring it manually caused another error, but maybe I did something wrong:

use libc::{c_char, c_int};

extern "C" {
    pub fn printf(format: *const c_char, ...) -> c_int;
}

#[no_mangle]
pub extern "C" fn rustmain() -> i32 {
    unsafe {
        printf(b"Hello, World!\0".as_ptr() as *const i8);
    }
    0
}

To get it to compile to this point I had to add -lgcc -lc -lgcc to late-link-args like djgpp does when linking C.

fschulze commented 4 years ago

Hmm, the rust generated assembler doesn't seem to reference anything outside of itself:

00000000 <_rustmain>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 0c                sub    $0xc,%esp
   6:   8d 05 00 00 00 00       lea    0x0,%eax
   c:   89 04 24                mov    %eax,(%esp)
   f:   c7 44 24 04 0e 00 00    movl   $0xe,0x4(%esp)
  16:   00 
  17:   e8 00 00 00 00          call   1c <_rustmain+0x1c>
  1c:   89 45 fc                mov    %eax,-0x4(%ebp)
  1f:   8b 45 fc                mov    -0x4(%ebp),%eax
  22:   89 04 24                mov    %eax,(%esp)
  25:   e8 00 00 00 00          call   2a <_rustmain+0x2a>
  2a:   31 c0                   xor    %eax,%eax
  2c:   83 c4 0c                add    $0xc,%esp
  2f:   5d                      pop    %ebp
  30:   c3                      ret    
  31:   90                      nop
  32:   90                      nop
  33:   90                      nop
  34:   90                      nop
  35:   90                      nop
  36:   90                      nop
  37:   90                      nop
  38:   90                      nop
  39:   90                      nop
  3a:   90                      nop
  3b:   90                      nop
  3c:   90                      nop
  3d:   90                      nop
  3e:   90                      nop
  3f:   90                      nop

00000040 <_rust_begin_unwind>:
  40:   55                      push   %ebp
  41:   89 e5                   mov    %esp,%ebp
  43:   8b 45 08                mov    0x8(%ebp),%eax
  46:   eb fe                   jmp    46 <_rust_begin_unwind+0x6>

But the linker seems to do something sensible:

0000c780 <_rustmain>:
    c780:   55                      push   %ebp
    c781:   89 e5                   mov    %esp,%ebp
    c783:   83 ec 0c                sub    $0xc,%esp
    c786:   8d 05 00 58 01 00       lea    0x15800,%eax
    c78c:   89 04 24                mov    %eax,(%esp)
    c78f:   c7 44 24 04 0e 00 00    movl   $0xe,0x4(%esp)
    c796:   00 
    c797:   e8 50 00 00 00          call   c7ec <__ZN4core5slice29_$LT$impl$u20$$u5b$T$u5d$$GT$6as_ptr17h42cd1679a299a9e9E+0x1c>
    c79c:   89 45 fc                mov    %eax,-0x4(%ebp)
    c79f:   8b 45 fc                mov    -0x4(%ebp),%eax
    c7a2:   89 04 24                mov    %eax,(%esp)
    c7a5:   e8 70 00 00 00          call   c81a <_printf+0x2a>
    c7aa:   31 c0                   xor    %eax,%eax
    c7ac:   83 c4 0c                add    $0xc,%esp
    c7af:   5d                      pop    %ebp
    c7b0:   c3                      ret    
    c7b1:   90                      nop
    c7b2:   90                      nop
    c7b3:   90                      nop
    c7b4:   90                      nop
    c7b5:   90                      nop
    c7b6:   90                      nop
    c7b7:   90                      nop
    c7b8:   90                      nop
    c7b9:   90                      nop
    c7ba:   90                      nop
    c7bb:   90                      nop
    c7bc:   90                      nop
    c7bd:   90                      nop
    c7be:   90                      nop
    c7bf:   90                      nop

I guess there is adjusting of pointers going on. I don't know enough about linkers unfortunately.

Serentty commented 4 years ago

@fschulze Wow, I really appreciate this! Is this 32-bit protected mode?

fschulze commented 4 years ago

@Serentty yes, it is 32-bit protected mode. The heavy lifting is all done by the DJGPP tools and without @Enet4's config as a base I wouldn't have known where to even begin.

Serentty commented 4 years ago

@fschulze I'm coming back to this now! I'll let you know if I need you to walk me through anything. This seems very promising.

Serentty commented 4 years ago

Hm... it seems that this is targeting the Pentium II (i686). I wonder if it would be possible to pass -march=i386 to DJGPP to get 386 support. I already know that the Rust compiler has no issues generating 386 code.

Serentty commented 4 years ago

I just checked and it does indeed support -march=i386, so I'll try to recreate what you two have thankfully taught me how to do above, but with modifications to target the 386 instead.

Serentty commented 4 years ago

Okay, so in my experience what seems to be happening is that when C code calls Rust it works fine, but when Rust code calls C it ends up calling the wrong address by some small offset. I've played around with the target specification a bit but didn't find anything that fixed it yet.

Serentty commented 4 years ago

As an example, if I call libc::exit(0) in Rust, the generated assembly looks like this:

    c733:   6a 00                   push   0x0
    c735:   e8 00 81 ff ff          call   483a <_exit+0xa>

Serentty commented 4 years ago

Well, this is one way to fix the issue:

let exit = (libc::exit as usize) - 0xA;
let exit: extern "C" fn(libc::c_int) -> ! = core::mem::transmute(exit);
exit(0);

fschulze commented 4 years ago

Either it is because of some kind of calling convention or there could be differences in the output of llvm versus what the gnu linker wants. Is the offset always the same, also for other functions? Have you been able to use printf or some other simpler function other than exit? If so, that info might be helpful when asking on the djgpp mailing list after all.

Serentty commented 4 years ago

Is the offset always the same, also for other functions?

I can do some tests to see.

Have you been able to use printf or some other simpler function other than exit?

I tried writing a hello world program like this (I also tried a version where I didn't offset the pointer to the string):

let puts = (libc::puts as usize) - 0xA;
let puts: extern "C" fn(*const libc::c_char) -> () = core::mem::transmute(puts);
puts(((b"Hello from Rust!\0".as_ptr() as usize) - 0xA) as *const libc::c_char);

It crashed not just the program, and not just the OS, but all of DOSBox.

Serentty commented 4 years ago

It seems that the offset is not constant. Every time I call a function, it increases by 0xA.

0000c730 <_rust_main>:
    c730:    55                       push   ebp
    c731:    89 e5                    mov    ebp,esp
    c733:    6a 00                    push   0x0
    c735:    e8 c0 00 00 00           call   c7fa <_puts+0xa>
    c73a:    83 c4 04                 add    esp,0x4
    c73d:    6a 00                    push   0x0
    c73f:    e8 00 81 ff ff           call   4844 <_exit+0x14>
    c744:    83 c4 04                 add    esp,0x4
    c747:    0f 0b                    ud2    

fschulze commented 4 years ago

Did you notice that the offset corresponds to the bytes used for the instructions?

fschulze commented 4 years ago

If you used i686-pc-msdosdjgpp-objdump on the object file, the offsets won't be finalized. I think you have to look at the final exe for that.

Serentty commented 4 years ago

This is on the final EXE. I see you're right though. Ten bytes pass in my code, and it's ten bytes more offset.

Serentty commented 4 years ago

It seems to me like one side is trying to generate addresses which are relative.

Serentty commented 4 years ago

This is interesting.

#[no_mangle]
pub unsafe extern "C" fn rust_main() -> libc::c_int {
    libc::puts as i32
}

This returns the correct address for puts(). So it only seems to give problems when I actually try to call it. Maybe it really is a calling convention issue.

Serentty commented 4 years ago

It even ends up calling a different address each time when I do this. Maybe LLVM is smart enough to realize that the usize is really a function pointer.

    let puts_usize = libc::puts as usize;
    let hello = b"Hello!\0".as_ptr() as *const libc::c_char;
    core::mem::transmute::<usize, extern "C" fn(*const libc::c_char) -> i32>(puts_usize)(hello);
    core::mem::transmute::<usize, extern "C" fn(*const libc::c_char) -> i32>(puts_usize)(hello);

Serentty commented 4 years ago

Okay, this just reached a whole other level of strangeness. I realized that even though the disassembler is showing it calling different addresses, if you look at the machine code bytes in question, they're identical. And when I paste the hex into other disassemblers, it shows it as calling address 0xB5.

    c76a:    e8 b0 00 00 00           call   c81f <_puts+0xf>
    c76f:    59                       pop    ecx
    c770:    56                       push   esi
    c771:    e8 b0 00 00 00           call   c826 <_puts+0x16>

Serentty commented 4 years ago

Aha! I looked up the opcode E8. It's a relative call. So it's trying to do PC-relative code after all.
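A short sketch of how a disassembler turns those bytes into a target may help here: the four bytes after E8 are a signed little-endian displacement added to the address of the next instruction. The function names are illustrative; the addresses in the notes below are taken from the listings in this thread.

```rust
/// Target of a 5-byte `call rel32` (opcode 0xE8) located at address `site`.
/// Returns None if `bytes` is not such an instruction.
pub fn call_target(site: u32, bytes: &[u8]) -> Option<u32> {
    if bytes.len() != 5 || bytes[0] != 0xE8 {
        return None;
    }
    // Signed 32-bit displacement, stored little-endian after the opcode.
    let rel = i32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // The displacement is relative to the address of the *next* instruction.
    let next = site.wrapping_add(5);
    Some(next.wrapping_add(rel as u32))
}

/// Displacement bytes that would make a call at `site` land exactly on `target`.
pub fn encode_rel32(site: u32, target: u32) -> [u8; 4] {
    target.wrapping_sub(site.wrapping_add(5)).to_le_bytes()
}
```

The identical bytes `e8 b0 00 00 00` decode to 0xc81f at site 0xc76a, to 0xc826 at site 0xc771, and to 0xB5 at site 0, which accounts for the different targets shown by different disassemblers. In the earlier `_rust_main` listing, both emitted displacements (0xc0 for `_puts`, 0xffff8100 for `_exit`) equal the callee address minus 0xc730, the start of `_rust_main`. That may mean the displacement was computed relative to the start of the function or section rather than the end of the call instruction, but this is an inference from the listing, not a confirmed diagnosis.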

Serentty commented 4 years ago

I keep trying to change the relocation model through various methods including compiler flags and the target specification. Neither dynamic-no-pic nor static seems to do anything. I have no idea how to get it to either stop generating these relative calls, or to get the addresses right. At this point I just need to go to bed.

fschulze commented 4 years ago

I searched a bit and this might help: https://github.com/rust-lang/rust/issues/36710#issuecomment-570813216, there are other things in the issue that might be helpful. Also we might want to specify the externals a bit differently, see https://doc.rust-lang.org/nomicon/ffi.html Here is some more info on linking in rust: https://doc.rust-lang.org/1.14.0/book/advanced-linking.html

Serentty commented 4 years ago

Can DJGPP even do dynamic linking? Anyway, I tried that flag and it didn't seem to do anything either. I'll look at that external stuff next.

Serentty commented 4 years ago

Something just occurred to me. We've been specifying the platform as Windows in the target specification. Couldn't it be that it's using a Windows calling convention instead of the Unix-style one that GCC is probably using?

jayschwa commented 4 years ago

Hm... it seems that this is targeting the Pentium II (i686). I wonder if it would be possible to pass -march=i386 to DJGPP to get 386 support. I already know that the Rust compiler has no issues generating 386 code.

FYI, there are DJGPP builds for Debian/Ubuntu that default to i386. https://launchpad.net/~jwt27/+archive/ubuntu/djgpp-toolchain

Serentty commented 4 years ago

Hm... I'm still not convinced it's the calling convention, because I don't think calculating relative offsets should be part of that.

Serentty commented 4 years ago

If I pass a function pointer from C to Rust, I'm able to call functions that don't take pointers, such as putchar() just fine. So I really don't think it's the calling convention now, or if it is, it's the smaller of multiple problems.
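The workaround described above can be sketched roughly as follows, assuming the C side calls something like `rust_main(putchar)`. The entry-point name, the message, and the `echo_putchar` stand-in are all hypothetical; in a real DJGPP build the entry point would also need `#[no_mangle]` so that C can find it, and the stand-in would be replaced by the real `putchar`.

```rust
use core::ffi::c_int;

/// Signature of C's `putchar`, received as a plain function pointer.
pub type PutcharFn = extern "C" fn(c_int) -> c_int;

/// Hypothetical entry point for the C wrapper to call.
/// Writes each byte through the pointer and returns how many calls succeeded.
pub extern "C" fn rust_main(putchar: PutcharFn) -> c_int {
    let mut written = 0;
    for &byte in b"Hi\n" {
        // putchar returns the character written, or a negative value on error.
        if putchar(byte as c_int) >= 0 {
            written += 1;
        }
    }
    written
}

/// Host-side stand-in for C's `putchar`, so the sketch can be exercised
/// without a C toolchain. It just reports success.
pub extern "C" fn echo_putchar(c: c_int) -> c_int {
    c
}
```

Because the call goes through a pointer held in a register, the compiler emits an indirect call rather than a `call rel32`, which would sidestep the displacement problem seen with direct calls.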

Serentty commented 4 years ago

I posted a question about this on the DJGPP mailing list.

https://groups.google.com/d/msg/comp.os.msdos.djgpp/0l6wjO-oSM0/wucHtHpCAgAJ

fschulze commented 4 years ago

Using some keywords from your mail, I found this in the LLVM source; maybe it's a lead: http://llvm.org/doxygen/RuntimeDyldCOFFI386_8h_source.html#l00131

Enet4 commented 4 years ago

Not sure if this helps in any way, but maybe it's worth checking whether the relocation resolution matches the COFF specification as presented in the official DJGPP website. http://www.delorie.com/djgpp/doc/coff/

Serentty commented 4 years ago

Unfortunately I'm not sure which of those relocation types is being used, as none of them match the names that Rust gives.

stuaxo commented 4 years ago

Which names does Rust give?

Serentty commented 4 years ago

It has stuff like “static”, “dynamic”, and “dynamic-no-pic”.

Serentty commented 4 years ago

If I set the code model to large, the issue goes away. However, now the compiler doesn't generate relative jumps at all, and instead loads the absolute address into a register and calls the register. So function calls are now more instructions and also introduce register pressure. Still, it's better than not working.

Enet4 commented 4 years ago

That's interesting. I just tried that out with this:

// imports, root attributes, panic handler, and other declarations omitted
#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> isize {
    unsafe {
        puts(b"Rust says hello DOS!\0".as_ptr() as *const c_char);
    }
    0
}

RUSTFLAGS='-C code-model=large' RUST_TARGET_PATH=`pwd` xargo build

This compiles and runs, but does not print anything when run, it just exits gracefully. Replacing puts with printf or cputs did not help either. A call to clrscr does appear to move the command prompt C:\> to the beginning of the screen.

I might do some extra sleuthing later.

Serentty commented 4 years ago

I haven't been able to get C's I/O functions to work. Passing pointers around seems like it sometimes doesn't work even now. I've been writing to the screen using the VGA buffer at CONVENTIONAL_BASE + 0xB8000, where CONVENTIONAL_BASE is 0xF0000000.
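For anyone wanting to try the same thing, here is a minimal sketch of writing to an 80x25 VGA text buffer, where each cell is a 16-bit value with the character in the low byte and the colour attribute in the high byte. The function names are invented, and the buffer pointer is a parameter so the code can be tried against an ordinary array; on the setup described above it would presumably be `(CONVENTIONAL_BASE + 0xB8000) as *mut u16`, which is not verified here.

```rust
/// Standard VGA text mode geometry: 80 columns by 25 rows.
pub const COLUMNS: usize = 80;
pub const ROWS: usize = 25;

/// One text cell: character in the low byte, colour attribute in the high byte.
pub fn encode_cell(ch: u8, attr: u8) -> u16 {
    ((attr as u16) << 8) | ch as u16
}

/// Index of a cell, counted in 16-bit cells from the start of the buffer.
pub fn cell_index(row: usize, col: usize) -> Option<usize> {
    if row < ROWS && col < COLUMNS {
        Some(row * COLUMNS + col)
    } else {
        None
    }
}

/// Writes `text` starting at (row, col); cells past the right edge are skipped.
///
/// # Safety
/// `buffer` must point to at least ROWS * COLUMNS writable u16 cells.
pub unsafe fn write_str(buffer: *mut u16, row: usize, col: usize, text: &[u8], attr: u8) {
    for (i, &ch) in text.iter().enumerate() {
        if let Some(index) = cell_index(row, col + i) {
            // Volatile so the compiler cannot elide stores to video memory.
            unsafe { buffer.add(index).write_volatile(encode_cell(ch, attr)) };
        }
    }
}
```

Attribute 0x07 is light grey on black, so `encode_cell(b'H', 0x07)` is 0x0748.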

Enet4 commented 4 years ago

Whelp, I don't have much to show for it this time. I can be sure that the program runs the declared main function, as performing thousands of volatile writes leads to delays in the program's execution, but the screen isn't updated to reflect the intended changes. In particular, this main function in C works just fine and prints the given text.

#include <stdio.h>

int main(int argc, char* argv[]) {
    puts("Hello DOS from C.");
    return 0;
}

I was almost about to say that the equivalent in Rust does nothing, but... this code:

#[start]
#[no_mangle]
pub extern "C" fn main(_argc: isize, _argv: *const *const c_char) -> c_int {
    unsafe {
        puts(b"Rust says hello DOS!\0".as_ptr() as *const c_char);
    }
    0
}

is resulting in this output if I run the C program first. If I don't, it just prints a new line with no visible characters. puts always prints a new line regardless.

[Image: Screenshot_2020-04-25_01-15-04]

The assembly shows that at some point the function _puts (at address 0xc830) is called by moving its address into eax.

0000c760 <_main>:
    c760:   55                      push   %ebp
    c761:   89 e5                   mov    %esp,%ebp
    c763:   83 ec 14                sub    $0x14,%esp
    c766:   8b 45 0c                mov    0xc(%ebp),%eax
    c769:   8b 4d 08                mov    0x8(%ebp),%ecx
    c76c:   ba 90 5a 00 00          mov    $0x5a90,%edx
    c771:   89 45 fc                mov    %eax,-0x4(%ebp)
    c774:   89 4d f8                mov    %ecx,-0x8(%ebp)
    c777:   ff d2                   call   *%edx
    c779:   89 e0                   mov    %esp,%eax
    c77b:   c7 40 04 15 00 00 00    movl   $0x15,0x4(%eax)
    c782:   c7 00 00 0a 01 00       movl   $0x10a00,(%eax)
    c788:   b8 10 c8 00 00          mov    $0xc810,%eax
    c78d:   ff d0                   call   *%eax
    c78f:   89 45 f4                mov    %eax,-0xc(%ebp)
    c792:   89 e0                   mov    %esp,%eax
    c794:   8b 4d f4                mov    -0xc(%ebp),%ecx
    c797:   89 08                   mov    %ecx,(%eax)
    c799:   b8 30 c8 00 00          mov    $0xc830,%eax
    c79e:   ff d0                   call   *%eax
    c7a0:   31 c0                   xor    %eax,%eax
    c7a2:   83 c4 14                add    $0x14,%esp
    c7a5:   5d                      pop    %ebp
    c7a6:   c3                      ret    

Minor note: I had a look at your reproducible example on the mailing list, and I noticed that the null terminator was missing in one of the string literals, although I don't believe that it could ever make a difference there.
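One way to rule out the missing-terminator mistake is to let `CStr` validate the bytes before a pointer ever reaches C. This is a small sketch with an invented helper name; `CStr::from_bytes_with_nul` rejects both a missing trailing NUL and a NUL in the middle.

```rust
use core::ffi::{c_char, CStr};

/// Returns a pointer suitable for C's `puts`, or None if `bytes` is not a
/// valid C string (no trailing NUL, or a NUL somewhere in the middle).
/// The `'static` bound keeps the returned pointer from dangling.
pub fn as_c_string(bytes: &'static [u8]) -> Option<*const c_char> {
    CStr::from_bytes_with_nul(bytes).ok().map(CStr::as_ptr)
}
```

With this, `as_c_string(b"Rust says hello DOS!")` yields None instead of silently handing C an unterminated buffer.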

stuaxo commented 4 years ago

Is it worth trying dosemu2?

There are a lot of logging options available, including messages when outputting to video.

Serentty commented 4 years ago

Oh yeah, forgetting the null terminator would be an issue if you pass it to puts(). However, I'm fairly sure that puts() is getting the wrong address altogether, because it ends up printing pure garbage instead of the correct string followed by garbage. In general, functions which take pointers seem to have issues unless they're inlined. This is strange because some simple debugging seemed to indicate that addresses were the same before and after being passed to another function, but there could be a mistake somewhere in my testing. It seems to me that changing the code model to large was only a workaround and not a solution for the memory offset issue, and that this issue is what is causing the problems with passing pointers between functions.