UtopiaOS / UtopiaV3

UtopiaV3 is an operating system featuring various concepts and ideas.

[Binary format] Let's get Mach-O files loading! #5

Open DiegoMagdaleno opened 2 years ago

DiegoMagdaleno commented 2 years ago

About

Mach-O (Mach Object) is a replacement for the a.out format. It was originally used in the Mach operating system and was later picked up by Apple for use in XNU.

I really like the Mach-O format and some features it offers, like:

The question might be: why not implement those features in ELF? At the end of the day, there is no reason not to try. However, Utopia is my little "Utopia" and I really like the Mach-O format, especially the way dynamic libraries can have a version embedded in themselves (I don't like /lib/libwhatever.so.1.2.2).

So this issue tries to track down what we need to get a Mach-O binary running natively in Utopia!
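As a rough illustration of the embedded version: in Apple's Mach-O headers, the LC_ID_DYLIB load command stores the library's current and compatibility versions as 32-bit values packed as X.Y.Z (16 bits, 8 bits, 8 bits). The sketch below only shows that packing; the struct and function names are made up for this example and are not Utopia code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a dylib version split into its three parts. */
struct dylib_version {
    unsigned major; /* upper 16 bits of the packed value */
    unsigned minor; /* next 8 bits */
    unsigned patch; /* low 8 bits */
};

/* Split a packed X.Y.Z version as stored in a Mach-O dylib load command. */
struct dylib_version unpack_dylib_version(uint32_t packed)
{
    struct dylib_version v;
    v.major = packed >> 16;
    v.minor = (packed >> 8) & 0xffu;
    v.patch = packed & 0xffu;
    return v;
}

/* Build the packed form from its parts. */
uint32_t pack_dylib_version(unsigned major, unsigned minor, unsigned patch)
{
    assert(major <= 0xffffu && minor <= 0xffu && patch <= 0xffu);
    return ((uint32_t) major << 16) | ((uint32_t) minor << 8) | (uint32_t) patch;
}
```

With this packing, the 1.2.2 from the libwhatever.so.1.2.2 example would live inside the library as the single value 0x00010202 instead of in the file name.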

Definitions

Information

This issue does NOT try to make Utopia binary compatible with macOS, mainly because:

  1. There are no macOS apps I need
  2. I don't want to have to implement Foundation, AppKit and friends; for that purpose we have Airyx and Darling

Utopia wants to produce Mach-Os that target the Linux ABI.

Tasks and research

In ELF (unsure about other formats), the way most operating systems handle dynamic linking and friends is by having the kernel be capable of loading a static ELF binary (one that doesn't link any other libraries at runtime).

Now, when the user executes a dynamically linked binary, let's call this one foo:

The operating system will look for the linker path, which is declared in the ELF program headers (the PT_INTERP entry). Once this is done, the kernel will open said loader, let's say /Core/Binaries/linker. The linker will then take care of relocating the symbols and everything else the binary needs to run, and finally it will call the start function, which in turn really just calls main.
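The lookup of the linker path can be sketched roughly as below. This is only an illustration, not the Utopia kernel's code: the struct layouts follow the ELF64 specification, but the struct names and elf64_find_interp are hypothetical, and the sketch assumes the file uses the same byte order as the machine reading it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_PT_INTERP 3u /* value of PT_INTERP in the ELF specification */

/* ELF64 file header, laid out as in the ELF specification (64 bytes). */
struct elf64_header {
    unsigned char ident[16];
    uint16_t type, machine;
    uint32_t version;
    uint64_t entry, phoff, shoff;
    uint32_t flags;
    uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};

/* ELF64 program header (56 bytes). */
struct elf64_phdr {
    uint32_t type, flags;
    uint64_t offset, vaddr, paddr, filesz, memsz, align;
};

/* Returns the interpreter (dynamic linker) path stored in the image,
 * or NULL if the image is not ELF64 or has no valid PT_INTERP entry. */
const char *elf64_find_interp(const unsigned char *image, size_t size)
{
    struct elf64_header eh;

    assert(image != NULL);
    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.ident, "\177ELF", 4) != 0 || eh.ident[4] != 2 /* ELFCLASS64 */)
        return NULL;
    if (eh.phentsize < sizeof(struct elf64_phdr) || eh.phoff > size)
        return NULL;

    for (uint16_t i = 0; i < eh.phnum; i++) {
        struct elf64_phdr ph;
        uint64_t off = eh.phoff + (uint64_t) i * eh.phentsize;

        if (off > size || size - off < sizeof ph)
            return NULL;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.type != ELF_PT_INTERP)
            continue;
        /* The path must lie inside the file and be NUL-terminated. */
        if (ph.filesz == 0 || ph.offset > size || ph.filesz > size - ph.offset)
            return NULL;
        if (image[ph.offset + ph.filesz - 1] != '\0')
            return NULL;
        return (const char *) image + ph.offset;
    }
    return NULL; /* no PT_INTERP: a static binary */
}
```

A static binary simply has no PT_INTERP entry, so the function returns NULL and the kernel can jump straight to the entry point.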

Unsure if this would be the case for Mach-O, because no matter what I try, it seems like lld doesn't want to give me a binary that doesn't request /usr/lib/dyld, which is the macOS location of the dynamic linker. However, thanks to @mszoek, I know that dyld is a special binary whose header contains the file type MH_DYLINKER. This seems to be done with a "special mode"; more research is needed about this.
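To make the two pieces above concrete, here is a rough sketch of how a loader could tell whether a 64-bit Mach-O image is itself a dynamic linker (file type MH_DYLINKER) and which dynamic linker an executable requests (the LC_LOAD_DYLINKER load command). The numeric constants are the values from Apple's mach-o/loader.h; the struct and function names are hypothetical, and the sketch assumes the image has the same byte order as the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MACHO_MAGIC_64         0xfeedfacfu /* MH_MAGIC_64 */
#define MACHO_TYPE_DYLINKER    0x7u        /* MH_DYLINKER */
#define MACHO_LC_LOAD_DYLINKER 0xeu        /* LC_LOAD_DYLINKER */

/* Same layout as mach_header_64 (32 bytes). */
struct macho_header_64 {
    uint32_t magic;
    int32_t  cputype, cpusubtype;
    uint32_t filetype, ncmds, sizeofcmds, flags, reserved;
};

static int macho_read_header(const unsigned char *image, size_t size,
                             struct macho_header_64 *mh)
{
    assert(image != NULL && mh != NULL);
    if (size < sizeof *mh)
        return 0;
    memcpy(mh, image, sizeof *mh);
    return mh->magic == MACHO_MAGIC_64;
}

/* Nonzero if the image is a dynamic linker such as dyld. */
int macho_is_dylinker(const unsigned char *image, size_t size)
{
    struct macho_header_64 mh;
    return macho_read_header(image, size, &mh) &&
           mh.filetype == MACHO_TYPE_DYLINKER;
}

/* Returns the path named by LC_LOAD_DYLINKER, or NULL if there is none. */
const char *macho_requested_dylinker(const unsigned char *image, size_t size)
{
    struct macho_header_64 mh;
    size_t off = sizeof mh; /* load commands start right after the header */

    if (!macho_read_header(image, size, &mh))
        return NULL;
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        uint32_t cmd, cmdsize, name_off;

        if (size - off < 8)
            return NULL;
        memcpy(&cmd, image + off, 4);
        memcpy(&cmdsize, image + off + 4, 4);
        if (cmdsize < 8 || cmdsize > size - off)
            return NULL;
        if (cmd == MACHO_LC_LOAD_DYLINKER) {
            if (cmdsize < 12)
                return NULL;
            /* The path offset is relative to the start of the command. */
            memcpy(&name_off, image + off + 8, 4);
            if (name_off < 12 || name_off >= cmdsize)
                return NULL;
            const char *name = (const char *) image + off + name_off;
            if (memchr(name, '\0', cmdsize - name_off) == NULL)
                return NULL;
            return name;
        }
        off += cmdsize;
    }
    return NULL;
}
```

A binary produced by lld as described above would be expected to return "/usr/lib/dyld" from macho_requested_dylinker, while a truly static binary would return NULL.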

This is already possible at the Utopia kernel level, but in userland we don't have any loader yet. We should write our own loader and parser. The current structure includes dividing the codebase into two:

I am happy to announce that, as of 30 Jan 2022, we are able to build (but not link!) libSystem. Still, we have a long way to go before we are able to know if it really works.

Right now all compilers make the assumption that Mach-O == macOS/iOS/Darwin, which isn't true anymore: there is Utopia too! (As if it was relevant.) We do know some quirks that Mach-O has. As @ facekapow pointed out (not mentioning him, because I spammed him a lot on Discord already), Mach-O, or rather the linkers, expect symbols like ___stack_chk_fail to exist when targeting Mach-O. I am up for implementing those in libSystem; however, some Darwin-specific behavior (the macOS version, for example) is something I want to drop.

Resources

Of course, we have plenty of resources, to name a few:

DiegoMagdaleno commented 2 years ago

As of 3 Feb 2021, it is now possible to run a statically linked Mach-O binary, given we have the following:

  1. The binary has to have proper exit routines, else we get an error at the RIP. If it doesn't exit, the process might try to execute the next bit of memory, causing the kernel to kill our process with a permissions violation.

What this means is:

int main() {
    return 0;
}

Won't link, as LLD will complain that we are missing the _start symbol.

So if we were to implement our own:

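void start(void) __asm__("start"); // Workaround for Mach-O prefixing its symbols with a _

int main() {
    return 0;
}

// Entry point: runs main, but never tells the kernel that we are done.
void start(void) {
    int ret;
    ret = main();
    (void) ret; // the return value is dropped: nothing exits with it
}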

This will make the Linux kernel go nuts. Sure, we are doing the basic startup routine (we are calling main, after all), but how does the kernel know when to stop executing? Correct, it doesn't, so the CPU tries to execute the next instruction. It looks something like this:

|Owned memory| |Other block|

Since we don't exit, the next instruction for our program is in the other block. Of course, the Linux kernel doesn't like this, so it kills us (how responsible).

So how do we fix this? That's right, we need to tell the kernel we finished. There is a very simple way to do this: implement an exit function, which might look like:

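#include <stdint.h>

#ifndef SYS_exit
#define SYS_exit 60 /* exit syscall number on x86-64 Linux */
#endif

_Noreturn void exit(int code)
{
    /* Loop so the function can never fall through, even if the syscall returned. */
    for (;;) {
        __asm__ volatile("syscall"
                         :
                         : "a" ((uint64_t) SYS_exit), /* syscall number goes in rax */
                           "D" ((uint64_t) code)      /* first argument goes in rdi */
                         : "rcx", "r11", "memory");   /* clobbered by the syscall instruction */
    }
}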

The code above loads the SYS_exit syscall number into rax and the exit code into rdi, then executes the syscall instruction. That tells the kernel we are done and which code we want to exit with.

This is how start is able to call exit(ret), where ret is the value returned by our main function.

Since we are properly exiting now, the kernel won't complain that we are accessing memory that isn't ours, because, surprise surprise, all the memory we are accessing is indeed ours.

And this is how we are able to execute Mach-O binaries now.

A small note here!

A lot of the bizarre stuff that comes from porting Mach-O to Linux (without a translation layer) is that we must differentiate between code that is Mach-O or ELF specific and code that is Darwin or Linux specific. For example:

We use Linux syscalls in our binary above, but we also do some workarounds for Mach-O quirks (the underscore is one of them). So we must really think: is this not working because of the format, or is it not working because of the OS?

mszoek commented 2 years ago

Nice work!

DiegoMagdaleno commented 2 years ago

Thank you! It means a lot, coming from someone with an amazing project such as Airyx!

mszoek commented 2 years ago

Utopia is going to be equally amazing :) I love what you're doing with it. I'm still undecided whether airyxOS will adopt Mach-O as the default binary format. It makes some aspects easier but others harder.

DiegoMagdaleno commented 2 years ago

Well, Mach-O provides a lot of benefits (especially for what you're doing, since you're aiming for compatibility). Still, if anything in Utopia ends up being useful for Airyx (as in, some implementations or research), it would be great! ELF, like everything, has its quirks, but so does Mach-O. It is kind of deciding what you want to compromise on.

And also, thank you a lot. You're truly one of the people that inspired me to make my own OS, and I'm happy you like it :).

Hope one day I can develop my little hobby OS on an Airyx OS powered computer.