Solo5 / solo5

A sandboxed execution environment for unikernels
ISC License
897 stars 138 forks source link

fix linker scripts with thread section #542

Closed palainp closed 1 year ago

palainp commented 1 year ago

Hi, this PR should fix the issue of missing thread data section in the linking step (see https://github.com/mirage/ocaml-solo5/pull/124#issuecomment-1380151218). It works (compiles + links) on a 12.2 FreeBSD as well as in my linux fedora AppVM.

I haven't added yet other than spt+hvt, not sure what is needed (maybe xen too to compiles for Xen/Qubes with llvm) and this PR should trigger a CI test to make sure that I haven't broken something.

As a comment it just add a PT_TLS section in the ELF binary and the linker is able to merge every TLS bits. The tbss part is also now after bss to respect the order of the data and tdata sections but that should not be mandatory.

palainp commented 1 year ago

Still TBD: protect the TLS section and update https://github.com/Solo5/solo5/blob/85ce3239ce87a6e49e550a7a0756b072a8c9fa6c/bindings/crt_init.h#L52 to point to that section (for our single code solo5).

palainp commented 1 year ago

All scripts needed to be modified for the test. Not sure why the CI fails with OpenBSD, maybe the clang version is different?

hannesm commented 1 year ago

Thanks for your work and PR. Why has the tbss been moved after bss?

palainp commented 1 year ago

There is no fundamental reason, except keeping the same order as for the data and tdata section. I guess it's ok to move it back before bss.

hannesm commented 1 year ago

I'm not saying we should not do that, I was just curious whether there was an underlying technical reason. I agree that having the same order as data / tdata makes sense. :)

dinosaure commented 1 year ago

It seems that this PR fixes our initial issue on ocaml-solo5 about FreeBSD but we still the OpenBSD support. May be @adamsteen has some ideas about the TLS support?

adamsteen commented 1 year ago

I will have a quick look when I can, life circumstances currently mean I don’t get much time to tinker, but I wouldn’t hold your breath, I am sorry!

On Sat, Jan 21, 2023 at 14:29, Calascibetta Romain @.***> wrote:

It seems that this PR fixes our initial issue on ocaml-solo5 about FreeBSD but we still the OpenBSD support. May be @.***(https://github.com/adamsteen) has some ideas about the TLS support?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

palainp commented 1 year ago

I tried to compile on OpenBSD 7.2, not sure if that's the same issue, as I can't see the CI logs, but on my laptop, when building an hvt unikernel, it complains about __emutls_get_address not available at the linking step.

A quick search shows that it should be exported on OpenBSD with compiler-rt (https://github.com/ziglang/zig/issues/5921#issuecomment-1275582979) but unsure that this is related to our case. Maybe WE need some specific arguments in the toolchain?

palainp commented 1 year ago

Happy, not happy, putting -fno-emulated-tls in TARGET_CC_CFLAGS in configure.sh to create the OpenBSD toolchain scripts seems to pass the test (I'll push a commit as soon as I'll get my laptop back).

Sadly I was unable to run an hvt hello unikernel (page fault in domain_create, exactly the same as in https://github.com/mirage/ocaml-solo5/pull/122#issuecomment-1356738491 , with a pin to the ocaml-solo5#500-cleaned branch, I'll soon try to gdb it to see what is the %fs register value there).

hannesm commented 1 year ago

Dear @palainp, thanks for this PR.

We need the following to actually execute the tls tests:

diff --git a/tests/tests.bats b/tests/tests.bats
index d911eb0..c542b7a 100644
--- a/tests/tests.bats
+++ b/tests/tests.bats
@@ -416,15 +416,11 @@ xen_expect_abort() {
 }

 @test "tls hvt" {
-  skip_unless_host_is Linux
-
   hvt_run test_tls/test_tls.hvt
   expect_success
 }

 @test "tls virtio" {
-  skip_unless_host_is Linux # XXX is this necessary for virtio?
-
   virtio_run test_tls/test_tls.virtio
   virtio_expect_success
 }

And, at least on my FreeBSD laptop, the test_tls fails:

 ✗ tls hvt
   (from function `expect_success' in file tests.bats, line 165,
    in test file tests.bats, line 420)
     `expect_success' failed
               |      ___|
     __|  _ \  |  _ \ __ \
   \__ \ (   | | (   |  ) |
   ____/\___/ _|\___/____/
   Solo5: Bindings version v0.7.5-7-g69a894b
   Solo5: Memory map: 2 MB addressable:
   Solo5:   reserved @ (0x0 - 0xfffff)
   Solo5:       text @ (0x100000 - 0x104fff)
   Solo5:     rodata @ (0x105000 - 0x105fff)
   Solo5:       data @ (0x106000 - 0x10afff)
   Solo5:       heap >= 0x10b000 < stack < 0x200000

   **** Solo5 standalone test_tls ****

   Solo5: trap: type=#PF ec=0x0 rip=0x104c44 rsp=0x1fffc0 rflags=0x10046
   Solo5: trap: cr2=0x2
   Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

According to objdump, looking at 4c44:

0000000000104c40 <get_data>:
  104c40:       55                      push   %rbp
  104c41:       48 89 e5                mov    %rsp,%rbp
  104c44:       64 48 8b 04 25 00 00    mov    %fs:0x0,%rax
  104c4b:       00 00 
  104c4d:       48 8b 80 f0 ff ff ff    mov    -0x10(%rax),%rax
  104c54:       5d                      pop    %rbp
  104c55:       c3                      ret    
  104c56:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  104c5d:       00 00 00 

So, it looks like we need some setup to avoid the page fault... Does this make sense? Does anyone know what needs to be done?

palainp commented 1 year ago

Oups you're totally right ! :ashamed:

I don't understand why, but the set_data(2) call (https://github.com/Solo5/solo5/blob/8c3a744f998b9977cfc6cd29e0cc40ae2efba167/tests/test_tls/test_tls.c#L109) somehow overwrites tcb1.tp and leaves us with the wrong TLS address after that. There is something to investigate here as it seems we only have one __thread uint64_t variable and the struct size should be sufficient.

And investigating this point leads me to a possible bug in ocaml-solo5, we currently point TLS to the first address in the TLS zone, and it should point to the last address, this is consistent with the offset I can see in the assembly being negative, am I right?

palainp commented 1 year ago

So the issue was due to an alignment difference for some targets (for example, the data section was a bit larger with hvt than with spt target, and the tbss section need to be aligned differently). It's fixed now by tellin lld that tbss is also a TLS related section with a correct test_tls.c code. The usage for those TLS related things should be as in the test file:

[1]: in order to be able to do that, we probably need to add some symbol in the linker scripts

palainp commented 1 year ago

And if we want to have multiple threads (solo5 multicore or for the main thread in ocaml-solo5), we need to to that for each one. It seems easier to use dynamic allocation for the allocating part of the procedure as the size of the TLS can be computed only in the link pass.

hannesm commented 1 year ago

Thanks again @palainp. I have one small question, though -- do you understand why with the earlier commit (where there was no tbss PT_TLS in any linker script) there was a difference in behaviour between spt and hvt?

hannesm commented 1 year ago

I have tested this on my FreeBSD 13.1 laptop successfully, thus approving.

@dinosaure any chance you get around to merge this and #544 and cut a release? This will help us to unblock the ocaml-solo5 OCaml 5 PR.

dinosaure commented 1 year ago

Yes, I will take a look in them deeply today and unlock the release process.

dinosaure commented 1 year ago

I found this description about PT_TLS which corresponds to the alignement story explained by @palainp:

TLS variables are much like any other global/static variable. In implementation their initial data winds up in the PT_TLS segment. The PT_TLS segment is inside of a read only PT_LOAD segment despite TLS variables being writable. This segment is then copied into the process for each thread in a unique writable location. The location the PT_TLS segment is copied to is influenced by the segment's alignment to ensure that the alignment of TLS variables is respected.