Closed palainp closed 1 year ago
Oups it seems that something is wrong between my laptop and CI, I'll check that and update. Sorry :/
The virtio failure (probably the same for xen) will be a bit harder to fix.
The multiboot headers, in bindings/{virtio,xen}/boot.S
, defined as https://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Header-address-fields, should explicit the end of the tdata segment (tdata have to be copied too).
I tried to fix that but I don't understand yet how is calculated the offset from %fs for both virtio and xen (it's very much not the same as for the other targets, this may be an error in the linker script again).
Update: I'm starting to understand how ld and lld work with TLS. So far it seems that ld uses the address in the elf binary, and lld uses the memsize addresses, so with a linker script like :
...
.tdata { *(.tdata) } :tdata
. = ALIGN(CONSTANT(MAXPAGESIZE));
_edata = .;
.tbss { *(.tbss) } :tbss
...
We have with readelf -lW
:
Elf file type is EXEC (Executable file)
Entry point 0x100020
There are 7 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
...
TLS 0x005018 0x0000000000104018 0x0000000000104018 0x000008 0x011068 R 0x20
...
FileSiz
is the actual size of the tdata section in the elf (1 8B value in tdata as tbss should be set to 0 at runtime), MemSiz
is the size of the sections in memory (not sure yet how this is calculated).
We have an offset :
I'll try to figure out if I can give an option to ld or lld to force similar behaviour regarding the offsets.
To my understanding, the big memsize in the readelf output was due to bss being merged by lld with the previous sections. So I added a bss section in the elf header which solve that part. Now I have something more comprehensible:
...
LOAD 0x001000 0x0000000000100000 0x0000000000100000 0x003f44 0x003f44 R E 0x1000 # <- this is text
LOAD 0x005000 0x0000000000104000 0x0000000000104000 0x000018 0x000018 RW 0x1000 # <- this is data
TLS 0x006000 0x0000000000105000 0x0000000000105000 0x001000 0x001000 R 0x8 # <- this is tdata
TLS 0x005018 0x0000000000106000 0x0000000000106000 0x000000 0x000008 0x8 # <- this is tbss
NULL 0x000000 0x0000000000106000 0x0000000000106000 0x000000 0x010080 0x20 # <- this is bss
...
It works on my linux laptop with ld.lld but still fails with {Free,Open}BSD CI. I'll try to re-install a BSD on a spare laptop to understand what is the failure with CI.
Before the last commits, I didn't set any value for what was (before this PR) the tp
field in struct tcb
(e.g. nothing like tcb1.tp = &tcb1.tp
). I just set the TCB address to a correct value with solo5_set_tls_base
, and it seems that linux is ok with that (having an address for its %fs register), but BSD does an indirection on that address and so has %fs set to 0 (which explains the page fault).
That different behavior seems peculiar to me, I'll search the internet for something about that.
Now I also use some function to have compiler type checking and hiding the TCB/TLS details outside solo5, but it force me to have some casts in tls.c
between void*
and uintptr_t
.
The last thing I'm worried about is the need for . = ALIGN(CONSTANT(MAXPAGESIZE));
in the .tdata section. I haven't found how to push the alignment out of the section yet, this is due to some tests in https://github.com/Solo5/solo5/blob/a63a755d710a1286a0d0eea1253762ba25e866b7/tenders/common/elf.c#L283 and later (I haven't kept notes on which tests failed exactly :/ ).
However the consequences are limited to a slightly oversized binary (1 page max) as well for the memcpy
call when copying .tdata into the TLS thread.
Thank you for this line of work. I reviewed the code, and think it is in good shape -- i.e. the minimal effort to get this up and running :)
To me, it looks like while this extends the solo5 API and codebase slightly, it adds support for thread local storage (which is required for OCaml 5). The changes to the linker scripts and C codebase look fine to me, I really appreciate that the test case (test_tls) has been reformulated to use the freshly developed API.
From an attacker point of view, I was curious about tls_init
-- so it receives a pointer and will memcpy
some memory starting from that pointer address. But I couldn't figure what a nicer API would be -- the caller can't provide a proof that starting at the pointer there will be sufficient memory (of tls_size
) available.
So, thanks again for this work. I'd be happy if this would be merged and released as part of the next solo5 release :)
Thank you very much for your comment. I've updated the indentation and the comment in solo5.h
.
Right now, the API update shouldn't break anything, but you're right, there is now a piece of code that tries to write somewhere without checking that the destination size is sufficient. To me the only way around to solve this potential issue seems to bring back memory allocation into solo5 to keep control over the malloc(solo5_tls_size())
call.
And if it is merged and released, it will be possible to update ocaml-solo5 to use the new API as in test_tls.c
I was also thinking of another possibility which might be to export the address and length of .tdata
plus an offset into the thread memory and delegate the memcpy
to the caller. As solo5 does not use the __thread
variable, this should be safe.
I am indeed inclined to merge this PR and do a release. The comment from @hannesm confirms a double code review and I'm a bit more confident about the next step. Regarding the security aspect, the idea of wrapping the area with READ_ONLY
segments is not possible?
From a discussion on the MirageOS meeting (which will be public on the MirageOS mailing list), we agreed that the PR is ready to be merge from different point of views. Thanks for your work @palainp, it paves the way for the support of OCaml 5.
This PR is an follow-up of the previous TLS PR #542 . I'm currently not sure if the exported API is clear enough or makes sense at all, I'd feel better if someone could comment that.
Now we can export TDATA as the address of the the thread variables' initialization values, LTDATA the length of the tdata section, and LTBSS the length of the tbss section. This way, we can allocate, for each thread, an area of memory (the size is calculated with
solo5_tls_size()
), set the tp pointer to its value withsolo5_tls_tp_offset(.)
, and copy the initial value of tdata into our new memorysolo5_tls_data_offset(.)
. As an example, we can do:I've come across 2 bugs, the first is that if tdata is not a
PT_LOAD
section, it is not loaded by the tender, so the initial values of tdata are not copied into memory, the second is thatmemcpy
can be inlined and cause errors in our TLS use case (I haven't investigated this much so far, I'm going to try it out with ocaml-solo5 to see if this could be a bigger issue).