Closed alexcrichton closed 5 years ago
Another option here is to just say that the TLS data is aligned "as-if allocated by malloc" (since that's now we actually plan to allocate it). Then, we can discuss the alignment for malloc, which has something of the same problem -- people may store SIMD values in malloc'd memory too.
wasm SIMD doesn't trap on misaligned accesses, so it isn't strictly necessary to align it. That said, it may improve performance in some cases if we do.
That's true! This I think would be somewhat of a future-proofing tactic as well though. For example some future wasm proposal (although AFAIK none currently do) may trap on a misaligned operation, in which case requiring a stricter alignment may be more necessary.
I think there's also a case to be made that applications themselves sometimes want to over-align data for their own purposes (such as storing data in bits or something weird like that). In that sense no trap would happen but an application would misbehave if data were unaligned, and without being able to specify the alignment of thread local data they'd have to resort to a different strategy.
That's true. Even if nothing would break at the ISA level, the TLS should definitely respect the source language's alignment guarantees.
The linker is capable of computing the alignment for the .tdata
segment, so it should be possible to synthesize a global __tls_align
. And then we could require TLS to be allocated with memalign
instead of malloc
. I'll write an LLVM patch and see what happens.
LLVM patch at https://reviews.llvm.org/D65028, which is in llvm/llvm-project@5204f7611f4ad6549921f9fa757823e77f39ce32.
Should the rest of us have a chance to review this before you land it in LLVM?
@sunfishcode Sorry about that, we didn't intend to circumvent proper review. We'd be happy to continue this discussion and change our solution upstream if necessary.
Currently the description of thread local storage allows specifying the size of the TLS section to the runtime to allocate, but it doesn't specify the alignment with which to allocate it. While a 4 or 8-byte alignment is probably good enough for most programs, I know that in Rust we've historically run into issues where SIMD things got into LTO builds which required a bigger alignment (like 16 bytes or so).
Would it be possible for a global like
__tls_align
be synthesized next to__tls_size
to indicate to the runtime at what alignment it needs to be allocated at?cc @quantum5