Closed pavelvkozlov closed 1 year ago
The ul_jrand
variable is accessed using Local Dynamic TLS Model. The Local Dynamic is an optimization of the General Dynamic TLS model. The idea behind local dynamic is to use an anchor and all the local TLS variables to be addressed relative to this anchor.
Thus, for our variable in question, we have:
add r0,pcl,@.tdata@tlsgd
bl @__tls_get_addr@plt
add r13,r0,@ul_jrand@dtpoff
st_s r2,[r13]
As we can see, we invoke __tls_get_addr
like in the case of general dynamic, instead of targeting a per-variable in tls_index
object, a special dummy object is targeted by the ARC_TLS_GD_GOT
relocation, this is indicated by the use of .tdata
as the relocation symbol. This special tls_index
object has the ti_offset
field set to 0
, only the ti_module
field is filled in by the run-time linker, as such this special tls_index
can be used to identify the module number, but not the variable offset. Thus, the local dynamic is generating only a single dynamic relocation namely ARC_TLS_DTPMOD
.
The variable offset is supplied later by the ARC_TLS_DTPOFF
relocation (i.e., @dtpoff
) , which can be added to the base address returned from __tls_get_addr
.
Now, from your description above, I don't really see this behavior happening at run time.
A note: ARC64 doesn't use the local dynamic tls mode.
@claziss, thanks for clarification, now I see why compiler puts this extra add instruction. But one additional question. Why linker also put offset (0xc) into the GOT?
arc-linux-gnu-readelf -x .got libbar.so
Hex dump of section '.got':
0x00003fe0 00000000 00000000 00000000 00000000 ................
0x00003ff0 00000000 00000000 0c000000 ............
Runtime uses this value as ti_offset
to count address for ul_jrand
in __tls_get_addr()
. In case of Local Dynamic optimizations, it should be zero, I guess.
Probably to signalize to the dynlinker to not consider the dynamic reloc allocated!? As far as I can see there is a dynreloc associated to the location:
3ff8: 0c 00 00 00 .word 0x0000000c
3ff8: R_ARC_TLS_DTPOFF *ABS*
@claziss, you are right, this issue can be fixed in the dynamic linker, and I had a try. But unfortunately, I faced another linker issue. It was hidden by current implementation of the R_ARC_TLS_DTPOFF
relocation resolve in dynlinker. The essence of the problem is in incorrect offset for a local TLS variable, if this variable is placed in the .tbss section. Linker counts offset for a TLS variable based on .tbss section start but should count based on TLS module start. As a result, linker puts incorrect offset into the code. But at the same timer linker puts correct offset as ti_offset
to GOT.
Small test that demonstrates this issue: tls2.zip
Please take a look at tls_register_e2
function assembler code. Linker fails to put correct offset for e2 TLS variable. It puts zero to add_s
instruction but it should be 0x40. In GOT you will find 0x40 and because of this value in GOT test passes with current glibc.
596: 2700 7f80 0000 3a60 add r0,pcl,0x3a60 ;3ff4 <.got+0x18>
59e: 0f4e ffef bl.d -180 ;4e8 <.plt+0x20>
...
5a6: 70c3 0000 0000 add_s r0,r0,0
I think in this situation we should leave dynlinker code as it is now and fix these two issues in linker. By fixing initial problem in dynlinker we just exchange one issue to another and to fix these two issues someone should use updated glibc/uclibc and updated linker.
I've created a branch with two fixes for linker. Please take a look. It's not a final version, I have to run regression tests, But I hope it will help to illustrate my point: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/commit/53abcc2521538286d6fa05028220fc0b884c1eb2 https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/commit/06a1ae7c82d54b13dff42271066a2aa58e1a7d81
I suppose it is what it is expected:
add r0,pcl,@.tbss@tlsgd ;10
bl @__tls_get_addr@plt;1
...
add_s r0,r0,@e2@dtpoff
Please remark the input argument r0
has a different symbol. The idea behind all of this is to use R_ARC_TLS_DTPOFF_S9
relocation for size purposes, which it is still a desideratum.
Of course you can correct the offset in dynlinker (which probably was the initial design):
...
5aa: 2700 7f80 0000 3a40 add r0,pcl,0x3a40 ;3fe8 <.got+0x18>
5ae: R_ARC_TLS_GD_GOT .tbss
5b2: 0f4e ffef bl.d -180 ;4fc <.plt+0x20>
5b2: R_ARC_S25W_PCREL_PLT __tls_get_addr@GLIBC_2.32
5b6: a540 st_s r2,[r13,0]
5b8: da20 mov_s r2,0x20
5ba: 70c3 0000 0000 add_s r0,r0,0
5bc: R_ARC_TLS_DTPOFF e2
...
00003fd0 <.got>:
...
3fd0: R_ARC_GLOB_DAT _ITM_deregisterTMCloneTable@Base
3fd4: R_ARC_GLOB_DAT _ITM_registerTMCloneTable@Base
3fd8: R_ARC_GLOB_DAT __cxa_finalize@GLIBC_2.32
3fdc: R_ARC_GLOB_DAT tls_registry@Base
3fe0: R_ARC_TLS_DTPMOD *ABS*
3fe4: R_ARC_TLS_DTPOFF *ABS*
3fe8: R_ARC_TLS_DTPMOD *ABS*
3fec: 40 00 00 00 .word 0x00000040
3fec: R_ARC_TLS_DTPOFF *ABS*
The _tls_get_addr
function gets the 0x3fe8
as input parameter. The 0x3fec
got location has the offset which you may need to compensate with.
Unfortunately, we have two examples of mutually exclusive behavior for the runtime dynlinker. In the first example https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/497#issue-1493800061 the ld
linker patches code with offset and also puts ti_offset
value into the GOT and to correctly handle the first situation the runtime dynlinker should rewrite the offset value in the GOT, put zero as ti_offset
value because of the Local Dynamic optimization. In this case common code in dynlinker like the following will be correct for the R_ARC_TLS_DTPOFF
relocation:
*reloc_addr = sym->st_value + reloc->r_addend;
In the second example https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/497#issuecomment-1366736529 the ld
linker doesn't adds offset to the code, puts ti_offset
value into the GOT and expects that the runtime dynlinker will use this value in the GOT to handle R_ARC_TLS_DTPMOD
relocation. And in this case currently used code for the R_ARC_TLS_DTPOFF
relocation works correctly:
*reloc_addr += sym->st_value;
Maybe I missed something, but I don't see how the dynlinker can separate one situation from another in runtime.
@pavelvkozlov I have updated the compiler to NOT generate the Local Dynamic model using references to tdata
or tbss
. Please let me know how is it working.
Ref: https://github.com/foss-for-synopsys-dwc-arc-processors/gcc/commit/c255a88c37af6b0185e333bfa94e8f33fc99a8d2
@claziss, I can confirm with your change all reported problems have gone. And glibc tests show no regressions. I still think that it was the static linker responsibility to correctly handle references to tdata and tbss. This change in compiler fixes reported issue. Thanks.
This issue fiexd with https://github.com/foss-for-synopsys-dwc-arc-processors/gcc/commit/c255a88c37af6b0185e333bfa94e8f33fc99a8d2 It doesn't appear with 2022.09 and 2023.03 toolchains. Close for now. We can return to this issue if decide to return Local Dynamic model in the future.
Recently I've faced a problem with the mkswap utility (from linux util package) crashed by memory fault exception on ARCv2 Linux system.
My debug shows that write to a thread variable
ul_jrand_seed
in libuuid.so library damaged another thread variablethread_arena
belonged to glibc and located further in TLS. And then fault occurred in glibc. I found the following code in the libuuid.so library:In this code the
__tls_get_address
function is called to get address oful_jrand_seed
in TLS, the function gets the address in rela.dyn section with module index (0x1) and offset value (0x3c) already patched by dynamic linker (R_ARC_TLS_DTPMOD
andR_ARC_TLS_DTPOFF
relocations). And this function returns address already pointed to the`ul_jrand_seed
, but for some reason the compiler also adds extra 0x30 to the returned value and as a result next write damages memory belonged to another library.Based on this code I prepared a small test (archive in attach)
This test demonstrates the same issue. The patterns
0x0000abba
are preinitialized arrayul_jrand
of three unsigned int in TLS. During execution the program writes new values0x000094ba 0x00000e53 0x00000006
to this array in thecrank_random()
function but again extra offset 0xc has been added and program writes new values beyond the correct place with0x0000abba
patterns.I couldn't reproduce this problem on ARC32 and ARC64 systems, didn't see extra offsets in code. From my point of view something wrong with this extra offset in code. The linker and may be compiler are involved. In object file
libbar2.o
I see instructionadd r13,r0,0x0
, that then patched with 0xc value. I want to open a discussion to find the reason and fix this problem. tls.zip