foss-for-synopsys-dwc-arc-processors / toolchain

Repository containing releases of prebuilt GNU toolchains for DesignWare ARC Processors from Synopsys (available from "releases" link below).
http://www.synopsys.com/IP/ProcessorIP/ARCProcessors/Pages/default.aspx
GNU General Public License v3.0
91 stars 48 forks source link

ARCv2: Wrong offset for thread variable (in TLS) locally defined and used in a shared library #497

Closed pavelvkozlov closed 1 year ago

pavelvkozlov commented 1 year ago

Recently I've faced a problem with the mkswap utility (from linux util package) crashed by memory fault exception on ARCv2 Linux system.

# mkswap /dev/vda
potentially unexpected fatal signal 11.
Path: /sbin/mkswap
CPU: 0 PID: 13904 Comm: mkswap Not tainted 5.16.0 #2
Read access not allowed on page
  @off 0x71c72 in [/lib/libc.so.6]  VMA: 0x20060000 to 0x2016a000
ECR: 0x00060108 EFA: 0x8e363ccb ERET: 0x200d1c72
STAT: 0xc0081082 [IE U     ]   BTA: 0x200d1b64
 SP: 0x5fa033c4  FP: 0x00000000 BLK: 0x200b5fea
LPS: 0x200d5144 LPE: 0x200d5156 LPC: 0x00000000
r00: 0x00000134 r01: 0x40011874 r02: 0x8e363c81
r03: 0x00000040 r04: 0x1c9c04ec r05: 0x00000000
r06: 0x00000176 r07: 0x00000000 r08: 0x00000001
r09: 0x00006180 r10: 0x0000001d r11: 0x0069cd69
r12: 0x00071b64 r13: 0x00000025 r14: 0x2017654c
r15: 0x40011874 r16: 0x00000134 r17: 0x5fa03f60
r18: 0x00000000 r19: 0x00000000 r20: 0x0000fe00
r21: 0x00000000 r22: 0x2001de98 r23: 0x00000000
r24: 0xffffffff r25: 0x201764e0
Segmentation fault

My debug shows that write to a thread variable ul_jrand_seed in libuuid.so library damaged another thread variable thread_arena belonged to glibc and located further in TLS. And then fault occurred in glibc. I found the following code in the libuuid.so library:

20c0:  2700 7f80 0000 5f38       add     r0,pcl,0x5f38   ;7ff8 <_GLOBAL_OFFSET_TABLE_+0xd8>
20c8:  0a32 ff8f                 bl      -3536   ;12f8 <.plt+0x260>
20cc:  2000 0f8f 0000 0030       add     r15,r0,0x30
20d4:  1400 3100                 ldh     r0,[sp]
20d8:  7d07                      xor_s   r13,r13,r0
20da:  b7a0                      sth_s   r13,[r15,0]

In this code the __tls_get_address function is called to get address of ul_jrand_seed in TLS, the function gets the address in rela.dyn section with module index (0x1) and offset value (0x3c) already patched by dynamic linker (R_ARC_TLS_DTPMOD and R_ARC_TLS_DTPOFF relocations). And this function returns address already pointed to the `ul_jrand_seed, but for some reason the compiler also adds extra 0x30 to the returned value and as a result next write damages memory belonged to another library.

Based on this code I prepared a small test (archive in attach)

LD_LIBRARY_PATH=/tmp /tmp/tls
ul_jrand: 0x000094ba 0x00000e53 0x00000006
Array jrand addr: 0x2013c500
Array TCB addr: 0x2013c4e0
Dump TCB:
0x2013cbf0 0x00000000 0x00003333 0x00002222 0x00001111 0x0000abba 0x0000abba 0x0000abba
0x000094ba 0x00000e53 0x00000006 0x20107874 0x20107e74 0x20108774 0x00000000 0x00000000
0x00000000 0x00016008 0x00000000 0x20132620 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000

This test demonstrates the same issue. The patterns 0x0000abba are preinitialized array ul_jrand of three unsigned int in TLS. During execution the program writes new values 0x000094ba 0x00000e53 0x00000006 to this array in the crank_random() function but again extra offset 0xc has been added and program writes new values beyond the correct place with 0x0000abba patterns.

 874:  2700 7f80 0000 3780     add    r0,pcl,0x3780   ;3ff4 <.got+0x14>
 87c:  0e9e ffcf               bl     -356    ;718 <.plt+0x90>
 880:  1400 3101               ldh    r1,[sp]
 884:  2000 0f8d 0000 000c     add    r13,r0,0xc
 88c:  2107 03c0               xor    r0,r1,r15
 890:  a500                    st_s   r0,[r13,0]

I couldn't reproduce this problem on ARC32 and ARC64 systems, didn't see extra offsets in code. From my point of view something wrong with this extra offset in code. The linker and may be compiler are involved. In object file libbar2.o I see instruction add r13,r0,0x0, that then patched with 0xc value. I want to open a discussion to find the reason and fix this problem. tls.zip

claziss commented 1 year ago

The ul_jrand variable is accessed using Local Dynamic TLS Model. The Local Dynamic is an optimization of the General Dynamic TLS model. The idea behind local dynamic is to use an anchor and all the local TLS variables to be addressed relative to this anchor.

Thus, for our variable in question, we have:

        add     r0,pcl,@.tdata@tlsgd
        bl      @__tls_get_addr@plt
        add     r13,r0,@ul_jrand@dtpoff
        st_s    r2,[r13]

As we can see, we invoke __tls_get_addr like in the case of general dynamic, instead of targeting a per-variable in tls_index object, a special dummy object is targeted by the ARC_TLS_GD_GOT relocation, this is indicated by the use of .tdata as the relocation symbol. This special tls_index object has the ti_offset field set to 0, only the ti_module field is filled in by the run-time linker, as such this special tls_index can be used to identify the module number, but not the variable offset. Thus, the local dynamic is generating only a single dynamic relocation namely ARC_TLS_DTPMOD.

The variable offset is supplied later by the ARC_TLS_DTPOFF relocation (i.e., @dtpoff) , which can be added to the base address returned from __tls_get_addr.

Now, from your description above, I don't really see this behavior happening at run time.

A note: ARC64 doesn't use the local dynamic tls mode.

pavelvkozlov commented 1 year ago

@claziss, thanks for clarification, now I see why compiler puts this extra add instruction. But one additional question. Why linker also put offset (0xc) into the GOT?

arc-linux-gnu-readelf -x .got libbar.so

Hex dump of section '.got':
  0x00003fe0 00000000 00000000 00000000 00000000 ................
  0x00003ff0 00000000 00000000 0c000000          ............

Runtime uses this value as ti_offset to count address for ul_jrand in __tls_get_addr(). In case of Local Dynamic optimizations, it should be zero, I guess.

claziss commented 1 year ago

Probably to signalize to the dynlinker to not consider the dynamic reloc allocated!? As far as I can see there is a dynreloc associated to the location:

    3ff8:       0c 00 00 00                     .word   0x0000000c
                        3ff8: R_ARC_TLS_DTPOFF  *ABS*
pavelvkozlov commented 1 year ago

@claziss, you are right, this issue can be fixed in the dynamic linker, and I had a try. But unfortunately, I faced another linker issue. It was hidden by current implementation of the R_ARC_TLS_DTPOFF relocation resolve in dynlinker. The essence of the problem is in incorrect offset for a local TLS variable, if this variable is placed in the .tbss section. Linker counts offset for a TLS variable based on .tbss section start but should count based on TLS module start. As a result, linker puts incorrect offset into the code. But at the same timer linker puts correct offset as ti_offset to GOT. Small test that demonstrates this issue: tls2.zip Please take a look at tls_register_e2 function assembler code. Linker fails to put correct offset for e2 TLS variable. It puts zero to add_s instruction but it should be 0x40. In GOT you will find 0x40 and because of this value in GOT test passes with current glibc.

 596:   2700 7f80 0000 3a60     add     r0,pcl,0x3a60   ;3ff4 <.got+0x18>
 59e:   0f4e ffef               bl.d    -180    ;4e8 <.plt+0x20>
 ...
 5a6:   70c3 0000 0000          add_s   r0,r0,0

I think in this situation we should leave dynlinker code as it is now and fix these two issues in linker. By fixing initial problem in dynlinker we just exchange one issue to another and to fix these two issues someone should use updated glibc/uclibc and updated linker.

I've created a branch with two fixes for linker. Please take a look. It's not a final version, I have to run regression tests, But I hope it will help to illustrate my point: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/commit/53abcc2521538286d6fa05028220fc0b884c1eb2 https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/commit/06a1ae7c82d54b13dff42271066a2aa58e1a7d81

claziss commented 1 year ago

I suppose it is what it is expected:

       add   r0,pcl,@.tbss@tlsgd             ;10
       bl    @__tls_get_addr@plt;1
       ...
       add_s r0,r0,@e2@dtpoff

Please remark the input argument r0 has a different symbol. The idea behind all of this is to use R_ARC_TLS_DTPOFF_S9 relocation for size purposes, which it is still a desideratum.

Of course you can correct the offset in dynlinker (which probably was the initial design):

...
 5aa:   2700 7f80 0000 3a40     add     r0,pcl,0x3a40   ;3fe8 <.got+0x18>
                        5ae: R_ARC_TLS_GD_GOT   .tbss
 5b2:   0f4e ffef               bl.d    -180    ;4fc <.plt+0x20>
                        5b2: R_ARC_S25W_PCREL_PLT       __tls_get_addr@GLIBC_2.32
 5b6:   a540                    st_s    r2,[r13,0]
 5b8:   da20                    mov_s   r2,0x20
 5ba:   70c3 0000 0000          add_s   r0,r0,0
                        5bc: R_ARC_TLS_DTPOFF   e2
...

00003fd0 <.got>:
        ...
                        3fd0: R_ARC_GLOB_DAT    _ITM_deregisterTMCloneTable@Base
                        3fd4: R_ARC_GLOB_DAT    _ITM_registerTMCloneTable@Base
                        3fd8: R_ARC_GLOB_DAT    __cxa_finalize@GLIBC_2.32
                        3fdc: R_ARC_GLOB_DAT    tls_registry@Base
                        3fe0: R_ARC_TLS_DTPMOD  *ABS*
                        3fe4: R_ARC_TLS_DTPOFF  *ABS*
                        3fe8: R_ARC_TLS_DTPMOD  *ABS*
    3fec:       40 00 00 00                     .word   0x00000040
                        3fec: R_ARC_TLS_DTPOFF  *ABS*

The _tls_get_addr function gets the 0x3fe8 as input parameter. The 0x3fec got location has the offset which you may need to compensate with.

pavelvkozlov commented 1 year ago

Unfortunately, we have two examples of mutually exclusive behavior for the runtime dynlinker. In the first example https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/497#issue-1493800061 the ld linker patches code with offset and also puts ti_offset value into the GOT and to correctly handle the first situation the runtime dynlinker should rewrite the offset value in the GOT, put zero as ti_offset value because of the Local Dynamic optimization. In this case common code in dynlinker like the following will be correct for the R_ARC_TLS_DTPOFF relocation:

*reloc_addr = sym->st_value + reloc->r_addend;

In the second example https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/497#issuecomment-1366736529 the ld linker doesn't adds offset to the code, puts ti_offset value into the GOT and expects that the runtime dynlinker will use this value in the GOT to handle R_ARC_TLS_DTPMOD relocation. And in this case currently used code for the R_ARC_TLS_DTPOFF relocation works correctly:

*reloc_addr += sym->st_value;

Maybe I missed something, but I don't see how the dynlinker can separate one situation from another in runtime.

claziss commented 1 year ago

@pavelvkozlov I have updated the compiler to NOT generate the Local Dynamic model using references to tdata or tbss. Please let me know how is it working. Ref: https://github.com/foss-for-synopsys-dwc-arc-processors/gcc/commit/c255a88c37af6b0185e333bfa94e8f33fc99a8d2

pavelvkozlov commented 1 year ago

@claziss, I can confirm with your change all reported problems have gone. And glibc tests show no regressions. I still think that it was the static linker responsibility to correctly handle references to tdata and tbss. This change in compiler fixes reported issue. Thanks.

pavelvkozlov commented 1 year ago

This issue fiexd with https://github.com/foss-for-synopsys-dwc-arc-processors/gcc/commit/c255a88c37af6b0185e333bfa94e8f33fc99a8d2 It doesn't appear with 2022.09 and 2023.03 toolchains. Close for now. We can return to this issue if decide to return Local Dynamic model in the future.