Mix of .di and non .di accesses for std::atomic with -mno-volatile-cache/-mvolatile-di

foss-for-synopsys-dwc-arc-processors / toolchain

Repository containing releases of prebuilt GNU toolchains for DesignWare ARC Processors from Synopsys (available from "releases" link below).

http://www.synopsys.com/IP/ProcessorIP/ARCProcessors/Pages/default.aspx

GNU General Public License v3.0

92 stars 48 forks source link

Mix of .di and non .di accesses for std::atomic with -mno-volatile-cache/-mvolatile-di #488

Open displaylink-mdomzal opened 1 year ago

displaylink-mdomzal commented 1 year ago

We have noticed that trying to use std::atomic class in code which is compiled with cache bypass instructions for volatile keyword (-mno-volatile-cache or -mvolatile-di switches depending on the toolchain version) turned on can generate improper machine code. It is visible for ARC600, HS38 (both on 2021.09 baremetal toolchain) and HS58 (prerelease 2022.07 toolchain).

For details see attachment with simple code sample, compilation commands and resulting listing files (also without mentioned above compiler switches for comparison). In short the problem looks like that (based on ARC625 listing as it is simplest to analyze):

 2f2:   da00                    mov_s   r2,0
 2f4:   1bfb b082               stb r2,[fp,-5]                  // Cached write of 0 value to the atomic (fp - 5)
 2f8:   da05                    mov_s   r2,0x5
 2fa:   1bfc b080               st  r2,[fp,-4]
 2fe:   2342 3142               sub r2,fp,0x5                   // r2 = fp - 5
 302:   1200 0883               ldb.di  r3,[r2]                     // Uncached read of the atomic (r3 = uninitialized byte)
 306:   dc01                    mov_s   r12,0x1
 308:   1a00 0322               stb.di  r12,[r2]
 30c:   7a6f                    extb_s  r2,r3                       // r2 = unitialized byte (as r3 is byte already)
 30e:   78e0                    nop_s
 310:   7a4b                    tst_s   r2,r2                       // Set flag based on unitialized value
 312:   f204                    beq_s   8   ;318 <main+0x30>    // Branch based on flag
 314:   da0a                    mov_s   r2,0xa
 316:   f003                    b_s 6   ;31a <main+0x32>
 318:   da0b                    mov_s   r2,0xb
 31a:   7048                    mov_s   r0,r2

Without -mno-volatile-cache/-mvolatile-di the code is generated without .di instructions and works properly. We see similar mix of cached/non cached accesses happening for HS38 and HS58 in the sample (although due to more complicated listing I have not analyzed them for the provided sample).

atomic_issue.zip