foss-for-synopsys-dwc-arc-processors / toolchain

Repository containing releases of prebuilt GNU toolchains for DesignWare ARC Processors from Synopsys (available from "releases" link below).
http://www.synopsys.com/IP/ProcessorIP/ARCProcessors/Pages/default.aspx
GNU General Public License v3.0

Uncached data access is generated instead of cached for ARC600 #287

Open petrokarashchenko opened 4 years ago

petrokarashchenko commented 4 years ago

I have the following code snippet:

#include <stdint.h>

static uint8_t data[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

int test(uint8_t *my_ptr, volatile uint8_t *v_ptr) {
    for (int i=0; i<10; i++) {
        if (my_ptr) {
            my_ptr[i] = data[i];
        } else {
            v_ptr[i] = data[i];
        }
    }
    return 0;
}

I compile it with the following options, using toolchain 2020.03:

-mcpu=arc600 -mtune=arc600 -mbig-endian -mmul64 -mbarrel-shifter -ffunction-sections -fdata-sections -fno-common -fno-strict-aliasing -c -Os -mno-volatile-cache

I get the following output:

   1                    .file   "test.c"
   2                    .cpu ARC600
   3                    .arc_attribute Tag_ARC_PCS_config, 2
   4                    .arc_attribute Tag_ARC_ABI_rf16, 0
   5                    .arc_attribute Tag_ARC_ABI_pic, 0
   6                    .arc_attribute Tag_ARC_ABI_tls, 0
   7                    .arc_attribute Tag_ARC_ABI_sda, 2
   8                    .arc_attribute Tag_ARC_ABI_exceptions, 0
   9                    .section    .text
  10                    .section    .text.test,"ax",@progbits
  11                    .align 4
  12                    .global test
  14                test:
  15 0000 DA00              mov_s   r2,0    ;0
  16                    .align 2
  17                .L4:
  18 0002 2232 0F83         ldb r3,[r2,@.LANCHOR0]
  18      0000 0000 
  19 000a 605C              add_s r12,r0,r2 ;1
  20 000c E882              brne_s r0, 0, @.L6
  21 000e 615C              add_s r12,r1,r2 ;1
  22                    .align 2
  23                .L6:
  24 0010 6A41              add_s r2,r2,1 ;1
  25 0012 1C00 10E2         stb.di r3,[r12]
  26 0016 0AEF 8291         brne r2, 10, @.L4
  27 001a D800              mov_s   r0,0    ;0
  28 001c 7EE0              j_s [blink]
  30 001e 78E0              .section    .rodata.data,"a"
  31                    .align 4
  32                    .set    .LANCHOR0,. + 0
  35                data:
  36 0000 0102 0304         .string "\001\002\003\004\005\006\007\b\t"
  36      0506 0708 
  36      0900 
  37                    .ident  "GCC: (ARCompact/ARCv2 ISA elf32 toolchain 2020.03) 9.3.1 20200315"
  38 000a 0000              .section    .note.GNU-stack,"",@progbits
DEFINED SYMBOLS
                            *ABS*:0000000000000000 test.c
     /tmp/cc59xysx.s:14     .text.test:0000000000000000 test
     /tmp/cc59xysx.s:35     .rodata.data:0000000000000000 data

NO UNDEFINED SYMBOLS

The issue is that I expect to see both an uncached (stb.di) and a cached (stb) store, but only the uncached store is generated.

claziss commented 4 years ago

That is a good observation. However, this direct (uncached) access is implemented by simply piggybacking on the volatile flag. Hence, the compiler merges the two stores into the volatile one, which is a correct optimization. A workaround is to use a very low optimization level, such as -Og, for this function:

__attribute__ ((optimize("Og")))
int test(uint8_t *  my_ptr,
     volatile uint8_t * v_ptr)

This is a good example showing that a proper implementation of the uncached directive should use the concept of memory spaces. In your example, the two memory writes are not equivalent because they happen in different memory spaces. However, the compiler assumes that both writes target the same single memory space.
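To make the merge concrete, the -Os listing in the report behaves roughly like the C below: the destination address is selected from either pointer (`add_s r12,r0,r2` / `add_s r12,r1,r2`), and a single store through it is emitted, which keeps the volatile qualification and therefore becomes `stb.di` even when `my_ptr` is non-NULL. The function name `test_as_compiled` is hypothetical and only illustrates the transformation; it is not code the compiler actually emits.

```c
#include <stdint.h>

static uint8_t data[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

/* Hypothetical source-level equivalent of the merged code: both branch
 * targets are folded into one pointer, and the one remaining store is
 * volatile, so with -mno-volatile-cache it is always uncached. */
int test_as_compiled(uint8_t *my_ptr, volatile uint8_t *v_ptr)
{
    for (int i = 0; i < 10; i++) {
        volatile uint8_t *dst =
            my_ptr ? (volatile uint8_t *)&my_ptr[i] : &v_ptr[i];
        *dst = data[i];   /* single merged store, always volatile */
    }
    return 0;
}
```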

Another observation: using -Os together with -fdata-sections results in worse code size because of section anchors, which are enabled by default when compiling with -Os. So use either data sections or section anchors, not both (i.e. add -fno-section-anchors if you keep -fdata-sections).

petrokarashchenko commented 4 years ago

@claziss I was just developing code with the workaround suggested in https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/246 because I can't use __attribute__((uncached)) to its full power. This is one of the findings :) But there may be many more "Easter eggs" caused by -mno-volatile-cache.

Thanks for the hint about -Os and -fdata-sections. I will read more about -fno-section-anchors. The reason I'm using -fdata-sections is that I use it together with -ffunction-sections and -Wl,--gc-sections to reduce binary size.

abrodkin commented 4 years ago

@claziss is there anything we might want to do here? @petrokarashchenko I guess this one is not blocking you, is it?

claziss commented 4 years ago

Probably a proper fix will slip to the next release.

abrodkin commented 4 years ago

@claziss do you mean arc-2021.03?

petrokarashchenko commented 4 years ago

@abrodkin I think what we are actually looking for is the fix for https://github.com/foss-for-synopsys-dwc-arc-processors/toolchain/issues/246. This issue is a side effect of a workaround we are currently using. For now we have changed the code to avoid this optimization, but we are really looking forward to using the uncached attribute.
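The thread does not say how the code was changed. One plausible restructuring, sketched here as an assumption, is to hoist the NULL check out of the loop so that the cached and volatile stores sit in separate loops with no common tail for the compiler to merge. The name `test_split` is hypothetical, the C standard does not guarantee this keeps the stores apart, and at -Os the non-volatile loop may even be turned into a `memcpy` call (which is still a cached access), so check the listing.

```c
#include <stdint.h>

static uint8_t data[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

/* Hypothetical restructuring: choose the destination once, outside the
 * loop, so each loop contains only one kind of store. */
int test_split(uint8_t *my_ptr, volatile uint8_t *v_ptr)
{
    if (my_ptr) {
        for (int i = 0; i < 10; i++)
            my_ptr[i] = data[i];   /* intended to stay a cached store */
    } else {
        for (int i = 0; i < 10; i++)
            v_ptr[i] = data[i];    /* stb.di with -mno-volatile-cache */
    }
    return 0;
}
```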