bespoke-silicon-group / bsg_manycore

Tile based architecture designed for computing efficiency, scalability and generality
Other
225 stars 58 forks source link

Volatile atomic #682

Open dpetrisko opened 1 year ago

dpetrisko commented 1 year ago

I've been encountering an issue mixing loads and amos to the same address. I've convinced myself it's a coding issue rather than a hardware issue, with the compiler not tracking the inline assembly dependencies correctly. Making the pointer volatile seems to make the accesses sane. But then compiling with C++ -fno-permissive, causes the following:

main.c:27:24: error: invalid conversion from 'volatile int*' to 'int*' [-fpermissive]
   27 |     int y = bsg_amoadd(vol_lock_ptr, 1);
      |                        ^~~~~~~~~~~~
      |                        |
      |                        volatile int*

This PR makes the pointer parameter for the bsg_amo_add volatile so that we can compile this code correctly. It also adds a test demonstrating the issue

dpetrisko commented 1 year ago

@mrutt92 foresee any issues doing this?

mrutt92 commented 1 year ago

It seems reasonable to me. Adding the volatile modifier to the pointer is typical for defining functions like these.

If you're writing c++ I encourage using std::atomic if you can... then you don't need to think about this volatile nonsense.

Could you post some example C and assembly showing the bug?

dpetrisko commented 1 year ago

Ya the test added will fail if you switch out "int lock_ptr = &lock;" for "volatile int lock_ptr = &lock;"

#include "bsg_manycore.h"
#include "bsg_set_tile_x_y.h"
#include "bsg_manycore_atomic.h"

int lock __attribute__ ((section (".dram"))) = {0};

int main()
{

  bsg_set_tile_x_y();

  if (__bsg_id == 0)
  {
    // Using a regular int* here cause an illegal ordering resulting in:
    //   x = 1, y = 0, z = 1
    //int* lock_ptr = &lock;
    volatile int* lock_ptr  = &lock;
    int x = *lock_ptr;
    int y = bsg_amoadd(lock_ptr, 1);
    int z = *lock_ptr;

    if (x != 0 || y != 0 || z != 1) bsg_fail();

    bsg_finish();
  }

  bsg_wait_while(1);
}
dpetrisko commented 1 year ago

This might be more informative, you can see the difference in ASM here:

#include "bsg_manycore.h"
#include "bsg_set_tile_x_y.h"
#include "bsg_manycore_atomic.h"

int nvol_lock __attribute__ ((section (".dram"))) = {0};
int vol_lock __attribute__ ((section (".dram"))) = {0};

int main()
{

  bsg_set_tile_x_y();

  if (__bsg_id == 0)
  {
    int* nvol_lock_ptr = &nvol_lock;
    int q = *nvol_lock_ptr;
    int r = bsg_amoadd(nvol_lock_ptr, 1);
    int s = *nvol_lock_ptr;

    volatile int* vol_lock_ptr  = &vol_lock;
    int x = *vol_lock_ptr;
    int y = bsg_amoadd(vol_lock_ptr, 1);
    int z = *vol_lock_ptr;

    bsg_print_hexadecimal(q);
    bsg_print_hexadecimal(r);
    bsg_print_hexadecimal(s);

    bsg_print_hexadecimal(x);
    bsg_print_hexadecimal(y);
    bsg_print_hexadecimal(z);

    bsg_finish();
  }

  bsg_wait_while(1);
}
000001c8 <main>:
 1c8:   ff010113            addi    sp,sp,-16
 1cc:   00112623            sw  ra,12(sp)
 1d0:   f59ff0ef            jal ra,128 <bsg_set_tile_x_y>
 1d4:   02c02283            lw  t0,44(zero) # 2c <__bsg_id>
 1d8:   04029e63            bnez    t0,234 <main+0x6c>
 1dc:   81000737            lui a4,0x81000
 1e0:   06070093            addi    ra,a4,96 # 81000060 <_bsg_dram_end_addr+0xfffffff0>
 1e4:   00100313            li  t1,1
 1e8:   0060a52f            amoadd.w    a0,t1,(ra)
 1ec:   0040a583            lw  a1,4(ra)
 1f0:   00408693            addi    a3,ra,4
 1f4:   0066a3af            amoadd.w    t2,t1,(a3)
 1f8:   0000a603            lw  a2,0(ra)
 1fc:   4010f837            lui a6,0x4010f
 200:   0040a883            lw  a7,4(ra)
 204:   aec82423            sw  a2,-1304(a6) # 4010eae8 <_bsg_elf_vcache_size+0x400ceae8>
 208:   aea82423            sw  a0,-1304(a6)
 20c:   aec82423            sw  a2,-1304(a6)
 210:   aeb82423            sw  a1,-1304(a6)
 214:   ae782423            sw  t2,-1304(a6)
 218:   af182423            sw  a7,-1304(a6)
 21c:   03002e83            lw  t4,48(zero) # 30 <__bsg_y>
 220:   03402f83            lw  t6,52(zero) # 34 <__bsg_x>
 224:   010e9793            slli    a5,t4,0x10
 228:   01f782b3            add t0,a5,t6
 22c:   ac582823            sw  t0,-1328(a6)
 230:   0000006f            j   230 <main+0x68>
 234:   0000006f            j   234 <main+0x6c>