agrobman commented 4 years ago

With the introduction of the SweRV EH2 processor, we need support for generating multithreaded tests. At least one thing must be supported:

separate data and stack setup per thread, to avoid the true sharing problem;

The following items are highly desired:

a) randomly generate the same or different instruction streams for each thread;
b) a shared data section for exercising AMO instructions, and a means to synchronize threads so that they execute AMOs on the same memory locations in close proximity to each other (time-wise);
c) any tricky data synchronization instruction sequences;
d) the ability to create false sharing tests (i.e. exclusively assign specific bytes of the same halfword/word/doubleword to each thread).
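Item d) can be made concrete with a small address plan. The sketch below is hypothetical and not part of riscv-dv: it assumes 8-byte doublewords split into equal, contiguous byte lanes, one lane per hart, and uses only byte stores (`sb`) so that no store straddles two lanes.

```python
def false_sharing_lanes(hart_id: int, num_harts: int, word_bytes: int = 8) -> list[int]:
    """Byte offsets inside one word that `hart_id` alone may write.

    Every hart touches the same word (false sharing), never the same byte.
    """
    if num_harts < 1 or word_bytes % num_harts:
        raise ValueError("word_bytes must be divisible by num_harts")
    if not 0 <= hart_id < num_harts:
        raise ValueError("hart_id out of range")
    lane = word_bytes // num_harts
    return list(range(hart_id * lane, (hart_id + 1) * lane))


def sb_addresses(base: int, hart_id: int, num_harts: int,
                 num_words: int, word_bytes: int = 8) -> list[int]:
    """Byte addresses one hart may target with `sb` across `num_words` words."""
    lanes = false_sharing_lanes(hart_id, num_harts, word_bytes)
    return [base + word_bytes * w + off for w in range(num_words) for off in lanes]


if __name__ == "__main__":
    for hart in range(2):
        # prints "0 [0, 1, 2, 3]" and then "1 [4, 5, 6, 7]"
        print(hart, false_sharing_lanes(hart, 2))
```

Halfword or word stores would need lanes that are at least as wide as, and aligned to, the access size.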

Any plans for this?

Regards Alex

taoliug commented 4 years ago

Can you elaborate on how thread switching is done by the EH2 processor? I need to figure out the right program structure to support this.

agrobman commented 4 years ago

EH2 supports hardware threading. Up to 2 harts (hardware threads) can run simultaneously on the same CPU. The harts can be thought of as independent processors. Almost everything that defines a CPU state is independent between harts: GPRs and almost all CSRs. EH2 shares some hardware between the harts, which does not have a significant effect on test generation (things like the internal pipe, common memory interfaces, interrupt controller, instruction cache, local memories, debug module, etc.).

EH2 starts as a single-hart machine, and this active hart can enable the other hart. Both harts start from the reset vector. Each hart has a read-only CSR holding its hart ID value. EH2 supports AMO instructions, but only on DCCM addresses; these can be used for data synchronization.

My current problem with riscv-dv is that I have no control over L/S instruction addresses. To be able to compare simulation results with the ISS, we need to use separate data areas for each hart, as there is no ordering guarantee for L/S operations between the harts. Of course, we need to add some initialization code to enable threading, but even now we patch the riscv-dv output with a Perl script to bring in our exception handlers, CPU init and test exit code.
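A minimal sketch of the per-hart separation described above, assuming one flat memory range starting at `base`, a 64-byte cache line and illustrative sizes. `plan_layout` and `HartLayout` are hypothetical names, not riscv-dv APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HartLayout:
    hart_id: int
    data_lo: int   # first byte of this hart's data area
    data_hi: int   # one past the last data byte
    stack_lo: int  # lowest stack address
    stack_hi: int  # initial sp (the stack grows down toward stack_lo)


def align_up(value: int, align: int) -> int:
    return (value + align - 1) // align * align


def plan_layout(num_harts: int, base: int, data_size: int,
                stack_size: int, align: int = 64) -> list[HartLayout]:
    """Give every hart its own data area and stack, packed back to back.

    Each hart's block starts on an `align` boundary (assumed cache-line
    size), so no two harts' blocks share a line.
    """
    layouts = []
    cursor = align_up(base, align)
    for hart in range(num_harts):
        data_lo = cursor
        data_hi = data_lo + data_size
        stack_lo = data_hi
        stack_hi = stack_lo + stack_size
        layouts.append(HartLayout(hart, data_lo, data_hi, stack_lo, stack_hi))
        cursor = align_up(stack_hi, align)
    return layouts


def disjoint(a: HartLayout, b: HartLayout) -> bool:
    """True if the two harts' [data_lo, stack_hi) footprints do not overlap."""
    return a.stack_hi <= b.data_lo or b.stack_hi <= a.data_lo
```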

taoliug commented 4 years ago

Do you use riscv-dv to generate two programs and run them independently on the two hardware threads? So the problem is how to control the load/store addresses between these two running programs?

agrobman commented 4 years ago

Since we use one test source per simulation, it should include code and data for both harts. The code can be the same or different, but the data each hart writes needs to be in separate locations.

taoliug commented 4 years ago

Let me think about this and get back to you in the next few days. I think this is doable, but need to think about what's the best way to refactor the instruction generator.

taoliug commented 4 years ago

Here's the plan:

The generator will generate separate instructions and data/stack sections for each hart.
Only one program will be generated. At the beginning of the program, it will read the hart ID register and jump to the main program entry of the corresponding hart. This allows all harts to share the same boot fetch address.
A shared memory region will be defined for AMO instructions. All AMO instructions will be accessing this region to test synchronization between different harts.
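The hart-ID dispatch in this plan could look roughly like the stub produced below. This is a hypothetical Python helper that emits RISC-V assembly text; it assumes the per-hart entry labels are named `h0_start`, `h1_start`, ... and reads the standard `mhartid` CSR. Using `la` + `jr` rather than a direct `j` keeps the dispatch independent of how far away each hart's entry is placed.

```python
def boot_dispatch(num_harts: int) -> str:
    """Return assembly for a common reset-vector stub that routes each hart."""
    lines = [
        ".globl _start",
        "_start:",
        "    csrr t0, mhartid",             # read-only hart ID CSR
    ]
    for hart in range(num_harts):
        skip = f"dispatch_skip_{hart}"
        lines += [
            f"    li   t1, {hart}",
            f"    bne  t0, t1, {skip}",      # short branch: target is adjacent
            f"    la   t2, h{hart}_start",  # la + jr avoids jal's +/-1 MiB limit
            "    jr   t2",
            f"{skip}:",
        ]
    lines += [
        "dispatch_park:",                    # unexpected hart IDs wait here
        "    wfi",
        "    j    dispatch_park",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(boot_dispatch(2))
```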

Features that won't be supported in this release:

Multi-harts with virtual address translation enabled
Multi-harts ISS simulation

Please let me know your thoughts about this; I will put together a few PRs in the next couple of weeks.

agrobman commented 4 years ago

The plan sounds good to me. However, I want to clarify a few points:

1) It may be beneficial to have the same code for all harts, to stress common hardware more; whether one or many code streams are generated could be selected by a plusarg or randomly;

2) We need a switch to select how many harts will be simulated (separate from 1));

3) We need to create (and use in generated code) an assembler macro to select hart-specific code, which can easily be replaced by another processor-specific macro with an include or postprocessing;

4) We also need to put the various data segments into sections so they can be placed in different physical memories. For SweRV we have the following memories with different characteristics, and we need to mix L/S instructions across all of these targets:

a) External Normal RAM;

b) External I/O;

c) Internal Data and Instruction RAMs;

d) Internal I/O.

5) It would be nice to have similar sectioning of the code too, to distribute the code more widely for branch calculation/ICache aliasing testing.

taoliug commented 4 years ago

Sounds good to me. For 4, we currently support configuring memory regions as shown below. You can configure them to match the sizes of the different memory regions of your system, and link them to match the exact memory space partition. I will add a bit to indicate whether a memory region can be shared by multiple harts.

https://github.com/google/riscv-dv/blob/master/src/riscv_instr_gen_config.sv#L98

agrobman commented 4 years ago

How do I know, or control, which region is used in random code? For example, SweRV can execute AMO only on DCCM; how can one constrain the test generator to use a specific region under specific conditions? Or, I want to run a specific L/S sequence on a mix of external and DCCM addresses. We designate randomly selected GPRs as data pointers to particular address ranges, and the test generator selects L/S instructions with these base registers to create specific mixes. How can similar behavior be achieved with riscv-dv?
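The base-register technique described above can be sketched as follows. This illustrates the approach and is not riscv-dv code: the region names are placeholders, the sketch assumes each region is at least 2 KiB so every generated offset stays inside it and within the 12-bit signed immediate, and it avoids x0-x4 (zero, ra, sp, gp, tp).

```python
import random

# Candidate GPRs: skip x0 (zero), x1 (ra), x2 (sp), x3 (gp), x4 (tp).
GPRS = [f"x{i}" for i in range(5, 32)]


def reserve_base_pointers(regions: list[str], rng: random.Random) -> dict[str, str]:
    """Pick one distinct random GPR per region to act as its data pointer."""
    return dict(zip(regions, rng.sample(GPRS, len(regions))))


def emit_pointer_setup(bases: dict[str, str]) -> list[str]:
    return [f"    la   {reg}, {region}" for region, reg in bases.items()]


def emit_mixed_ls(bases: dict[str, str], count: int, rng: random.Random) -> list[str]:
    """Random lw/sw mix whose base registers are only the reserved pointers."""
    pointers = list(bases.values())
    data_regs = [r for r in GPRS if r not in pointers]  # never clobber a pointer
    out = []
    for _ in range(count):
        base = rng.choice(pointers)
        offset = rng.randrange(0, 2048, 4)  # word aligned, fits the 12-bit immediate
        op = rng.choice(["lw", "sw"])
        out.append(f"    {op}   {rng.choice(data_regs)}, {offset}({base})")
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    bases = reserve_base_pointers(["h0_region_0", "dccm_region"], rng)
    print("\n".join(emit_pointer_setup(bases) + emit_mixed_ls(bases, 6, rng)))
```

Because the pointer registers are never used as load destinations, every access provably stays in the region mix chosen up front.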

taoliug commented 4 years ago

The load/store sequence will pick a random region from all memory regions (https://github.com/google/riscv-dv/blob/master/src/riscv_load_store_instr_lib.sv#L34). If you want to fix AMO to a specific region, you can extend the sequence and fix the data page ID to the DCCM region. I will modify the AMO sequences to only access "shared" regions.

The sequence below allows interleaving L/S across multiple regions: https://github.com/google/riscv-dv/blob/master/src/riscv_load_store_instr_lib.sv#L330
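A sketch of the region-selection rule discussed here, with a per-region `shared` flag: AMOs draw only from shared DCCM regions, while ordinary loads/stores of hart N draw only from that hart's private regions. The `MemRegion` fields, the `kind` tags and the sizes are hypothetical; riscv-dv's real region table is the SystemVerilog configuration linked earlier in the thread.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRegion:
    name: str
    size_bytes: int
    kind: str     # hypothetical tags: "ext_ram", "ext_io", "dccm", "int_io"
    shared: bool  # True if more than one hart may access the region


REGIONS = [
    MemRegion("h0_region_0", 4096, "ext_ram", False),
    MemRegion("h1_region_0", 4096, "ext_ram", False),
    MemRegion("h0_region_1", 1024, "dccm", False),
    MemRegion("h1_region_1", 1024, "dccm", False),
    MemRegion("amo_region_0", 64, "dccm", True),
]


def pick_region(regions: list[MemRegion], hart_id: int, for_amo: bool,
                rng: random.Random) -> MemRegion:
    """AMOs: shared DCCM regions only. Other L/S: this hart's private regions."""
    if for_amo:
        pool = [r for r in regions if r.shared and r.kind == "dccm"]
    else:
        pool = [r for r in regions
                if not r.shared and r.name.startswith(f"h{hart_id}_")]
    if not pool:
        raise ValueError("no eligible region for this access")
    return rng.choice(pool)
```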

agrobman commented 4 years ago

Are you expecting that if someone needs specific test generator behavior for their CPU testing, she/he is supposed to learn UVM/SV and your code (which is really good, BTW ☺)? And then suffer through all future updates?

taoliug commented 4 years ago

All these tests are enabled in the regression by default; users just need to configure the memory regions to match the actual memory map. https://github.com/google/riscv-dv/blob/master/yaml/base_testlist.yaml#L50

I do think users need to understand a bit about how to extend the generator for some custom test requirements.

taoliug commented 4 years ago

Completed:

The generator will generate separate instructions and data/stack sections for each hart. Only one program will be generated. At the beginning of the program, it will read the hart ID register and jump to the main program entry of corresponding hart. This could enable multi-harts to have the same boot fetching address.

Working on other feature requests.

taoliug commented 4 years ago

@agrobman, the basic multi-harts infrastructure is ready; can you try it and see if it matches your requirement?

// AMO test: two harts accessing shared memory (amo_0, amo_1 sections) with AMO instructions
python3 run.py -i 1 --target multi_harts -tn riscv_amo_test

// Single hart test
python3 run.py -i 1 --target multi_harts -tn riscv_single_hart_test

agrobman commented 4 years ago

I don’t use the Python script; what plusarg does --target correspond to?

Second: is there a way to create false sharing tests, where both harts use the same address space/section but different bytes in the same DW? I.e. one address range, but each hart has its own set of bytes. I need some free time to try the updates; give me a week ...

taoliug commented 4 years ago

I think the scenario you mentioned could be randomly hit by the current test. I can add an AMO stress test to increase the frequency of hitting various corner cases of AMO

agrobman commented 4 years ago

I meant false sharing for regular loads and stores. Our CPU implements AMO only for internal memory, which has a different data path than external memories.

agrobman commented 4 years ago

hi, I'm back,

I'm getting this:

UVM_INFO /wdc/proj/riscv/scratch/alexander.grobman/lx2p/riscv-dv/src/riscv_instr_sequence.sv(87) @ 0: reporter@@sub_5 [sub_5] Finishing instruction generation
xmsim: *E,TRNULLID: NULL pointer dereference.
          File: ./riscv-dv/src/riscv_asm_program_gen.sv, line = 83, pos = 17
         Scope: worklib.riscv_instr_pkg::riscv_asm_program_gen@4421_1.gen_program
          Time: 0 FS + 44
Verilog Stack Trace:
0: function worklib.riscv_instr_pkg::riscv_asm_program_gen@4421_1.gen_program at ./riscv-dv/src/riscv_asm_program_gen.sv:83
1: task worklib.riscv_instr_test_pkg::riscv_instr_base_test@3247_1.run_phase at ./riscv-dv/test/riscv_instr_base_test.sv:93
2: task worklib.uvm_pkg::uvm_run_phase@2147_3.exec_task at /wdc/apps/cadence/xcelium/19.03.002/tools/methodology/UVM/CDNS-1.2/sv/src/base/uvm_common_phases.svh:269
3: process in worklib.uvm_pkg::uvm_task_phase@2147_3.execute.unmblk1 at /wdc/apps/cadence/xcelium/19.03.002/tools/methodology/UVM/CDNS-1.2/sv/src/base/uvm_task_phase.svh:152

./riscv-dv/src/riscv_asm_program_gen.sv:83       main_program[hart].instr_cnt = cfg.main_program_instr_cnt;
xcelium> exit

for :

xrun -R -svseed 1616721718 +UVM_TESTNAME=riscv_instr_base_test +num_of_tests=1 +asm_file_name=test +instr_cnt=1000 +bare_program_mode=1 +fix_sp +num_of_harts=2 +no_ebreak=0 -xceligen rand_struct

I had to add -xceligen rand_struct because Xcelium complained that it does not support random structures (?).

What am I missing?

(I can't run your Python script; I don't have a recent Python installed.)

agrobman commented 4 years ago

OK, I've found the missing setup: NUM_HARTS=2. A couple of thoughts on this:
1) Why do I need this param? Why isn't the plusarg num_of_harts sufficient?
2) Why does the generator fail with a null pointer instead of checking the validity of the provided num_of_harts against NUM_HARTS?

agrobman commented 4 years ago

BTW, FYI,

These are the modifications I have to make to the generated source to get the tests to simulate:

    $_ = '' if /\.include/;
    s/\.globl _start//;             # remove: already defined
    s/main:/     /;
    s/^_start:/main:/;
    $_ = '' if /\s+li\s+x2,/;
    $_ = '' if /user_stack_end/;
    s/\.section \.text/.text/;      # set up the "known" default section
    # replace exit code
    s/j write_tohost//;
    s/test_done:/END_TEST(0)/;
    # remove defined labels/sections
    $block = 1 if /\.pushsection (\.tohost|\.user_stack)/;
    print TEST $_ if !(/host/ || $block);
    $block = 0 if /\.popsection/;

These modifications are for single-threaded tests; it seems I will have to update my script for MT tests ...

The worst part is that I have to update my script with every git pull I do.

Is there something we can do to avoid this hassle?

Also I add this at the beginning:

    # add test environment macros, exception handlers etc.
    print TEST <<EOF;
// test name is $test
//$run_cmd
#define RISCVDV
#include "pm/riscv_test.h"
START_TEST

EOF

agrobman commented 4 years ago

I'm getting this:

vascom: Executing /wdc/apps/riscv/bin/riscv64-unknown-elf-as -march=rv32gc test.cpp.s -o test.o
vascom: Running ld on test.o (test) [+] ..
test.o: In function `.L111':
(.text+0x104): relocation truncated to fit: R_RISCV_JAL against `h1_start'

You may need to generate a long jump instead of a short jump/branch ...
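The linker message means a `jal` offset did not fit: JAL encodes a signed, 2-byte-aligned offset of 21 bits, i.e. roughly ±1 MiB from the jump. The sketch below (hypothetical helpers, not riscv-dv code) shows the range check and the usual fallback of materializing the target address with `la` and jumping with `jr`, which is not bound by that range.

```python
JAL_MIN = -(1 << 20)      # -1 MiB
JAL_MAX = (1 << 20) - 2   # +1 MiB - 2; offsets are multiples of 2


def jal_reachable(pc: int, target: int) -> bool:
    """True if a `jal` at address `pc` can encode a jump to `target`."""
    offset = target - pc
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX


def emit_jump(label: str, known_near: bool, scratch: str = "t0") -> list[str]:
    """Use `j` only when the target is known to be near; otherwise la + jr.

    Cross-hart entry points (e.g. h1_start) are typically not known to be
    near at generation time, so they get the long form.
    """
    if known_near:
        return [f"    j    {label}"]
    return [f"    la   {scratch}, {label}", f"    jr   {scratch}"]
```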

taoliug commented 4 years ago


Regarding NUM_HARTS vs. num_of_harts: the intention is to set NUM_HARTS to the actual hart count of the processor, and cfg.num_of_harts to the number of harts used in test generation. You could generate a single-hart program even if there are two harts in the processor.
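The up-front check asked for earlier could be as simple as the following. It is written in Python for illustration only (the generator itself is SystemVerilog); `validate_hart_config` is a hypothetical name, and `num_harts_hw` stands in for the NUM_HARTS parameter.

```python
def validate_hart_config(num_of_harts: int, num_harts_hw: int) -> None:
    """Fail early instead of dereferencing a missing per-hart program object.

    num_harts_hw -- harts the generator was built for (the NUM_HARTS parameter)
    num_of_harts -- harts requested for this test (the +num_of_harts plusarg)
    """
    if num_harts_hw < 1:
        raise ValueError(f"NUM_HARTS must be >= 1, got {num_harts_hw}")
    if not 1 <= num_of_harts <= num_harts_hw:
        raise ValueError(
            f"+num_of_harts={num_of_harts} is outside 1..NUM_HARTS={num_harts_hw}; "
            f"rebuild with NUM_HARTS >= {num_of_harts}"
        )
```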

taoliug commented 4 years ago


Can you try to run with +bare_program_mode=1 to see if you get a shorter list to process?

agrobman commented 4 years ago

I do run in bare program mode:

//run.int xrun -R -svseed 3751172817 -svrnc rand_struct +UVM_TESTNAME=riscv_instr_base_test +num_of_tests=1 +asm_file_name=test +instr_cnt=1000 +bare_program_mode=1 +fix_sp=1 +num_of_harts=2 +illegal_instr_ratio=50 >& /dev/null

Alex

agrobman commented 4 years ago

As for the assembly compilation error:

(.text+0x104): relocation truncated to fit: R_RISCV_JAL against `h1_start'
/wdc/apps/riscv/bin/riscv64-unknown-elf-ld: final link failed: Symbol needs debug section which does not exist

I think you're missing a text section for the hart1 code.

Also, use long jumps to reach the hartN code.

agrobman commented 4 years ago

Where is the end of the hart1 code?

taoliug commented 4 years ago

Have you set NUM_HARTS = 2 as well as num_of_harts = 2? It seems working for me.

h1_region_0 is the starting point of the hart 1 data section, and also the end point of the hart 1 instruction section. I guess it might be better to move all hart_0/hart_1 instruction sections together so they are easy to load into instruction memory?

agrobman commented 4 years ago

I was asking about the end of the h1 'main', to end the h1 simulation.

"Have you set NUM_HARTS = 2 as well as num_of_harts = 2?" I did, but to reduce confusion, it would be nice to have only one variable to set.

A separate problem is that the h1 code is now placed in .data ...

agrobman commented 4 years ago

I guess the end of the h1 main is here (?)

1784:             c.nop
                  c.slli     s6, 14
                  csrrci     t3, 0x340, 15
1787:             slti       t3, ra, -715 <<<<<<<<<<<< end of main
h1_sub_9:         div        a2, s9, a4
                  slt        s0, sp, s5
                  c.nop
                  addi       sp, sp, -52
                  c.addi     s1, 29
                  sra        s8, a2, s0
                  sw         ra, 4(sp)
                  xori       s9, s7, -960

Each hartN needs to report that it has finished, similar to hart0 ...

agrobman commented 4 years ago

Are you calling subroutines of one hart from the main of the other? When this happens, the subroutines may have absolute data references, so both harts access the same data locations, which in turn causes data-related failures in our simulations. Please make sure that shared subroutines don't set up data pointers (don't use something like 'la xN, hN_region[+-] const').

agrobman commented 4 years ago

The post-processing code is now:

foreach (@src){
    $_ = '' if /\.include/;
    s/\.globl _start//;             # remove: already defined
    #setup stack earlier
#    s/^_start:/    la sp, _user_stack_end/;
    s/\bmain:/     /;
    s/^_start:/main:/;
    $_ = '' if /\s+li\s+x2,/;
    $_ = '' if /user_stack_end/;
    s/\.section \.text/.text/;      # set up the "known" default section
    # replace exit code
    s/j write_tohost//;
    s/test_done:/END_TEST(0)/;
    s/h1_start:/.text\nh1_start:/;
    if(/^h1_sub\S+:/ && !$h1_end){
        $h1_end = 1;
        $_ = "j end_the_test\n" .$_;
    }
    # remove defined labels/sections
    if (/\.pushsection (\.tohost|\S+_user_stack)/){
        $block = 1;
    }
    elsif(/region/ && s/\.pushsection.+//){
        $real_section = 1;
        $sh = sprintf "%x", $sn;
        $_ = ".section .data_gen_$sn START = 0x${sh}0000000\n";
        $sn++;
        $sn = 10 if $sn == 9; # bypass HOLE
    }
    elsif($real_section && /popsection/){
        $_ = ".data\n";
        $real_section = 0;
    }
    print TEST $_ if !(/host/ || $block);
    $block = 0 if /\.popsection/;

}

taoliug commented 4 years ago

Regarding shared subroutines setting up data pointers (la xN, hN_region[+-] const):

This is needed because loads/stores always access initialized data regions. They cannot use random addresses, as those could point to any undefined memory region. I think you can load the data regions into your memory to run the simulation. You can skip this if the region is some memory-mapped I/O.

taoliug commented 4 years ago

Regarding the end of the h1 main and each hartN reporting that it has finished:

This is solved by #495; you can find h*_instr_end for each hart. Besides, the instruction sections for all harts are now moved together, to avoid putting hart 1 instructions in the data section.

agrobman commented 4 years ago

We do load all data sections into the simulation model memories. The problem is true data sharing, when two harts access the same memory location simultaneously. The hardware doesn’t guarantee any access order for true sharing; we have no mechanism to check data correctness in these cases, and the tests fail.

In the initial request I asked to avoid true sharing. Now, if the harts can call subroutines that use direct data references, true sharing does happen.

To fix this, the test generator should either not use direct data references in shared subroutines, or not use shared subroutines at all.

agrobman commented 4 years ago

I asked about the end of MAIN for hart1, not about the end of the h1 code. As currently implemented, h1 finishes its main, starts executing the first subroutine, and then returns to nowhere ...

taoliug commented 4 years ago

Only AMO test has true data sharing. For other load/store instructions, each hart will access its own data sections. Hart 0 will access h0_region_0...h0_region_n and Hart 1 will access h1_region_0...h1_region_n. For AMO test sequence, both harts will access amo_region_0...amo_region_n as it's intended to test true data sharing.

taoliug commented 4 years ago

In bare program mode, the main program ends by jumping to the write_to_host section https://github.com/google/riscv-dv/blob/master/src/riscv_asm_program_gen.sv#L516 It could be changed to trigger an ecall exception so that your own trap handler can decide how to proceed.

agrobman commented 4 years ago

“Only AMO test has true data sharing. For other load/store instructions, each hart will access its own data sections. Hart 0 will access h0_region_0...h0_region_n and Hart 1 will access h1_region_0...h1_region_n. For AMO test sequence, both harts will access amo_region_0...amo_region_n as it's intended to test true data sharing.”

Are you saying that the harts never access each other's data?

I’m getting true data sharing failures because the hart1 main calls hart0 subroutines and vice versa, and these subroutines access the data of the opposite hart.

Please make sure that these cross-called subroutines do not access data.

agrobman commented 4 years ago

Hi,

We need to avoid these calls:

h1_main:          csrrwi     a4, 0x340, 3
                  sra        s6, s3, t4
                  csrrsi     s1, 0x340, 25
                  la         s9, h0_sub_3 <<<<<<<<<<<<<
                  div        a3, s2, ra
                  addi       s9, s9, -365
                  c.andi     a4, 12
h0_j_h1_main_h0_sub_3_2:jalr       s0, s9, 366 <<<<<<<<<<<<<<
                  la         s8, h0_sub_1 <<<<<<<<<<
                  csrrw      t1, 0x340, zero
                  sub        zero, a1, t6
                  and        t5, s4, tp
                  addi       s8, s8, 683
                  mul        s3, a5, s7
                  mul        tp, s10, a5
                  lui        t6, 82387
h0_j_h1_main_h0_sub_1_4:jalr       s0, s8, -683 <<<<<<
                  csrrs      s0, 0x340, s5
                  fence
                  la         a5, h0_sub_2 <<<<<<<<<<<<
                  addi       a5, a5, -887
                  add        s2, t0, zero
                  lui        a1, 976418
                  bne        s4, s5, h0_j_h1_main_h0_sub_2_5 #branch to jump instr
                  srai       s0, t4, 19

These cross-hart calls cause true sharing problems with regular L/S instructions...

If a called subroutine sets up data pointers, inter-hart subroutine calls can cause both harts to write the same location, or one hart to write and the other to read the same data location. Our reference model can't predict the outcome!

Both harts can execute the same code only if it uses preset data pointers to different data locations, or only AMO instructions are used.

I guess this cross calling is caused by usage of single array of the subroutine names ...

taoliug commented 4 years ago

This is solved now. There was a bug in jumping to sub programs of other harts. Does anything else need to be fixed for multi-threading support?

agrobman commented 4 years ago

What are the two AMO sections for?

I see that the AMO test uses only 2 locations: amo_0+0 and amo_1+0

                  la         a2, amo_0+0 #start riscv_lr_sc_instr_stream_1
                  c.andi     a3, 22
                  lr.w.aq     t1, (a2)
                  ori        t6, t5, -819
                  c.xor      a0, a0
                  csrrs      s5, 0x340, a0
                  sub        gp, a0, s5
                  sc.w.aq     t2, s9, (a2) #end riscv_lr_sc_instr_stream_1
                  la         s2, amo_1+0 #start riscv_amo_instr_stream_3
                  amomin.w.aq s0, t4, (s2)
                  amomin.w.aq a1, t3, (s2)
                  amomin.w.aq a3, s5, (s2)

taoliug commented 4 years ago

amo_* are data sections shared among all harts.
AMO sequences are injected with runtime options https://github.com/google/riscv-dv/blob/master/target/multi_harts/testlist.yaml#L65

    +directed_instr_0=riscv_lr_sc_instr_stream,10
    +directed_instr_1=riscv_amo_instr_stream,10

agrobman commented 4 years ago

AMO tests (especially LR/SC pairs) need to create conditions for both SC success and SC failure. There are a few cases in which SC fails: SC without LR, a store other than the SC touching the reserved location, etc.

They should also use different aq/rl bit settings (although EH2 ignores these bits).
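The LR/SC cases listed above can be generated from a small table. This is a hypothetical sketch, not riscv-dv code; it relies on two rules of the RISC-V A extension: executing any SC invalidates the hart's reservation (so a second SC with no new LR must fail), and an LR/SC pair may succeed but is also allowed to fail spuriously. The register choices are illustrative (`addr_reg` must not be t1-t4), and the aq/rl suffixes are picked at random. Cross-hart cases, where another hart stores to the reserved location, additionally need the harts to be synchronized and are not covered.

```python
import random

ORDERING = ["", ".aq", ".rl", ".aqrl"]  # aq/rl variants (EH2 ignores them)


def lr_sc_case(case: str, addr_reg: str, rng: random.Random) -> tuple[list[str], str]:
    """Return (assembly lines, expected SC outcome) for one LR/SC scenario."""
    def lr(rd: str) -> str:
        return f"    lr.w{rng.choice(ORDERING)} {rd}, ({addr_reg})"

    def sc(rd: str) -> str:
        return f"    sc.w{rng.choice(ORDERING)} {rd}, t3, ({addr_reg})"

    if case == "pair":
        # May succeed (t2 == 0); a spurious failure is architecturally allowed.
        return [lr("t1"), sc("t2")], "t2 == 0 (success) or nonzero (spurious failure)"
    if case == "double_sc":
        # The first SC consumed the reservation, so the second one has no
        # matching LR and must fail (t4 != 0).
        return [lr("t1"), sc("t2"), sc("t4")], "t4 != 0"
    raise ValueError(f"unknown case {case!r}")
```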

agrobman commented 4 years ago

Why do we need two AMO sections? A single section should be sufficient ..

taoliug commented 4 years ago

This is configurable: https://github.com/google/riscv-dv/blob/master/src/riscv_instr_gen_config.sv#L110

agrobman commented 4 years ago

We need an option to run the same code on two harts to stress the common CPU pipe logic (BTW, we have found a lot of pipe logic bugs with such tests).
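One way to let every hart run the same instruction stream without true sharing is to compute each hart's data pointer from `mhartid` at run time, so identical code lands in disjoint per-hart areas. The sketch assumes per-hart areas of `1 << log2_stride` bytes laid out back to back from a hypothetical `hart_data` label; the helper names are illustrative, not riscv-dv APIs.

```python
def emit_hart_data_pointer(dest: str, base_label: str, log2_stride: int) -> list[str]:
    """Assembly that sets dest = &base_label + (mhartid << log2_stride)."""
    if dest == "t0":
        raise ValueError("dest must differ from the scratch register t0")
    return [
        "    csrr  t0, mhartid",
        f"    slli  t0, t0, {log2_stride}",
        f"    la    {dest}, {base_label}",
        f"    add   {dest}, {dest}, t0",
    ]


def hart_data_addr(base: int, hart_id: int, log2_stride: int) -> int:
    """Address the snippet above computes for a given hart (for checking)."""
    return base + (hart_id << log2_stride)
```

All later loads/stores in the shared code would then use `dest` (and offsets smaller than the stride) as their base.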

agrobman commented 4 years ago

"this is configurable"

I really don't want to modify your code! I don't have time/understanding of the generator code to maintain my customizations.

Also, these sections are too big to create any likely L/S collisions if the CPU accesses are evenly spread over them; besides, you only use two addresses, amo_0 and amo_1, in your test.

Also, the CPU should eventually access the AMO locations with regular loads/stores, to create the cases I described above.
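A sketch of one way to raise the collision rate, assuming a 64-byte cache line and RV32 word AMOs: instead of spreading AMOs over large sections, draw them from a handful of word offsets inside a single line of a shared section. `amo_0` is the section name seen in the listing above; the helpers are hypothetical.

```python
import random

AMO_OPS = ["amoswap.w", "amoadd.w", "amoxor.w", "amoand.w",
           "amoor.w", "amomin.w", "amomax.w", "amominu.w", "amomaxu.w"]


def hot_amo_offsets(num_hot: int, rng: random.Random,
                    line_bytes: int = 64, word_bytes: int = 4) -> list[int]:
    """Pick a few word-aligned offsets inside one cache line of the AMO section."""
    slots = range(0, line_bytes, word_bytes)
    if not 1 <= num_hot <= len(slots):
        raise ValueError("num_hot must fit in one line")
    return sorted(rng.sample(slots, num_hot))


def emit_amo_burst(section: str, offsets: list[int], count: int,
                   rng: random.Random) -> list[str]:
    """AMOs that repeatedly hit the same few words, raising collision odds."""
    out = []
    for _ in range(count):
        out.append(f"    la    a2, {section}+{rng.choice(offsets)}")
        out.append(f"    {rng.choice(AMO_OPS)} t1, t2, (a2)")
    return out
```

Running the same burst on both harts (with the same small offset pool) makes it likely that they hit the same words close together in time.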

agrobman commented 4 years ago

Hi,

I'm still getting true sharing failures. Hart1 code accesses hart0 data:

h1_main_18_0_t:   remu       s1, s11, gp
                  fence
                  ori        a1, zero, 135
...

                   c.addi     tp, 7
                  lh         s6, 28(t1)
                  slt        a3, sp, s0
                  lb         a3, 43(t1)
                  lhu        s8, 52(t1) #end riscv_load_store_rand_instr_stream_3
                  la         a2, h0_region_0+3682 #start load_store_instr_stream_2 <<<<<<<<<<<<<
                  sb         a5, -7(a2)
                  la         a0, h0_region_1+12910 #start load_store_instr_stream_1
                  sb         t0, -5(a2)
                  la         s2, h0_region_4+3572 #start load_store_instr_stream_0
                  lhu        gp, 14(a0)
                  lh         t5, 12(a2)
                  sb         a5, -18(s2)

The generator was called as:

// test name is rand_instr_test
//run.int xrun  -licqueue -R -svseed 4192922862 -svrnc rand_struct +UVM_TESTNAME=riscv_instr_base_test +num_of_tests=1 +asm_file_name=test +instr_cnt=2000 +bare_program_mode=1 +fix_sp=1 +num_of_harts=2   +num_of_sub_program=5 +directed_instr_0=riscv_load_store_rand_instr_stream,4 +boot_mode=m +directed_instr_1=riscv_loop_instr,4  +directed_instr_2=riscv_hazard_instr_stream,4  +directed_instr_3=riscv_load_store_hazard_instr_stream,4  +directed_instr_4=riscv_multi_page_load_store_instr_stream,4  +directed_instr_5=riscv_mem_region_stress_test,4  >& /dev/null
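Both leaks reported in this thread (a hart calling another hart's `hN_sub_*`, and a hart taking the address of another hart's `hN_region_*`) could be caught by scanning the generated assembly before simulation. This is a hypothetical post-processing check that assumes the label conventions seen in the listings above: `hN_main:` or `hN_sub_K:` starts a hart's code, and `la reg, label` materializes addresses. Shared `amo_*` sections are deliberately not flagged.

```python
import re

# A hart's code context starts at "hN_main:" or "hN_sub_K:" (not at helper
# labels such as "h0_j_h1_main_h0_sub_3_2:").
CONTEXT = re.compile(r"^h(\d+)_(?:main|sub_\d+):")
# Address materialization of a hart's subroutine or private data region.
REFERENCE = re.compile(r"\bla\s+\w+,\s*(h(\d+)_(?:sub|region)_\d+)")


def find_cross_hart_refs(asm: str) -> list[tuple[int, int, str]]:
    """Return (line number, owning hart, foreign label) for every leak."""
    hart = None
    hits = []
    for lineno, line in enumerate(asm.splitlines(), 1):
        ctx = CONTEXT.match(line)
        if ctx:
            hart = int(ctx.group(1))
        ref = REFERENCE.search(line)
        if ref and hart is not None and int(ref.group(2)) != hart:
            hits.append((lineno, hart, ref.group(1)))
    return hits
```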

taoliug commented 4 years ago

Regarding SC success/failure conditions and different aq/rl bits in the AMO tests:

This will be added in a later PR.

Other remaining feature requests:

Regular load/store to access the shared memory region
Duplicate program/data for all harts.

agrobman commented 4 years ago

Regarding “Regular load/store to access the shared memory region” and “Duplicate program/data for all harts”: I’m not sure about these two.

For 1), there should be a special protocol (or protocols) with synchronization of the threads; plain true sharing will fail in simulations. False sharing should work, though. For false sharing to be effective, the code should be able to bring the harts to access the same locations simultaneously; again, some sort of synchronization is required.

For 2), I’m not sure what “duplicate program/data for all harts” means. I asked to force all harts to run the same code with different data ...
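The synchronization mentioned above could start from a simple counting barrier, sketched here as a hypothetical generator of assembly text (not riscv-dv code). It assumes the counter word lives in DCCM (where EH2 supports AMOs), starts at 0, and is used by only one barrier instance. Note that the number of spin iterations depends on timing, so an instruction-by-instruction ISS comparison would have to tolerate or mask this loop.

```python
def emit_barrier(counter: str, num_harts: int, tag: str) -> list[str]:
    """One-shot counting barrier on a shared word (assumed in DCCM, initially 0).

    Each hart atomically increments the counter, then spins until all
    `num_harts` have arrived. Use a fresh counter word for every barrier.
    """
    spin = f"{tag}_spin"
    return [
        f"    la    t0, {counter}",
        "    li    t1, 1",
        "    amoadd.w.aqrl zero, t1, (t0)",   # announce arrival, discard old value
        f"    li    t1, {num_harts}",
        f"{spin}:",
        "    lw    t2, 0(t0)",               # poll the arrival count
        f"    blt   t2, t1, {spin}",
        "    fence rw, rw",                  # order later accesses after the barrier
    ]
```

Placing such a barrier right before a false-sharing burst brings the harts to the same code point at roughly the same time.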

chipsalliance / riscv-dv

add support for multi-threading #462
