Closed. gulmezmerve closed this issue 2 months ago.
Can you verify that the address map of your core provides access to the infrastructure of tapasco-riscv from the RAM_OFFSET address and upwards?
Hi, thanks for the response. I have no idea how to verify it :/ If you don't mind, I would be happy to get some guidance.
For the Flute (32-bit), we changed the address map and also disabled the caches (return False). For SSITH_P2 you need to do it similarly, but change it to our 64-bit offset address.
Yeah, I did it like that. I also tried both mem0_controller_addr_range base values, 'h_8000_0000 and 'h_0001_0000_0000_0000. With both, the PE is not able to access the memory provided by the host. The simple_sum examples are working; I just wasn't able to get print working, which I need for my benchmark.
diff --git a/src_Testbench/SoC/SoC_Map.bsv b/src_Testbench/SoC/SoC_Map.bsv
index da8d2d2..7e6c8b4 100644
--- a/src_Testbench/SoC/SoC_Map.bsv
+++ b/src_Testbench/SoC/SoC_Map.bsv
@@ -121,8 +121,8 @@ module mkSoC_Map (SoC_Map_IFC);
// Near_Mem_IO (including CLINT, the core-local interruptor)
let near_mem_io_addr_range = Range {
- base: 'h_0200_0000,
- size: 'h_0000_C000 // 48K
+ base: 'h_0001_0000,
+ size: 'h_0000_0000 // 48K
};
// ----------------------------------------------------------------
@@ -130,15 +130,15 @@ module mkSoC_Map (SoC_Map_IFC);
let plic_addr_range = Range {
base: 'h0C00_0000,
- size: 'h0040_0000 // 4M
+ size: 'h0000_0000 // 4M
};
// ----------------------------------------------------------------
// UART 0
let uart0_addr_range = Range {
- base: 'hC000_0000,
- size: 'h0000_0080 // 128
+ base: 'h0010_0000,
+ size: 'h7FF0_0080 // 128
};
// ----------------------------------------------------------------
@@ -158,16 +158,16 @@ module mkSoC_Map (SoC_Map_IFC);
// Boot ROM
let boot_rom_addr_range = Range {
- base: 'h_0000_1000,
- size: 'h_0000_1000 // 4K
+ base: 'h_0000_0000,
+ size: 'h_0000_8000 // 4K
};
// ----------------------------------------------------------------
// Main Mem Controller 0
let mem0_controller_addr_range = Range {
- base: 'h_8000_0000,
- size: 'h_4000_0000 // 1 GB
+ base: 'h_0001_0000_0000_0000,
+ size: 'h_8000_0000 // 1 GB
};
// ----------------------------------------------------------------
@@ -195,7 +195,7 @@ module mkSoC_Map (SoC_Map_IFC);
size: fromInteger(valueOf(RVFI_DII_Mem_Size))
};
function Bool fn_is_mem_addr (Fabric_Addr addr);
- return (inRange(rvfi_cached, addr));
+ return False;
endfunction
function Bool fn_is_IO_addr (Fabric_Addr addr);
return False;
@@ -207,10 +207,7 @@ module mkSoC_Map (SoC_Map_IFC);
// (Caches need this information to cache these addresses.)
function Bool fn_is_mem_addr (Fabric_Addr addr);
- return ( inRange(boot_rom_addr_range, addr)
- || inRange(mem0_controller_addr_range, addr)
- || inRange(tcm_addr_range, addr)
- );
+ return False;
endfunction
// ----------------------------------------------------------------
@@ -219,9 +216,7 @@ module mkSoC_Map (SoC_Map_IFC);
// (Caches need this information to avoid cacheing these addresses.)
function Bool fn_is_IO_addr (Fabric_Addr addr);
- return ( inRange(near_mem_io_addr_range, addr)
- || inRange(plic_addr_range, addr)
- || inRange(uart0_addr_range, addr));
+ return True;
endfunction
`endif
// ----------------------------------------------------------------
Are you using it on a ZynqMP-based system (ZCU102)? Then you probably also need to allocate a memory buffer from the software side, otherwise the SMMU will block the accesses.
Yes, exactly, I have been using a ZCU102.
I updated my comment @cahz
I understand from the C++ API that I can do memory allocation from the PE side.
tapasco_handle_t stdoutBuf_device;
tapasco.alloc(stdoutBuf_device, sizeof(unsigned char) * STDOUT_BUF);
I later copy from the PE with
tapasco.copy_from(stdoutBuf_device, stdoutBuf, STDOUT_BUF);
I am getting this error.
[2024-06-16T21:02:58Z ERROR tapasco::ffi] Setting LAST_ERROR: Error during Allocator operation: VFIO allocator requires va argument, none given
Error during Allocator operation: VFIO allocator requires va argument, none given
terminate called after throwing an instance of 'tapasco::tapasco_error'
what(): Error during Allocator operation: VFIO allocator requires va argument, none given
Do you have any idea? Is it an old API?
I understand from the C++ API that I can do memory allocation from the PE side.
Hi @gulmezmerve, just to clarify: Are you trying to use the TaPaSCo C++ API from the RISC-V core? That would be the wrong way round.
You need to allocate all necessary shared buffers from the host software running on the ZynqMP PS, in the case of the ZCU102.
You can find that in https://github.com/esa-tu-darmstadt/tapasco-riscv/blob/master/programming/examples/host/simple_sum/simple_sum_host.cpp#L56, where we allocate the array on the host side and then provide it to the RISC-V core.
makeWrappedPointer makes the buffer pointer available to the PE; makeOutOnly tells the runtime that it does not have to copy anything to the buffer upon job launch, only after the job finishes. Thus, you don't have to copy the buffer back to the host explicitly; the runtime already does that for you. Passing the result as an argument to the launch writes the buffer address into the RVController ARG3 CSR, which is accessed by the print function.
Just to be sure that everything is set up correctly and working as expected, could you try to run the simple_sum example using the Piccolo core? I suggest that you modify the PE example code to:
#include "../rv_pe.h"
int main()
{
    int a = readFromCtrl(ARG1);
    int b = readFromCtrl(ARG2);
    writeToCtrl(RETL, a + b);
    initPrint();
    print("Finished writing result to host.\n");
    setIntr();
    return 0;
}
Build the RISC-V binary and then copy it to the ZCU102 along with the simple_sum host example.
make piccolo32_pe
tapasco compose [piccolo32_pe x 1] @ 50 MHz -p zcu102
Make sure to set the BRAM_SIZE and the PE_ID macros in simple_sum_host.cpp correctly. The PE_ID for Piccolo is 1747. The default BRAM_SIZE as built by the previous commands is 0x8000.
#define PE_ID 1747
#define BRAM_SIZE 0x8000
Build the host software on the ZCU102 using CMake and execute it:
/path/to/simple_sum_host /path/to/simple_sum.bin
The output should look roughly like this:
$ ./simple_sum_host ../../simple_sum.bin
Finished reading binary file. Received 16796 bytes.
Waiting for RISC-V
RISC-V return value: 1379
First program bytes: 0
RiscV STDOUT: Finished writing result to host.
Please let me know if that helped.
Best, Yannick
Hi @yannickl96, thanks for the reply. I did everything that you said; I am familiar with those steps. But the problem is that the PE side isn't able to write stdoutBuf; it traps: https://github.com/esa-tu-darmstadt/tapasco-riscv/blob/5d1a235511fe37de134cc6e0e4e210387ea92955/programming/examples/PE/rv_pe.h#L113. This memory is not accessible to the PE side. That's why I was trying to allocate the stdout buffer in PE-local memory directly and read it from there on the host side.
flute32_pe and flute64_pe work for me. But I have been trying to get print working with this core: https://github.com/CTSRD-CHERI/Flute/tree/CHERI/src_SSITH_P2. The PE side is not able to access stdoutBuf.
Are you able to kill the program execution from the ZCU102 or is everything freezing completely, i.e., requiring a complete cold restart of the board?
Another possibility to debug the issue is to add an ILA to your PE design, attach it to the RISC-V core's data memory port and to the input and output of the dmaOffset core, and check whether the stdout buffer address is used correctly along the entire memory path. If you add an ILA, you need to build the bitstream using:
tapasco compose [your_pe x 1] @ Freq MHz -p zcu102 --features 'Debug {enabled: true}'
I can kill the execution without a cold start. I am not familiar with the ILA at the moment. :/
Except for the print function, I can run everything (read_args, getting the return value) with https://github.com/CTSRD-CHERI/Flute/tree/CHERI/src_SSITH_P2. The only problem I have now is accessing the memory provided by the host side. That's why I was asking whether I can put the stdout buffer in PE-local memory and read it from the host side; is that feasible?
In theory, you can do that. You just have to add a makeLocal around the makeOutOnly for the STDOUT buffer and make sure that the size of the buffer fits into your remaining data memory and that the start address of the buffer is aligned to a 64-bit boundary on ZynqMP platforms. You then have to remove the + RAM_OFFSET in rv_pe.h, since that would typically route to PE-external memory. The new job launch then looks like this:
auto job = tapasco.launch(
peID, // Processing Element ID
retval, // return value
program_buffer_in, // Program is passed as Arg 0
a, // Arg 1
b, // Arg 2
addOffset(0x6000, makeLocal(makeOutOnly(makeWrappedPointer(stdoutBuf, STDOUT_BUF)))) // Arg 3
);
With addOffset, we explicitly put the buffer at address 0x6000 in the PE-local memory.
Best, Yannick
Thanks for your guidance @yannickl96. It turns out that the PE is not able to write to its own local memory either. I defined a local array on the stack, and that code doesn't work either. It is strange: the code is able to start and the simple_sum examples work, but the PE isn't able to write to any memory, including its own.
int main() {
    initInterrupts();
    initPrint();
    int a = readFromCtrl(ARG1);
    int d = readFromCtrl(ARG2);
    volatile char b[10];
    const char *str = "H!\n";
    int out_idx = 0; // index into b
    for (const char *c = str; *c; ++c, ++out_idx)
        b[out_idx] = *c;
    writeToCtrl(RETL, a + d);
    setIntr();
    return 0;
}
I compiled with these commands:
make BRAM_SIZE=0x100000 flute64cheri_pe
tapasco compose [flute64cheri_pe x 1 ] @100 MHz -p zcu102
and later the application with:
make SIZE=0x100000 PROGRAM=read_dm
Okay, is the inability to write to its own local memory again indicated by a trap? I overlooked that you are working with a 64-bit version. Please try to recompile your program with make SIZE=0x100000 PROGRAM=read_dm RV64=1 so your compiler uses the correct march and mabi flags.
Apart from that, it would be really interesting to see whether memory requests actually get forwarded to the data memory bus. There are two routes we can go from here: debugging the hardware with an ILA, or trying simulation. Simulation may be helpful to get the $display statements from inside the core. If the core traps, we may be able to see whether the problem was an error response from the memory bus or the internal address map.
On a different note: why do you have fn_is_mem_addr twice in your SoC_Map patch, once w.r.t. RVFI and once w.r.t. boot_rom, TCM and mem0_controller? Is the Bluespec compiler not complaining about that?
Okay, is the inability to write to its own local memory again indicated by a trap? I overlooked that you are working with a 64-bit version. Please try to recompile your program with make SIZE=0x100000 PROGRAM=read_dm RV64=1 so your compiler uses the correct march and mabi flags.
Yes, it indicates a trap, unfortunately. I am compiling with clang because it supports the CHERI core. I tested my clang environment with vanilla 64-bit RISC-V, and it works, so I don't think compiling with clang is the problem. As a side note, I just realized that if you compile with -O2, the print function is completely optimized away. It would be good to have __attribute__((optnone)) on the tapasco print!
On a different note: Why do you have fn_is_mem_addr twice in your SoC_Map patch? Once w.r.t. RVFI and once w.r.t. boot_rom, TCM and mem0_controller. Is the Bluespec compiler not complaining about that?
One of the fn_is_mem_addr definitions is enclosed in an `ifdef / `else, so only one of them is compiled: https://github.com/CTSRD-CHERI/Flute/blob/3fb6e6677ac92bf87f871038302d0153b3790885/src_Testbench/SoC/SoC_Map.bsv#L209
I will try to get it working with the simulator; I hope I can manage it.
As a side note, I just realized that if you compile with -O2, the print function is completely optimized away. It would be good to have __attribute__((optnone)) on the tapasco print!
Thank you very much for the hint!
I will try to get it working with the simulator; I hope I can manage it.
Feel free to reach out for further assistance!
Hi,
Finally, I am able to run the Questa simulator for flute64_pe, but I cannot find any example of how to run software on the simulator. I am probably missing that part.
Make sure to select the correct TaPaSCo kernel-device in your software when instantiating the Tapasco Class/Structure.
As far as I understand, it shouldn't use the tlkm driver when we run it on the simulator.
Do you have any example host-side code that I could try?
Best Merve
Hi again! The simulator interacts with the entire TaPaSCo software stack, including the TLKM. Thus, you have to load the driver on the machine running the host software (not necessarily the machine running questa). The line you quoted is only relevant if you have several TaPaSCo devices connected to the machine running your host software. If you are running on a machine without any FPGA cards connected via PCIe, your host software itself does not change for the simulation.
Thanks for the reply!
My host machine actually has FPGA cards. How should I select that it connects to the simulator? I am confused about how my host application can know that it should connect to the simulator and not the FPGA itself.
When you do ls -l /dev/ | grep tlkm, you should get several results of the form tlkm_XX, where XX is some number. The simulation device is the one with the highest number. You need to pass this number to the Tapasco constructor in your host application. Another possibility is to use
libtapasco_tests status
This command will print information for all devices, such as the PEs in your design, where you can again check for the highest device ID.
I couldn't get it working. I decided not to pursue TaPaSCo for now.
Thanks for all the replies!
Hi,
I have been using the TaPaSCo framework with the SSITH_P2 core. Unfortunately, I am not able to get print working. The PE traps when it accesses the memory provided by the host: https://github.com/esa-tu-darmstadt/tapasco-riscv/blob/5d1a235511fe37de134cc6e0e4e210387ea92955/programming/examples/PE/rv_pe.h#L16. It seems that the PE side doesn't have access to the host memory. I don't know how to debug this issue. Do you have any insight into how we can verify it?