Open Mecrisp opened 4 years ago
Thanks !
It is a great article ! Very inspiring.
I compared the LUT count of decoder.v against mini_decoder.v as-is and found that mini_decoder.v saves 21 LUTs with my (old) version of Yosys. I then changed it a little more, using casez with ? wildcards as attached below, which saves 25 LUTs.
/********************* Instruction decoder *******************************/
/* A drop-in replacement of the instruction decoder, meant to further */
/* reduce LUT count by not checking for errors (but no success for now) */
module NrvDecoder(
input wire [31:0] instr,
output wire [4:0] writeBackRegId,
output reg writeBackEn,
output reg [3:0] writeBackSel, // 0001: ALU 0010: PC+4 0100: RAM 1000: counters
// (could use 2 wires instead, but using 4 wires (1-hot encoding)
// reduces both LUT count and critical path in the end !)
output wire [4:0] inRegId1,
output wire [4:0] inRegId2,
output reg aluSel, // 0: force aluOp,aluQual to zero (ADD) 1: use aluOp,aluQual from instr field
output reg aluInSel1, // 0: reg 1: pc
output reg aluInSel2, // 0: reg 1: imm
output [2:0] aluOp,
output reg aluQual,
output wire aluM, // Asserted if operation is an RV32M operation
output reg isLoad,
output reg isStore,
output reg needWaitALU,
output reg [2:0] nextPCSel, // 001: PC+4 010: ALU 100: (predicate ? ALU : PC+4)
// (same as writeBackSel, 1-hot encoding)
output reg [31:0] imm,
output wire error
);
assign error = 1'b0; // We do not check for errors in the MiniDecoder.
assign aluM = 1'b0; // MiniDecoder only works for RV32I
reg inRegId1Sel; // 0: force inRegId1 to zero 1: use inRegId1 instr field
assign writeBackRegId = instr[11:7];
assign inRegId1 = instr[19:15] & {5{inRegId1Sel}}; // Internal sig InRegId1Sel used to force zero in reg1
assign inRegId2 = instr[24:20]; // (because I'm making maximum reuse of the adder of the ALU)
assign aluOp = instr[14:12];
wire [31:0] Iimm = {{21{instr[31]}}, instr[30:20]};
wire [31:0] Simm = {{21{instr[31]}}, instr[30:25], instr[11:7]};
wire [31:0] Bimm = {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};
wire [31:0] Jimm = {{12{instr[31]}}, instr[19:12], instr[20], instr[30:21], 1'b0};
wire [31:0] Uimm = {instr[31], instr[30:12], {12{1'b0}}};
// The rest of instruction decoding, for the following signals:
// writeBackEn
// writeBackSel 0001: ALU 0010: PC+4 0100: RAM 1000: counters
// inRegId1Sel 0: zero 1: regId
// aluInSel1 0: reg 1: PC
// aluInSel2 0: reg 1: imm
// aluQual +/- SRLI/SRAI
// aluM 1 if instr is RV32M
// aluSel 0: force aluOp,aluQual=00 1: use aluOp/aluQual
// nextPCSel 001: PC+4 010: ALU 100: (pred ? ALU : PC+4)
// imm (select one of Iimm,Simm,Bimm,Jimm,Uimm)
// We need to distinguish shifts for two reasons:
// - We need to wait for ALU when it is a shift
// - For ALU ops with immediates, aluQual is 0, except
// for shifts (then it is instr[30]).
wire aluOpIsShift = (aluOp == 3'b001) || (aluOp == 3'b101);
always @(*) begin
nextPCSel = 3'b001; // default: PC <- PC+4
inRegId1Sel = 1'b1; // reg 1 Id from instr
isLoad = 1'b0;
isStore = 1'b0;
aluQual = 1'b0;
needWaitALU = 1'b0;
(* parallel_case, full_case *)
casez(instr[6:2])
5'b011?1: begin // LUI
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0001; // write back source = ALU
inRegId1Sel = 1'b0; // reg 1 Id = 0
aluInSel1 = 1'b0; // ALU source 1 = reg
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
imm = Uimm; // imm format = U
end
5'b001?1: begin // AUIPC
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0001; // write back source = ALU
inRegId1Sel = 1'bx; // reg 1 Id : don't care (we use PC)
aluInSel1 = 1'b1; // ALU source 1 = PC
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
imm = Uimm; // imm format = U
end
5'b11011: begin // JAL
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0010; // write back source = PC+4
inRegId1Sel = 1'bx; // reg 1 Id : don't care (we use PC)
aluInSel1 = 1'b1; // ALU source 1 = PC
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
nextPCSel = 3'b010; // PC <- ALU
imm = Jimm; // imm format = J
end
5'b11001: begin // JALR
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0010; // write back source = PC+4
aluInSel1 = 1'b0; // ALU source 1 = reg
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
nextPCSel = 3'b010; // PC <- ALU
imm = Iimm; // imm format = I
end
5'b110?0: begin // Branch
writeBackEn = 1'b0; // disable write back
writeBackSel = 4'bxxxx; // write back source = don't care
aluInSel1 = 1'b1; // ALU source 1 : PC
aluInSel2 = 1'b1; // ALU source 2 : imm
aluSel = 1'b0; // ALU op = ADD
nextPCSel = 3'b100; // PC <- pred ? ALU : PC+4
imm = Bimm; // imm format = B
end
5'b001?0: begin // ALU operation: Register,Immediate
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0001; // write back source = ALU
aluInSel1 = 1'b0; // ALU source 1 : reg
aluInSel2 = 1'b1; // ALU source 2 : imm
// Qualifier for ALU op: SRLI/SRAI
aluQual = aluOpIsShift ? instr[30] : 1'b0;
needWaitALU = aluOpIsShift;
aluSel = 1'b1; // ALU op : from instr
imm = Iimm; // imm format = I
end
5'b011?0: begin // ALU operation: Register,Register
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0001; // write back source = ALU
aluInSel1 = 1'b0; // ALU source 1 : reg
aluInSel2 = 1'b0; // ALU source 2 : reg
aluQual = instr[30]; // Qualifier for ALU op: +/- SRL/SRA
aluSel = 1'b1; // ALU op : from instr
needWaitALU = aluOpIsShift;
imm = 32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; // don't care
end
5'b000?0: begin // Load
writeBackEn = 1'b1; // enable write back
writeBackSel = 4'b0100; // write back source = RAM
aluInSel1 = 1'b0; // ALU source 1 = reg
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
imm = Iimm; // imm format = I
isLoad = 1'b1;
end
5'b010?0: begin // Store
writeBackEn = 1'b0; // disable write back
writeBackSel = 4'bxxxx; // write back sel = don't care
aluInSel1 = 1'b0; // ALU source 1 = reg
aluInSel2 = 1'b1; // ALU source 2 = imm
aluSel = 1'b0; // ALU op = ADD
imm = Simm; // imm format = S
isStore = 1'b1;
end
default: begin
writeBackEn = 1'b0;
writeBackSel = 4'bxxxx;
inRegId1Sel = 1'bx;
aluInSel1 = 1'bx;
aluInSel2 = 1'bx;
aluSel = 1'bx;
nextPCSel = 3'bxxx;
imm = 32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;
end
endcase
end
endmodule
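To make the casez wildcards easier to check, here is a minimal Python sketch that models the pattern matching on instr[6:2]. The patterns are copied from the decoder above; the class names and helper functions are hypothetical labels added for illustration only. The sketch also checks that no two patterns overlap, which is what the parallel_case attribute assumes.

```python
# Patterns copied from the casez above; '?' is a wildcard bit.
# The class names on the right are illustrative labels only.
PATTERNS = [
    ("011?1", "LUI"),
    ("001?1", "AUIPC"),
    ("11011", "JAL"),
    ("11001", "JALR"),
    ("110?0", "BRANCH"),
    ("001?0", "ALU_IMM"),
    ("011?0", "ALU_REG"),
    ("000?0", "LOAD"),
    ("010?0", "STORE"),
]


def matches(pattern: str, value: int) -> bool:
    """True if the 5-bit value matches the pattern, treating '?' as don't care."""
    bits = format(value & 0x1F, "05b")
    return all(p == "?" or p == b for p, b in zip(pattern, bits))


def classify(instr: int) -> str:
    """Classify a 32-bit instruction word by its bits [6:2]."""
    key = (instr >> 2) & 0x1F
    for pattern, name in PATTERNS:
        if matches(pattern, key):
            return name
    return "DEFAULT"


def overlapping_keys() -> list:
    """Return every 5-bit key that is matched by more than one pattern."""
    return [k for k in range(32)
            if sum(matches(p, k) for p, _ in PATTERNS) > 1]
```

For example, classify(0x00008067) (the ret encoding) lands in the JALR branch, and overlapping_keys() comes back empty for these nine patterns.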
I can build the Verilog part easily, but today I tried to build the Firmware for the first time, and have a few observations:
Ubuntu (and Debian, too) already contains a RISC-V toolchain, so why download another one from SiFive ?
On the Icestick, hooking the Reset input to pin 3 by default would be useful, as the DTR line can be toggled from the terminal.
To find out what might go wrong, I would like you to provide a pre-compiled firmware image (I opt for mandelbrot-terminal) and a pre-synthesised bitstream. Then, rebuilding could be done in separate steps: first check flashing, then check rebuilding the Verilog, then check rebuilding the firmware.
The Makefiles for the firmware are quite complicated with all their dependencies; I recommend having a simple single-file assembler blinky, without any dependencies.
My attempt faults immediately (with decoder.v), however:
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_UART_CNTL, 64 # USB UART control. busy (bit 9), data ready (bit 8)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_LEDMTX_CNTL, 256 # LED matrix control. read: LSB bit 1 if busy
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
li x1, IO_BASE
li x2, 0
1: sw x2, IO_LEDS(x1)
addi x2, x2, 1
li x3, 0xFFFFF
2: addi x3, x3, -1
bne x3, zero, 2b
j 1b
Memmap:
MEMORY
{
rom(RX) : ORIGIN = 0x00000000, LENGTH = 0x400
}
SECTIONS
{
.text : { *(.text*) } > rom
}
Commands to assemble:
riscv64-linux-gnu-as blinky.s -o blinky.o -march=rv32i
riscv64-linux-gnu-ld -o blinky.elf -T memmap blinky.o -m elf32lriscv
riscv64-linux-gnu-objdump -Mnumeric -D blinky.elf > blinky.list
riscv64-linux-gnu-objcopy blinky.elf blinky.hex -O verilog
Wow ! Nice progress ! On the peripheral side, I recommend adding a simple GPIO port with IN, OUT and DIR registers to the mix for the tutorial.
I am looking forward to that ! Yes, your new modular design is much more understandable, a very good implementation for experiments and teaching the fundamentals. Good luck for the next steps, and: Joyeux Noël !
Hi Matthias, Joyeux Noël to you too ! Mapped memory interface for the SPI flash is functional, execute from SPI will come next (need to insert a couple of 'wait for SPI' states in the FSM).
Merry Christmas and Happy new year. I am starting to take notes on running the femtorv32 on the ice40-feather. Will make a tutorial PR when ready. Adopting the feather eco-system gives access to lots of tried and tested peripherals and has many more users at hobby level than the PMOD ecosystem. Hopefully I can get somewhere useful.
Hi Matthias, Run from SPI flash seems to work ! To test it:
1) edit RTL/femtosoc_config.v and uncomment the following lines:
`define NRV_MAPPED_SPI_FLASH (but NRV_IO_SPI_FLASH should be commented out)
`define NRV_RUN_FROM_SPI
`define NRV_MINIRV32 (for now, run from SPI is only implemented for the new mini-femtorv32 core, that has a simpler FSM)
2) the SPI flash starting at address 1M is mapped at address 0x80000, so to test it:
write some ASM code, compile it to raw binary, send it to the SPI flash with iceprog -o 1M xxx.bin
cd FIRMWARE/ASM_EXAMPLES
make blinker_shift_fast.bm_elf
../TOOLS/firmware_words blinker_shift_fast.bm_elf -ram 8192 -bin blinker_shift_fast.bin
iceprog -o 1M blinker_shift_fast.bin
3) compile a firmware that jumps to the mapped SPI flash
4) let's rock'n'roll !
Notes: The way I'm generating the .bin file is not correct ! (crt0.S is copied one more time, and the linker does not know it is going to go at address 0x80000). It is OK because with a blinky, the code is relocatable, but for compiling the Forth interpreter, we will need a correct linker script that puts the code at address 0x80000 and leaves the rest in the RAM starting at address 0 (but maybe you already have something like that for J1).
Oops, wait a minute, it seems I made some mistakes (it is 0x800000, not 0x80000), but I'm jumping to 0x80000 and it blinks, which is not normal, it should not ! Need to understand what's going on... Will come back shortly with more news.
Works also with 0x800000, I pushed the files, so that you can test if you want (now I need to understand why it worked also with 0x80000, maybe my 1-hot address encoding makes it possible, need to understand).
Hi Bruno,
thank you for the large effort to get this up and running !
Now on to try your achievement:
I put a "blank" memory image into firmware.hex and activated both
`define NRV_MINIRV32
`define NRV_MAPPED_SPI_FLASH
Synthesis is fine by using
make ICESTICK.synth
But now a few questions:
What does the memory map look like ?
Given
wire mem_address_is_ram = (mem_address[23:22] == 2'b00);
wire mem_address_is_io = (mem_address[23:22] == 2'b01);
wire mem_address_is_spi_flash = (mem_address[23:22] == 2'b10);
I think it is this way, correct ?
0x00000000 to 0x000017FF Block RAM, 6 kb
0x00800000 to 0x00BFFFFF Mapped SPI memory
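As a quick check of this reading, here is a small Python sketch of the decode given by the three wires quoted above. It looks only at mem_address[23:22]; the function name and the "unmapped" label for 2'b11 are assumptions made for illustration, and the sketch says nothing about how much of each region is actually backed by hardware.

```python
def region(addr: int) -> str:
    """Name the region selected by mem_address[23:22] for a given address."""
    sel = (addr >> 22) & 0b11  # bits 23 and 22 of the address
    names = {0b00: "ram", 0b01: "io", 0b10: "spi_flash"}
    # 2'b11 is not covered by the quoted wires; "unmapped" is an assumed label.
    return names.get(sel, "unmapped")


if __name__ == "__main__":
    for a in (0x00000000, 0x000017FF, 0x00400000, 0x00800000, 0x00BFFFFF):
        print(f"0x{a:08X} -> {region(a)}")
```

Because only two address bits take part in the comparison, every address in 0x00800000 to 0x00BFFFFF selects the SPI flash window, which matches the range listed above.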
With this, do I get the bitstream starting at 0x00800000, or is there an offset, shifting the bitstream out of "mapped view" ?
How do I configure the Reset address of FemtoRV to start executing from 0x00800000 or 0x00800000 + Offset-to-the-end-of-the-bitstream ?
I assume changing this piece will do the trick:
always @(posedge clk) begin
if(!reset) begin
state <= INITIAL;
addressReg <= 0;
PC <= 0;
end else
By the way, I think no precompiled firmware.hex should be necessary when using the mapped SPI memory feature.
Matthias
Hi Matthias,
Yes, initializing addressReg <= 0x800000 and PC <= 0x800000 in the reset block will directly jump to the SPI Flash, without needing any BRAM initialization nor firmware.hex. I will add a configuration macro for that.
The SPI Flash is mapped as follows in memory: If you send a file (for instance, hello.txt that contains "hello, world\n\0") to the SPI Flash, using iceprog -o 1M hello.txt, then the data is mapped at 0x800000 (and printf((char*)0x800000); will say hello !)
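The relation between a flash offset and the CPU address can be sketched in Python as follows. Two things are assumptions here and not statements from the thread: that iceprog's "1M" means 1024*1024 bytes, and that the window is linear beyond its first byte. The constant and function names are hypothetical.

```python
FLASH_WINDOW_BASE = 0x800000       # CPU address where the mapped data appears
FLASH_IMAGE_OFFSET = 1024 * 1024   # assumed meaning of "iceprog -o 1M"


def cpu_address(flash_offset: int) -> int:
    """Translate a byte offset in the SPI flash into a CPU address.

    Assumes a linear window: flash byte (1M + n) shows up at 0x800000 + n.
    """
    if flash_offset < FLASH_IMAGE_OFFSET:
        raise ValueError("offset lies below the mapped window")
    return FLASH_WINDOW_BASE + (flash_offset - FLASH_IMAGE_OFFSET)


if __name__ == "__main__":
    print(hex(cpu_address(0x100000)))  # first byte of hello.txt
```

Under these assumptions the bitstream, which sits below the 1M offset, stays outside the mapped view.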
You will need to initialize the stack pointer at the end of the RAM, take a look at CRT_BAREMETAL/crt0.S. The total quantity of RAM can be queried at address IO_BASE + IO_RAM (or you can also hardwire 6K) ... or I can also add an option to do that automatically.
Best, -- B
Hooray ! It works ! Blinky in assembler written to -o 1M is up and running !
PS: When using
`define NRV_RESET_ADDR 0x800000
it gives error
RTL/PROCESSOR/mini_femtorv32.v:290: ERROR: syntax error, unexpected TOK_ID
I changed this to
`define NRV_RESET_ADDR 32'h00800000
and it synthesises nicely.
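The error comes from the C-style 0x prefix, which Verilog does not accept; a sized literal such as 32'h00800000 is needed instead. A tiny Python helper (hypothetical, for illustration only) that produces such literals from plain integers:

```python
def verilog_hex(value: int, width: int = 32) -> str:
    """Format an unsigned integer as a sized Verilog hex literal."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the requested width")
    digits = (width + 3) // 4  # hex digits needed for `width` bits
    return f"{width}'h{value:0{digits}X}"


if __name__ == "__main__":
    print("`define NRV_RESET_ADDR " + verilog_hex(0x800000))
```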
"The total quantity of RAM can be queried at address IO_BASE + IO_RAM (or you can also hardwire 6K)"
Hardwired. Better save the LUTs for a GPIO port.
Great ! Very happy it starts working, looking forward to see Forth running on it ! I pushed a new version with:
I am not sure how to use the busy flag of the UART. When adding a delay, it transmits correctly, but without the delay, this code transmits garbage in the terminal.
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_UART_CNTL, 64 # USB UART control. busy (bit 9), data ready (bit 8)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_LEDMTX_CNTL, 256 # LED matrix control. read: LSB bit 1 if busy
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
# x1: Link register
li x2, 0x1800 # x2: Stack pointer, at the end of 6 kb
li x3, IO_BASE
li x4, 0
1: sw x4, IO_LEDS(x3)
# Wait for busy flag being cleared
2: lw x5, IO_UART_CNTL(x3)
andi x5, x5, 0x200 # Bit 9: Busy
bne x5, zero, 2b
sw x4, IO_UART_DATA(x3)
# Small delay
li x5, 0x4000
3: addi x5, x5, -1
bne x5, zero, 3b
# Next character
addi x4, x4, 1
j 1b
The valid flag does not work as expected, either:
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_UART_CNTL, 64 # USB UART control. busy (bit 9), data ready (bit 8)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_LEDMTX_CNTL, 256 # LED matrix control. read: LSB bit 1 if busy
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
# x1: Link register
li x2, 0x1800 # x2: Stack pointer, at the end of 6 kb
li x3, IO_BASE
li x4, 42 # Emit a * on first loop run
1: sw x4, IO_LEDS(x3)
sw x4, IO_UART_DATA(x3)
2: # Wait for valid flag being set
lw x5, IO_UART_CNTL(x3)
andi x5, x5, 0x100 # Bit 8: Valid
addi x6, x6, 1 # Spin LEDs as indicator
sw x6, IO_LEDS(x3)
beq x5, zero, 2b
lw x4, IO_UART_DATA(x3)
addi x4, x4, 1 # Echo back a different character
j 1b
By the way, I use the PicoSoC-UART by Claire Wolf in Mecrisp-Ice, which is smaller than the one of James Bowman, at least for me:
/*
* PicoSoC - A simple example SoC using PicoRV32
*
* Copyright (C) 2017 Clifford Wolf <clifford@clifford.at>
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
*/
// October 2019, Matthias Koch: Renamed wires
module buart (
input clk,
input resetq,
output tx,
input rx,
input wr,
input rd,
input [7:0] tx_data,
output [7:0] rx_data,
output busy,
output valid
);
reg [3:0] recv_state;
reg [$clog2(`cfg_divider)-1:0] recv_divcnt; // Counts to cfg_divider. Reserve enough bits !
reg [7:0] recv_pattern;
reg [7:0] recv_buf_data;
reg recv_buf_valid;
reg [9:0] send_pattern;
reg [3:0] send_bitcnt;
reg [$clog2(`cfg_divider)-1:0] send_divcnt; // Counts to cfg_divider. Reserve enough bits !
reg send_dummy;
assign rx_data = recv_buf_data;
assign valid = recv_buf_valid;
assign busy = (send_bitcnt || send_dummy);
always @(posedge clk) begin
if (!resetq) begin
recv_state <= 0;
recv_divcnt <= 0;
recv_pattern <= 0;
recv_buf_data <= 0;
recv_buf_valid <= 0;
end else begin
recv_divcnt <= recv_divcnt + 1;
if (rd) recv_buf_valid <= 0;
case (recv_state)
0: begin
if (!rx)
recv_state <= 1;
recv_divcnt <= 0;
end
1: begin
if (2*recv_divcnt > `cfg_divider) begin
recv_state <= 2;
recv_divcnt <= 0;
end
end
10: begin
if (recv_divcnt > `cfg_divider) begin
recv_buf_data <= recv_pattern;
recv_buf_valid <= 1;
recv_state <= 0;
end
end
default: begin
if (recv_divcnt > `cfg_divider) begin
recv_pattern <= {rx, recv_pattern[7:1]};
recv_state <= recv_state + 1;
recv_divcnt <= 0;
end
end
endcase
end
end
assign tx = send_pattern[0];
always @(posedge clk) begin
send_divcnt <= send_divcnt + 1;
if (!resetq) begin
send_pattern <= ~0;
send_bitcnt <= 0;
send_divcnt <= 0;
send_dummy <= 1;
end else begin
if (send_dummy && !send_bitcnt) begin
send_pattern <= ~0;
send_bitcnt <= 15;
send_divcnt <= 0;
send_dummy <= 0;
end else
if (wr && !send_bitcnt) begin
send_pattern <= {1'b1, tx_data[7:0], 1'b0};
send_bitcnt <= 10;
send_divcnt <= 0;
end else
if (send_divcnt > `cfg_divider && send_bitcnt) begin
send_pattern <= {1'b1, send_pattern[9:1]};
send_bitcnt <= send_bitcnt - 1;
send_divcnt <= 0;
end
end
end
endmodule
I changed the wire names to be a drop-in replacement.
Oh, I just see it: You changed the UART interface. But how do I read the valid/busy flags without actually fetching the character ? In Forth, there are traditionally four routines for terminal: EMIT? EMIT KEY? KEY and it's important to be able to check the flags without actually transmitting/receiving something. Maybe the other UART will give you enough LUTs to re-insert the UART flag register.
You could use different write strobes for that. Address strobe +0 should have fetch/transmit side effects, address strobe +1 for the flags should not. Then both behaviours are available for the software side depending on using lb/sb and lh/sh.
Hi Matthias,
If you want to save some more LUTs you can always do like I did in the SERVant SoC and bitbang the UART with a single GPIO instead and drive it like this https://github.com/olofk/serv/blob/master/zephyr/drivers/serial/uart_bitbang.c#L21 with the correct amount of NOPs for your CPU speed :)
@olofk Hey, thanks, nice to read you here ! We know SERV and I am very impressed with it, but we'll just try a drop-in exchange with a more traditional UART :-)
Hi @olofk, very happy to hear from you, and thanks a lot for the pointer to your UART bitbanging code ! For now what I'm trying to do is to balance speed/LUT count/number of Verilog lines/legibility (the goal is to transform the material into a course). Clearly bitbanging can be sometimes a good option ! (I'm doing that to talk to the SDCard), it will depend on how many LUTs remain in the end !
A proper UART is definitely the better choice unless the main goal is to minimize resource usage. Wasn't sure how much extra space you had on the small iCE40 devices.
@Mecrisp, how do you configure `cfg_divider, is it simply (clock freq / bauds) or is it something more subtle ?
Exactly that. Nothing special.
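As a worked example of that rule, a short Python sketch follows. The truncating division is an assumption, chosen because it reproduces the two cfg_divider values quoted in this thread for a 48 MHz clock; counter_bits mirrors the $clog2(`cfg_divider) width used in the UART source. Both function names are hypothetical.

```python
def uart_divider(clock_hz: int, baud: int) -> int:
    """Clock cycles per bit, truncated to an integer (assumed rounding)."""
    return clock_hz // baud


def counter_bits(divider: int) -> int:
    """Equivalent of Verilog's $clog2(divider) for divider >= 1."""
    return (divider - 1).bit_length()


if __name__ == "__main__":
    for baud in (115200, 230400):
        d = uart_divider(48_000_000, baud)
        print(baud, d, counter_bits(d))
```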
Claire's UART interfaced.
... also trying some LUT-golfing in Claire's code. -> 1225 LUTs so far... (not stellar, trying other things...)
Try another baudrate. You may get surprising results. My idea was that a faster baudrate results in a smaller divider and hence in less logic for counter and comparison, but I just tried it on Mecrisp-Ice 1.8c for HX1K and got:
1273 LUTs with a divider of
`define cfg_divider 208 // 48 MHz / 230400
and 1227 LUTs with
`define cfg_divider 416 // 48 MHz / 115200
Matthias
It is very difficult to forecast which configuration will give what ! Well for now I'm stuck around 1220 LUTs, I have pushed the new version. It is still possible to use the UART from J1 (there is a toggle in RTL/DEVICES/uart.v)
There is also something I do not understand. To generate the "half baud" clock for receive, there is this test:
wire recv_half_baudclk = recv_divcnt > divider/2;
Normally, it should be possible (and less costly in terms of LUTs) to replace it with this one, since recv_divcnt is reset to zero at the next cycle:
wire recv_half_baudclk = (recv_divcnt == divider/2 + 1);
But when I do that, things become unstable (sometimes I receive random characters).
Many mysteries to be investigated ! -- Bruno P.S. Now working on reviving the control register.
Pushed new version with control register. Current LUT golfing par:
UART, LEDS, 90 MHz, MINIRV32 => 1084 LUTs
UART, LEDS, Mapped SPI, 90 MHz, MINIRV32 => 1232 LUTs
-- B
Merged the baud generator for send and receive, gained 30 LUTs or so... ... now trying to merge bitcount (LUT golfing is so addictive...)
"... now trying to merge bitcount (LUT golfing is so addictive...)"
Ain't going to argue with that :)
LUT golfing tip for n-bit counters that count to k: make it an (n+1)-bit down-counter, load it with k-1 and check when the MSB is set (wraparound). This costs an extra adder bit but saves an n-bit comparison.
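A behavioural Python sketch of this tip, comparing the period of a plain up-counter with comparison against the period of the wider down-counter. It models only the counting, not the LUT cost, and the function names are hypothetical.

```python
def up_counter_period(k: int) -> int:
    """Cycles until an up-counter starting at 0 reaches k-1 (full comparison)."""
    counter, cycles = 0, 0
    while True:
        cycles += 1
        if counter == k - 1:  # n-bit equality comparison
            return cycles
        counter += 1


def down_counter_period(k: int, n: int) -> int:
    """Cycles until an (n+1)-bit down-counter loaded with k-1 wraps around."""
    if not 1 <= k <= (1 << n):
        raise ValueError("k-1 must fit in n bits so the MSB starts cleared")
    mask = (1 << (n + 1)) - 1  # counter is n+1 bits wide
    msb = 1 << n               # the extra top bit
    counter, cycles = k - 1, 0
    while True:
        cycles += 1
        counter = (counter - 1) & mask
        if counter & msb:      # single-bit test replaces the comparison
            return cycles


if __name__ == "__main__":
    print(up_counter_period(416), down_counter_period(416, 9))
```

Both variants give a period of k cycles; the down-counter only needs to look at one bit to detect the end of the count.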
I confirm UART flags are working. Now on to port Mecrisp-Quintus !
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_DEVICES_FREQ, 64 # HW config: devices and frequency
.equ IO_UART_CNTL, 8192 # USB UART control. busy (bit 9)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_RAM, 256 # HW config: Installed amount of RAM
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
# x1: Link register
li x2, 0x1800 # x2: Stack pointer, at the end of 6 kb
li x3, IO_BASE
li x9, IO_BASE + IO_UART_CNTL
li x4, 0
1: # sw x4, IO_LEDS(x3)
# Wait for busy flag being cleared
2: lw x5, 0(x9)
srli x10, x5, 8
sw x10, IO_LEDS(x3)
andi x5, x5, 0x200 # Bit 9: Busy
bne x5, zero, 2b
sw x4, IO_UART_DATA(x3)
# Small delay
li x5, 0x4
3: addi x5, x5, -1
bne x5, zero, 3b
# Next character
addi x4, x4, 1
j 1b
Oh, wait: This one does not work !
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_DEVICES_FREQ, 64 # HW config: devices and frequency
.equ IO_UART_CNTL, 8192 # USB UART control. busy (bit 9)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_RAM, 256 # HW config: Installed amount of RAM
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
# x1: Link register
li x2, 0x1800 # x2: Stack pointer, at the end of 6 kb
li x3, IO_BASE
li x9, IO_BASE + IO_UART_CNTL
li x4, 0
1: # sw x4, IO_LEDS(x3)
# Wait for busy flag being cleared
2: lw x5, 0(x9)
srli x10, x5, 8
sw x10, IO_LEDS(x3)
andi x5, x5, 0x200 # Bit 9: Busy
bne x5, zero, 2b
sw x4, IO_UART_DATA(x3)
# Next character
addi x4, x4, 1
j 1b
When commenting out the instruction "bne x5, zero, 3b" the program stops working. Maybe a fault in CPU/fetch logic ?
.section .text
#################################################################################
# Mapped IO constants
.equ IO_BASE, 0x400000 # Base address of memory-mapped IO
.equ IO_LEDS, 4 # 4 LSBs mapped to D1,D2,D3,D4
.equ IO_OLED_CNTL, 8 # OLED display control.
# wr: 01: reset low 11: reset high 00: normal operation
# rd: 0: ready 1: busy
.equ IO_OLED_CMD, 16 # OLED display command. Only 8 LSBs used.
.equ IO_OLED_DATA, 32 # OLED display data. Only 8 LSBs used.
.equ IO_DEVICES_FREQ, 64 # HW config: devices and frequency
.equ IO_UART_CNTL, 8192 # USB UART control. busy (bit 9)
.equ IO_UART_DATA, 128 # USB UART data (read/write)
.equ IO_RAM, 256 # HW config: Installed amount of RAM
.equ IO_LEDMTX_DATA, 512 # LED matrix data (write)
################################################################################
# x1: Link register
li x2, 0x1800 # x2: Stack pointer, at the end of 6 kb
li x3, IO_BASE
li x9, IO_BASE + IO_UART_CNTL
li x4, 0
1: # sw x4, IO_LEDS(x3)
# Wait for busy flag being cleared
2: lw x5, 0(x9)
srli x10, x5, 8
sw x10, IO_LEDS(x3)
andi x5, x5, 0x200 # Bit 9: Busy
bne x5, zero, 2b
sw x4, IO_UART_DATA(x3)
# Small delay
li x5, 0x1
3: addi x5, x5, -1
bne x5, zero, 3b # When commenting out this jump, the program stops working.
# Next character
addi x4, x4, 1
j 1b
How I make it:
memmap:
MEMORY
{
rom(RX) : ORIGIN = 0x00800000, LENGTH = 0x400
}
SECTIONS
{
.text : { *(.text*) } > rom
}
Assemble:
riscv64-linux-gnu-as blinky.s -o blinky.o -march=rv32i
riscv64-linux-gnu-ld -o blinky.elf -T memmap blinky.o -m elf32lriscv
riscv64-linux-gnu-objdump -Mnumeric -D blinky.elf > blinky.list
riscv64-linux-gnu-objcopy blinky.elf blinky.bin -O binary
Thank you very much for the update. Yes, I haven't tested exec from SPI very much yet, so there are probably a couple of bugs that remain (fixed a big one yesterday). I will probably need to write a simulator for the SPI flash to be able to see what's going on. -- B
@olofk thank you very much for this trick, I love it !
@olofk Wonderful, this saved me an additional 20 LUTs !
Hi Bruno,
I wish you an enjoyable new year !
The bug with the blinky code above is fixed in your latest commit, and I started porting Forth to FemtoRV. To see what happens, I inserted LED patterns at some locations, and it seems as if it hangs forever in this loop which is designed to skip a routine, searching for the ret opcode at the end:
li x14, 0x00008067 # Ret-Opcode
1: lw x15, 0(x8)
addi x8, x8, 4
bne x15, x14, 1b
I am, however, not sure if the loop fails, or if the initial value in x8 is already wrong at the beginning. As the codebase of Mecrisp-Quintus is running on other RISC-V processors nicely, I suspect there is still a bug in execution from Flash memory.
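For reference, the intended behaviour of that loop can be modelled in a few lines of Python. This is a sketch under the assumption that memory is a plain list of 32-bit words starting at address 0; the function name is hypothetical. On real hardware the loop would spin forever where this model raises IndexError.

```python
RET_OPCODE = 0x00008067  # jalr x0, 0(x1), the same constant as in the loop


def skip_routine(words: list, addr: int) -> int:
    """Return the byte address just past the first ret found at or after addr."""
    while True:
        word = words[addr // 4]  # lw   x15, 0(x8)
        addr += 4                # addi x8, x8, 4
        if word == RET_OPCODE:   # bne  x15, x14, 1b
            return addr


if __name__ == "__main__":
    memory = [0x00000013, 0x00000013, RET_OPCODE, 0x00000013]
    print(skip_routine(memory, 0))
```

The model shows the two failure modes mentioned above: the loop never ends if x8 starts beyond the last ret, and also if the words read back from memory are not what was written.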
Are you able to compile your C demos to run directly from SPI flash, are there known issues yet ?
Matthias
Hi Matthias, After a big pass of reorg of the IO-space and HDMI for the ULX3S, I will work again on exec-from-spi and keep you updated. Best wishes (and happy new year :-) -- B
Hi Bruno,
something you might find interesting: Here is a SDRAM controller specifically made for the ULX3S and with a bus interface designed for PicoRV32.
https://github.com/rxrbln/picorv32/blob/master/picosoc/sdram.v
Some files are missing in the repository, the project cannot be synthesised as-is and the bugtracker is deactivated, but I already contacted the author via E-Mail.
Have fun with clocks and gates :-) Matthias
Reply from author:
Hi,
done:
https://www.youtube.com/watch?v=YoILfUAmwjU
https://github.com/rxrbln/picorv32
Kind regards, René Rebe
Hi Matthias, I'm now working on exec from SPI. I have tried a couple of simple programs, it seems to work (but it does not prove that there is no bug !) Different things that need care:
Then one problem remains: sdata segments (initialized RW). Do you have some in Mecrisp ?
P.S. since I do not use fast SPI modes, it is super slow (maybe 32 times slower than exec from BRAM). We'll probably need a small instr cache...
Something else that I noted: starting execution from NRV_RESET_ADDR does not always work (I probably still have a bug in the processor), so for now I'm using FIRMWARE/ASM_EXAMPLES/jump_to_spi_flash.S
Dear Bruno,
my congratulations for squeezing a RV32I core into the Icestick !
I read your Verilog files with joy and I wish to share an idea on how to save a few more LUTs for more peripherals: Try a "one-hot" IO address decoder. You have only a few IO registers, so you can reserve one address line for each of your peripheral registers and save LUTs on comparisons with the full IO address. This also allows setting multiple IO registers at once.
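A minimal Python sketch of the one-hot idea. The offsets are borrowed from the .equ constants in the assembly listings in this thread, where each register already owns a single address bit; the dictionary and function names are hypothetical.

```python
# One address bit per register (offsets relative to IO_BASE).
IO_REGISTERS = {
    "LEDS": 4,
    "OLED_CNTL": 8,
    "OLED_CMD": 16,
    "OLED_DATA": 32,
    "UART_CNTL": 64,
    "UART_DATA": 128,
    "LEDMTX_CNTL": 256,
    "LEDMTX_DATA": 512,
}


def is_one_hot(value: int) -> bool:
    """True if exactly one bit of value is set."""
    return value != 0 and value & (value - 1) == 0


def selected(offset: int) -> list:
    """Registers selected by an offset: one single-bit test per register."""
    return [name for name, bit in IO_REGISTERS.items() if offset & bit]


if __name__ == "__main__":
    print(selected(4))        # a single register
    print(selected(4 | 128))  # one write reaches two registers at once
```

Each select signal is just one address wire, so no wide comparator is needed, and an offset with several bits set addresses several registers in a single store.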
You can also insert a hardware random number generator by using a ring oscillator.
Maybe you wish to check out Mecrisp-Ice from mecrisp.sourceforge.net in file mecrisp-ice-1.8/hx1k/icestorm/j1a.v for my peripheral set in use on the Icestick. Mecrisp-Ice is a Forth compiler running on a stack processor, which is a descendant of Swapforth and the J1a CPU by James Bowman. I think you can borrow a few of the ideas !
If you manage to map the SPI flash into the memory bus within the available LUTs, similar to the memory interface in PicoSoC, I would be happy to officially port Mecrisp-Quintus (a RISC-V Forth which needs about 24kb flash and 4 kb RAM) to your FemtoRV32 on the Icestick.
Hats off and best wishes from Germany, Matthias
PS: Completely removing the rdRAM wire in your memory design somehow saved 20 LUTs.