SpinalHDL / VexRiscv

An FPGA-friendly 32-bit RISC-V CPU implementation

Linux on VexRiscv #60

Closed: ghost closed this issue 5 years ago

ghost commented 5 years ago

My intention with creating this issue is to collect/share information and gauge interest in running Linux on VexRiscv. From what I know, VexRiscv is still missing functionality, and it won't work out of the box.

A big problem is the MMU. Ideally, "someone" will hopefully write patches to add no-MMU support to Linux/RISC-V, but currently an MMU is required. It appears VexRiscv has a partial MMU implementation using a software-filled TLB. There needs to be machine-mode code to walk the page tables and fill the TLBs, and I didn't find a reference implementation of that.
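
Roughly, such a machine-mode refill handler would do an Sv32 table walk on a TLB-miss trap and install the resulting 4 KiB mapping. Below is a minimal sketch in C of just the walk, assuming a 32-bit physical address space; read_satp() and tlb_install() are hypothetical placeholders for whatever CSR access and TLB-write mechanism the core actually provides, which is exactly the part that seems to be missing a reference implementation.

#include <stdint.h>

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_X (1u << 3)

extern uint32_t read_satp(void);                     /* hypothetical CSR access */
extern void tlb_install(uint32_t va, uint32_t pte);  /* hypothetical TLB write */

/* Walk the Sv32 page table for va; returns 0 on success, -1 on a page fault. */
int tlb_refill(uint32_t va)
{
    uint32_t satp = read_satp();
    uint32_t table = (satp & 0x003fffffu) << 12;     /* satp.PPN -> root table address */

    /* Level 1: index with VPN[1] = va[31:22]. */
    uint32_t pte = *(volatile uint32_t *)(table + ((va >> 22) & 0x3ffu) * 4);
    if (!(pte & PTE_V))
        return -1;

    if (!(pte & (PTE_R | PTE_X))) {
        /* Non-leaf: descend and index with VPN[0] = va[21:12]. */
        table = (pte >> 10) << 12;
        pte = *(volatile uint32_t *)(table + ((va >> 12) & 0x3ffu) * 4);
        if (!(pte & PTE_V))
            return -1;
    }
    /* else: a leaf at level 1 is a 4 MiB superpage; a full handler would also
       check alignment and fold va[21:12] into the installed translation. */

    tlb_install(va, pte);
    return 0;
}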

Another issue is atomics. Linux currently requires them. There seems to be partial support in VexRiscv (a subset or so). Another possibility is patching the kernel not to use atomics when built without SMP support. There's also the question of how much atomics support userspace typically requires.

Without doubt there are more issues that I don't know about.

Antmicro apparently made a Linux port: https://github.com/antmicro/litex-rv32-linux-system https://github.com/antmicro/litex-linux-riscv I didn't know about this before and haven't managed to build the whole thing yet. Unfortunately, their Linux kernel repository does not include the git history. Here's a diff against the apparent base: https://0x0.st/z-li.diff

Please post any other information you know.

Dolu1990 commented 5 years ago

About atomics: there is some support in VexRiscv to provide LR/SC in a local way; it only works for single-CPU systems.

ghost commented 5 years ago

Yeah, "dummy" implementations that work on single CPU systems should be perfectly fine.

enjoy-digital commented 5 years ago

As discussed at the Free Silicon Conference together with @Dolu1990, we are also working on it here: https://github.com/enjoy-digital/litex/issues/134.

We can continue the discussion here for the CPU aspect. @daveshah1: I saw you made some progress; just for info, @Dolu1990 is OK to help get things working. So if you see strange things or need help with anything related to Spinal/VexRiscv, you can discuss your findings here.

daveshah1 commented 5 years ago

My current status is that I have made quite a few hacks to the kernel, VexRiscv and LiteX, but I'm still only just getting into userspace and not anywhere useful yet.

VexRiscv: https://github.com/daveshah1/VexRiscv/tree/Supervisor
Build config: https://github.com/daveshah1/VexRiscv-verilog/tree/linux
LiteX: https://github.com/daveshah1/litex/tree/vexriscv-linux
Kernel: https://github.com/daveshah1/litex-linux-riscv

@Dolu1990 I would be interested if you could look at 818f1f68686c75d7ee056d0da1843b98ade4b622 - loads were always reading 0xffffffff from virtual memory addresses when bit 10 of the offset (0x400) was set. This seems to fix it, but I'm not sure if a better fix is possible

As it stands, the current issue is a kernel panic "Oops - environment call from S-mode" shortly after init starts. It seems that after a few syscalls it either isn't returning properly to userspace, or a spurious ECALL is accidentally triggered while in S-mode (it might be the ECALL getting "stuck" somewhere and lurking, so that what should be an IRQ triggers the ECALL instead).

Dolu1990 commented 5 years ago

Hi @daveshah1 @enjoy-digital :D

So, for sure we will hit bugs in VexRiscv, as only the machine mode was properly tested. Things not tested enough in VexRiscv which could have bugs:

I think the best would be to set up a minimal test environment to run Linux on. It would save us a lot of time and sanity, especially for a Linux port project :D So, to distinguish hardware bugs from software bugs, my proposal is that I set up a minimalistic environment where only the VexRiscv CPU is simulated and compared against an instruction-synchronised software model of the CPU (I already have one which does that, but CSRs are missing from it). This would point out exactly when the hardware diverges from what it should do, and bring serenity to the development ^.^
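
A rough sketch of such a cross-check, with made-up placeholder names (rtl_step(), model_step(), retired_t) rather than the actual testbench API:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint32_t pc;       /* pc of the retired instruction */
    int      rd;       /* destination register, -1 if none */
    uint32_t rd_value; /* value written to rd */
} retired_t;

extern retired_t rtl_step(void);   /* advance the simulated RTL by one retired instruction */
extern retired_t model_step(void); /* advance the instruction-accurate golden model */

void run_lockstep(uint64_t max_instr)
{
    for (uint64_t i = 0; i < max_instr; i++) {
        retired_t h = rtl_step();
        retired_t m = model_step();
        if (h.pc != m.pc || h.rd != m.rd ||
            (h.rd >= 0 && h.rd_value != m.rd_value)) {
            fprintf(stderr, "divergence at instruction %llu: "
                    "rtl pc=%08x rd=%d val=%08x vs model pc=%08x rd=%d val=%08x\n",
                    (unsigned long long)i, h.pc, h.rd, h.rd_value, m.pc, m.rd, m.rd_value);
            exit(1);
        }
    }
}

The same loop can also compare memory writes and trap causes; the point is simply that the first mismatching retired instruction pinpoints the divergence.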

Does that sound good to you?

daveshah1 commented 5 years ago

That sounds very sensible! The peripheral requirement is minimal: just a timer (right now I have the LiteX timer connected to the timerInterruptS pin, and hacked the kernel to talk to that directly rather than setting up a timer via the proper SBI route) and a UART of some kind.

My only concern with this is speed; right now it is taking about 30 s on hardware at 75 MHz to get to the point of failure. So we definitely want to use Verilator and not iverilog...

enjoy-digital commented 5 years ago

I can easily set up a Verilator simulation. But 30 s on hardware at 75 MHz will still be a bit slow: that's about 2.25 billion cycles, and at the ~1 MHz execution speed we can expect from simulation, that's still around 40 minutes...

daveshah1 commented 5 years ago

I did just manage to make a bit of progress on hardware (perhaps this talk of simulators is scaring it into behaving :smile:)

It does reach userspace successfully, so we can almost say Linux is working. If I set /bin/sh as init, then I can even use shell builtins - being able to run echo hello world counts as Linux, right? (But calls to other programs don't seem to work.) init itself is segfaulting deep within libc, so there's still something fishy, but it could just be a dodgy rootfs.

kgugala commented 5 years ago

@daveshah1 this is great. The libc segfault also happened in our Renode (https://github.com/renode/renode) emulation. Can you share the rootfs you're using?

daveshah1 commented 5 years ago

initramdisk.gz

This is the initramdisk from antmicro/litex-linux-readme with a small change to inittab to remove some references to files that don't exist

In terms of other outstanding issues, I also had to patch VexRiscv so that interrupts are routed to S-mode rather than M-mode. This broke the LiteX BIOS, which expects M-mode interrupts, so I had to patch it to not expect interrupts at all, but that means there is now no useful UART output from the BIOS. I think a proper solution would be to select the interrupt privilege dynamically somehow.
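
For reference, the mechanism the privileged spec provides for choosing the interrupt privilege dynamically is the mideleg/medeleg CSRs: M-mode firmware delegates the supervisor interrupt (and exception) causes it doesn't want to handle itself. A sketch of what boot firmware would write, assuming the core implemented these CSRs (which, per this thread, VexRiscv may not yet):

#include <stdint.h>

#define MIP_SSIP (1u << 1)  /* supervisor software interrupt */
#define MIP_STIP (1u << 5)  /* supervisor timer interrupt */
#define MIP_SEIP (1u << 9)  /* supervisor external interrupt */

static inline void delegate_to_s_mode(void)
{
    /* Route the S-mode interrupt causes directly to S-mode... */
    uint32_t mideleg = MIP_SSIP | MIP_STIP | MIP_SEIP;
    asm volatile("csrw mideleg, %0" :: "r"(mideleg));

    /* ...and the exceptions Linux handles itself: ecall from U-mode (8)
       and the instruction/load/store page faults (12, 13, 15). */
    uint32_t medeleg = (1u << 8) | (1u << 12) | (1u << 13) | (1u << 15);
    asm volatile("csrw medeleg, %0" :: "r"(medeleg));
}

With that in place the BIOS would keep receiving its M-mode interrupts while Linux gets the S-mode ones, without a static routing patch.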

kgugala commented 5 years ago

We had to fix/work around IRQ delegation. I think this code should be in our repo, but I'll check that again.

daveshah1 commented 5 years ago

The segfault I see is:

[   53.060000] getty[45]: unhandled signal 11 code 0x1 at 0x00000004 in libc-2.26.so[5016f000+148000]
[   53.070000] CPU: 0 PID: 45 Comm: getty Not tainted 4.19.0-rc4-gb367bd23-dirty #105
[   53.080000] sepc: 501e2730 ra : 501e2e1c sp : 9f9b2c60
[   53.080000]  gp : 00120800 tp : 500223a0 t0 : 5001e960
[   53.090000]  t1 : 00000000 t2 : ffffffff s0 : 00000000
[   53.090000]  s1 : 00000000 a0 : 00000000 a1 : 502ba624
[   53.100000]  a2 : 00000000 a3 : 00000000 a4 : 000003ef
[   53.100000]  a5 : 00000160 a6 : 00000000 a7 : 0000270f
[   53.110000]  s2 : 502ba5f4 s3 : 00000000 s4 : 00000150
[   53.110000]  s5 : 00000014 s6 : 502ba628 s7 : 502bb714
[   53.120000]  s8 : 00000020 s9 : 00000000 s10: 000003ef
[   53.120000]  s11: 00000000 t3 : 00000008 t4 : 00000000
[   53.130000]  t5 : 00000000 t6 : 502ba090
[   53.130000] sstatus: 00000020 sbadaddr: 00000004 scause: 0000000d

The faulting instruction (offset 0x73730 in libc-2.26.so) seems to be in _IO_str_seekoff; the disassembly around it is:

   73700:   00080c93            mv  s9,a6
   73704:   00048a13            mv  s4,s1
   73708:   000e0c13            mv  s8,t3
   7370c:   000d8993            mv  s3,s11
   73710:   010a0793            addi    a5,s4,16
   73714:   00000d93            li  s11,0
   73718:   00000e93            li  t4,0
   7371c:   00800e13            li  t3,8
   73720:   3ef00d13            li  s10,1007
   73724:   02f12223            sw  a5,36(sp)
   73728:   04092483            lw  s1,64(s2)
   7372c:   71648463            beq s1,s6,73e34 <_IO_str_seekoff@@GLIBC_2.26+0x41bc>
   73730:   0044a783            lw  a5,4(s1)

kgugala commented 5 years ago

I checked the code, and it looks like everything has been pushed to GitHub.

As for the segfault: note that we had to re-implement the mapping code in Linux, and there are some hacks in the Vex MMU itself. This could be the reason for the segfault, as user space starts using virtual memory very extensively.

For example, the whole kernel memory space is mapped directly and we bypass the MMU translation; see: https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L116

The kernel range is defined in the MMU plugin instance: https://github.com/antmicro/VexRiscv/blob/97d04a5243bbfee9d1dfe56857f3490da9fe1091/src/main/scala/vexriscv/TestsWorkspace.scala#L98

I'm pretty sure there are many bugs hidden there :)

Dolu1990 commented 5 years ago

Ok, I will think about the best way to set up that test environment with the synchronised software golden model (to get maximum speed). About the golden model, I will complete it (the MMU part). I can do the CSRs too, but it would probably be best if somebody other than me cross-checked my interpretation of the privileged spec, because if both the hardware and the software golden model implement the same wrong interpretation, that's not so helpful ^^.

Dolu1990 commented 5 years ago

@enjoy-digital Maybe we can keep the current regression test environment of VexRiscv and just extend it with the required stuff. It's a bit dirty, but it should be fine. https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp

The golden model is currently here: https://github.com/SpinalHDL/VexRiscv/blob/master/src/test/cpp/regression/main.cpp#L193

enjoy-digital commented 5 years ago

@Dolu1990: in fact I already have a Verilator simulation that is working fine; it just needs a small improvement to load the vmlinux.bin/vmlinux.dtb and initramdisk into RAM more easily. But yes, we'll use whatever is more convenient for you. I'll look at your regression env and your golden model.

Dolu1990 commented 5 years ago

@enjoy-digital Can you show me the Verilator testbench sources? :D

Dolu1990 commented 5 years ago

@kgugala Which CPU configuration are you using? Can you show me? (The test workspace you pointed to isn't using the caches or the MMU.)

daveshah1 commented 5 years ago

The config I am using is at https://github.com/daveshah1/VexRiscv-verilog/blob/linux/src/main/scala/vexriscv/GenCoreDefault.scala (which has a few small tweaks compared to @kgugala's, to skip over FENCEs for example).

Dolu1990 commented 5 years ago

@enjoy-digital The checks between the golden model and the RTL are:

It should be enough to find divergences fast.

@daveshah1 Jumping over FENCE instructions is probably fine for the moment, but jumping over FENCE.I isn't: there is no cache coherency between the instruction cache and the data cache.

You need to use the cache flush :) Is that used in some way?

Dolu1990 commented 5 years ago

(Memory coherency issues are something that is automatically caught by the golden model / RTL cross-checks.)

daveshah1 commented 5 years ago

As it stands it looks like all the memory has been set up as IO, which I suspect means the L1 caches won't be used at all - I think LiteX provides a single L2 cache.

Indeed, to get useful performance proper use of caches and cache flushes will be needed.

kgugala commented 5 years ago

Yes, we disabled the caches as they were causing a lot of trouble. It didn't make sense to fight both the MMU and the caches at the same time.

Dolu1990 commented 5 years ago

@daveshah1 Ok ^^ One thing to know is that the instruction cache does not support IO instruction fetch; instead it caches those fetches. (Supporting IO instruction fetch costs area, and isn't really a useful thing, as far as I know?) So you still need to flush the instruction cache on FENCE.I. It could be done easily.

@kgugala The cacheless plugins aren't aware of the MMU. I perfectly understand your point about avoiding the trouble of both at once. So my proposal is:

So the roadmap would be:

kgugala commented 5 years ago

TBH the real long-term solution will be to reimplement the MMU so it is fully compliant with the spec. Then we can get rid of the custom mapping code in Linux and restore the original mainline memory-mapping code used for RV64.

I'm aware this will require quite a significant amount of work in Vex itself.

Dolu1990 commented 5 years ago

I don't think it would require that much work. An MMU is a relatively easy piece of hardware. I have to think about how heavy a fully compliant MMU would be in terms of FPGA area.

But what is the issue with a software-refilled MMU? If machine mode is used to do the refill, it becomes transparent to the Linux kernel, right? So no Linux kernel modification is required, just a piece of machine-mode code in addition to the raw Linux port :) ?

daveshah1 commented 5 years ago

Yes, I think an M-mode trap handler is the proper solution. We can probably use it to deal with any missing atomic instructions too.
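
A rough sketch of what that emulation could look like for a single AMO, assuming missing atomics trap to M-mode as illegal instructions; it ignores address translation (i.e. it assumes the trapped addresses are directly accessible from M-mode) and only handles amoadd.w, so it's an illustration rather than a complete handler:

#include <stdint.h>

#define OPC_AMO        0x2fu  /* A-extension opcode */
#define FUNCT5_AMOADD  0x00u

/* regs[0..31] mirror x0..x31 of the trapped context (regs[0] stays 0).
   Returns 0 if the instruction was emulated; the caller then advances mepc by 4. */
int emulate_amo(uint32_t *regs, uint32_t mepc)
{
    uint32_t insn = *(uint32_t *)mepc;
    if ((insn & 0x7fu) != OPC_AMO || ((insn >> 12) & 0x7u) != 0x2u)  /* only .w */
        return -1;

    uint32_t rd     = (insn >> 7)  & 0x1fu;
    uint32_t rs1    = (insn >> 15) & 0x1fu;
    uint32_t rs2    = (insn >> 20) & 0x1fu;
    uint32_t funct5 = insn >> 27;

    if (funct5 != FUNCT5_AMOADD)
        return -1;  /* other AMOs omitted from this sketch */

    volatile uint32_t *addr = (volatile uint32_t *)regs[rs1];
    uint32_t old = *addr;          /* "atomic" only because the system is single-core */
    *addr = old + regs[rs2];
    if (rd != 0)
        regs[rd] = old;
    return 0;
}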

Dolu1990 commented 5 years ago

(troll on) We should not forget the ultimate goal: RISC-V Linux on an iCE40 1K, I'm sure #28 would agree ^.^ (troll off)

kgugala commented 5 years ago

It may just be difficult to push the custom mapping code into the Linux mainline.

daveshah1 commented 5 years ago

The trap handler need not sit in Linux at all, it can be part of the bootloader.

Dolu1990 commented 5 years ago

@kgugala By mapping, do you mean the different flags of each MMU TLB entry in VexRiscv (https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/MemoryTranslatorPlugin.scala#L51)? If the provided features aren't enough, I'm happy to fix that first.

kgugala commented 5 years ago

@daveshah1 yes, it can. But that makes things even more complicated as two pieces of software will have to be maintained.

@Dolu1990 the flags were sufficient. One of the missing parts is variable mapping size: AFAIK right now you can only map 4k pages. This made mapping the whole kernel space impossible - the MMU's map table is too small to fit so many 4k entries. This is the reason we added the constant kernel-space mapping hack. Also, in user space there are many mappings for different contexts. Those mappings are switched very often, so rewriting them every time, with 2 custom instructions for every 4k page, is very slow.

We haven't properly tested whether the reloading is done correctly, and whether the mappings are refreshed correctly in the MMU itself. This, IMO, is the reason for the segfault we're seeing in user space.

Dolu1990 commented 5 years ago

@kgugala The initial idea to handle pages bigger than 4KB was to just translate them on demand into 4KB entries in the TLB. For example:

An access at virtual address 0x12345678, via a 16 MB page which maps 0x12xxxxxx to 0xABxxxxxx => the software emulation adds to the TLB cache a 4KB entry which maps 0x12345xxx to 0xAB345xxx.
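
The arithmetic for that on-demand split is tiny; a sketch in C (hypothetical helper, just to show the address math for the example above):

#include <stdint.h>

typedef struct { uint32_t va_page; uint32_t pa_page; } tlb4k_t;

/* Given a miss at va inside a large page (page_va -> page_pa), derive the
   4 KiB entry to install: e.g. va 0x12345678 in a 16 MB page mapping
   0x12000000 -> 0xAB000000 yields 0x12345000 -> 0xAB345000. */
tlb4k_t split_to_4k(uint32_t va, uint32_t page_va, uint32_t page_pa)
{
    uint32_t offset = (va - page_va) & ~0xfffu;  /* 4 KiB-aligned offset inside the page */
    tlb4k_t e = { page_va + offset, page_pa + offset };
    return e;
}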

But now that I think about it, maybe support for 16MB pages can be added with very little additional hardware over the existing solution.

The software model should also be able to indirectly pick up MMU translation errors :)

enjoy-digital commented 5 years ago

@Dolu1990: the simulation sources are here: https://github.com/enjoy-digital/litex/blob/master/litex/utils/litex_sim.py and https://github.com/enjoy-digital/litex/tree/master/litex/build/sim

With a vmlinux.bin with the .dtb appended, we can run Linux on mor1kx with: litex_sim --cpu-type=or1k --ram-init=vmlinux.bin

For now, for VexRiscv, I was hacking the RAM initialization function to aggregate the vmlinux.bin, vmlinux.dtb and initramdisk.gz, but I'm thinking about using a .json file to describe how the RAM needs to be initialized:

{
    "vmlinux.bin":    0x00000000,
    "vmlinux.dtb":    0x01000000,
    "initramdisk.gz": 0x01002000,
}

and then just do: litex_sim --cpu-type=vexriscv --ram-init=ram_init_linux.json

kgugala commented 5 years ago

The software right now maps the pages on demand.

@Dolu1990 The problem is that the kernel space has to be mapped the whole time. The whole kernel runs in S-mode in virtual memory. This space cannot be unmapped, because any interrupt/exception (including a TLB miss) may happen at any time. We cannot end up in a situation where a TLB miss causes a jump to a handler which is not mapped at the moment, causing another TLB miss. That would end up in a terrible miss->handler->miss loop.

daveshah1 commented 5 years ago

I have the userspace segfault issue seemingly fixed!

Screenshot from 2019-03-16 10-53-13

The problem was that the mapping code in the kernel was always mapping pages as RWX. But the kernel relies on pages being mapped read-only and triggering a fault on writes (e.g. for copy-on-write optimisations). Fixing that, and hacking the DBusCached plugin so that all write faults trigger a store page fault exception (the store fault exception was going to M-mode and causing problems; I need to look into the correct behaviour here), seems to result in a reliable userspace.
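
For reference, in standard Sv32 terms the fix amounts to leaving the W bit clear on pages the kernel wants to copy-on-write, so that the first store traps as a store page fault (cause 15) and the kernel can copy the page. A small sketch of the flag choice, using the bit positions from the privileged spec (cow_user_pte is just an illustrative helper, not the kernel's actual code):

#include <stdint.h>

#define PTE_V (1u << 0)
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)
#define PTE_U (1u << 4)
#define PTE_A (1u << 6)
#define PTE_D (1u << 7)

/* A user page shared for copy-on-write: readable but deliberately not writable,
   so the first write raises a store page fault instead of silently succeeding. */
static inline uint32_t cow_user_pte(uint32_t ppn)
{
    return (ppn << 10) | PTE_V | PTE_R | PTE_U | PTE_A;
}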

Dolu1990 commented 5 years ago

@enjoy-digital Ah, ok, so it is a SoC-level simulation. I think the best would really be to stick to a raw CPU simulation in Verilator, to keep full control over the CPU, keep its raw nature, and keep simulation performance as high as possible to reduce sim time.

@kgugala This is the purpose of machine-mode emulation. Basically, in machine mode the MMU translation is off, and the CPU can do all sorts of things without supervisor mode even being able to notice.

Here is the sequence for a user-space TLB miss:

kgugala commented 5 years ago

@daveshah1 this is awesome

Dolu1990 commented 5 years ago

@daveshah1 Great :D

ghost commented 5 years ago

What do you think about no-MMU support for Linux on RISC-V? Would it be possible? That would require hacking the kernel, instead of VexRiscv, of course.

enjoy-digital commented 5 years ago

Awesome @daveshah1!

roman3017 commented 5 years ago

@wm4: https://en.wikipedia.org/wiki/MClinux

daveshah1 commented 5 years ago

Screenshot from 2019-03-16 15-05-20

liteeth is working too! Although the combination of lack of caching and expensive context switches means this takes the best part of a minute...

kgugala commented 5 years ago

@daveshah1 what platform do you run it on? Do you run it with the ramdisk you shared before? I tried to run it and it seems to get stuck at:

[    0.000000] RAMDISK: gzip image found at block 0

I boot linux commit d27b7d5cb658ccb9ade4bea6a12feb08ebdcc541

daveshah1 commented 5 years ago

initramdisk.gz

Reuploading the ramdisk just in case, but I don't think there have been any changes.

The kernel requires the LiteX timer to be connected to the VexRiscv timerInterruptS, and the cycle/cycleh CSRs to work. In my experience, 'stuck during boot' has generally been a timer-related problem.
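
For anyone chasing this kind of hang: the usual RV32 idiom for reading the full 64-bit counter re-reads cycleh to guard against a rollover between the two CSR reads, as in the sketch below; if cycle/cycleh read as zero or never advance, the kernel's timekeeping stalls and the boot appears stuck.

#include <stdint.h>

/* Standard RV32 read of the 64-bit cycle counter via the cycle/cycleh CSRs. */
static inline uint64_t read_cycle64(void)
{
    uint32_t hi, lo, hi2;
    do {
        asm volatile("csrr %0, cycleh" : "=r"(hi));
        asm volatile("csrr %0, cycle"  : "=r"(lo));
        asm volatile("csrr %0, cycleh" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}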

My platform:

kgugala commented 5 years ago

This must be the timer interrupt then. I'll add this to my test platform

kgugala commented 5 years ago

Oh, I see you run it with the latest LiteX. I tried it on the system we used for the initial work (from December 2018). I have to rebase our changes.

kgugala commented 5 years ago

I bumped all the parts and have it running on Arty :)

daveshah1 commented 5 years ago

Awesome! I just pushed some very basic kernel-mode emulation of atomic instructions, which has improved software compatibility a bit (the current implementation I've done isn't actually atomic yet, as it ignores acquire/release for now...)

roman3017 commented 5 years ago

@Dolu1990 If I were to use RiscvGolden as you have suggested, would I run it with

VexRiscv/src/test/cpp/regression$ make DEBUG_PLUGIN_EXTERNAL=yes

then connect openocd with

openocd$ openocd -c "set VEXRISCV_YAML cpu0.yaml" -f tcl/target/vexriscv_sim.cfg

and then load vmlinux, dtb and initrd over gdb? I just want to make sure I use it as expected.