rsnikhil opened 4 years ago
I've added the new L1 cache with write-back policy to the GitHub Flute repo, while retaining the older write-through L1 cache as an option. @gameboo will study this to plan and do enhancements for CHERI tags/capabilities. Meanwhile I move on to integrate the RISCY-OOO L2 cache behind this write-back L1, making mods to the L1 to support coherence with L2. I will likely start with the version of RISCY-OOO L2 that U.Camb has already modified (from Toooba work) to support tags/capabilities.
Progress report
Thank you for the update, @rsnikhil.
Progress report:
Identified and fixed various bugs in new Write-back L1
In a new WB_L1_L2 directory, developing the L1+L2 cache with coherence and coherent DMA
Changed Write-back L1 Cache to have line-wide interface to L2, instead of 64b
Approaching point where there is a place-holder for the L2 from RISCY-OOO/Toooba
Will initially run all ISA tests with a 'null' L2
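A 'null' L2 of this kind can be thought of as a pass-through: no storage and no coherence logic, just forwarding line-wide L1 requests straight to backing memory. Below is a hypothetical Python sketch of that idea (the names `NullL2`, `read_line`, `write_line`, and the 64-byte line width are illustrative assumptions, not the actual Flute/Toooba interface):

```python
# Hypothetical sketch of a 'null' L2 placeholder: no storage, no coherence,
# just forwards line-wide L1 requests straight to backing memory -- enough
# to run ISA tests before the real RISCY-OOO/Toooba L2 is dropped in.

LINE_BYTES = 64   # assumed line width of the L1<->L2 interface

class NullL2:
    def __init__(self, mem):
        self.mem = mem    # backing memory: line address -> line bytes

    def read_line(self, addr):
        # pass straight through; unwritten lines read as zeros
        return self.mem.get(addr, bytes(LINE_BYTES))

    def write_line(self, addr, line):
        assert len(line) == LINE_BYTES
        self.mem[addr] = line

mem = {}
l2 = NullL2(mem)
l2.write_line(0x1000, bytes(range(64)))
print(l2.read_line(0x1000) == bytes(range(64)))   # True
```

The point of such a placeholder is that the L1-to-L2 protocol can be exercised end-to-end before any of the real L2's coherence machinery exists.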
Thanks for the update.
Progress report:
Progress report:
Next:
At that point, can hand-off to Alexandre to do CHERI fixups, while I proceed to AWSteria to connect DMA and complete the Virtio work.
Progress Report:
Merged-in some changes from @gameboo (Alexandre Jouannou) to the cache setup to abstract over 'cache words' so that it's easy to redefine it to include CHERI tags; cleaned up; handed off to Alexandre for 'CHERI-fication' of the new integrated L1-coherent-L2 system. I am moving on to integrate these new changes into AWSteria, including the coherent DMA.
Summary of progress since end of July
Currently debugging FreeBSD boot (without Virtio) in simulation, using the same image that booted in late June before restructuring AWSteria for the coherent L2 cache. That simulation took > 10 hours.
Currently encountering an assertion failure in I-Cache at 3h40m; investigating.
Results of several attempts to boot FreeBSD (without Virtio, for now) in AWSteria Bluesim simulation: reaching about 190 million instructions, then failures. Failures vary across runs. 2 failures are (different) assertion failures in the cache. 1 failure is a FreeBSD 'kernel panic' ("Fatal page fault at 0xffffffc0004728f0: 0x00100000000008") dropping it into KDB. Final console message before failure is 'start_init: trying /sbin/init'. Debugging continues.
I'm not sure if the FreeBSD kernel you are working with has internal assertions enabled -- e.g., INVARIANTS, WITNESS -- but we've found that while those hugely slow down kernel boot, they are excellent for catching memory subsystem bugs, as they do self checks on numerous data structures, atomic operations, etc. The kernel will print out a message on boot warning about the performance hit, if they are configured. If you don't get those messages (one for each debugging feature), we should be able to provide kernels with them enabled.
I have a kernel from June (or earlier) that does have WITNESS enabled. My most recent runs were with a kernel Jessica sent me on Sep 3 where WITNESS is not enabled (I had asked for this, to improve simulation speed, but I'm not sure it's improving sim speed that much). One suspicion is that my L1 cache is not interacting properly with MIT's L2 cache in the corner case where an L1-to-L2 request and an L2-to-L1 request may refer to the same cache line.
For completeness I've now added debug (i.e. INVARIANTS + WITNESS) versions alongside the existing kernels (just drop the -NODEBUG from the name).
Also, in case it helps you in your debugging quest, "start_init: trying /sbin/init"
is the point at which FreeBSD starts the first userspace process, which itself will fork and exec additional ones, and in that error message 0xffffffc0004728f0 is satp and 0x00100000000008 is stval (i.e. the virtual address for which a dereference was attempted). That address itself looks very wrong; we're deep in the kernel (and not in the few copyin/copyout-like functions) and so should always be trying to access kernel-space virtual addresses, which for FreeBSD are always the negative/top half of the address space (though that address is not even a valid Sv39 address).
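The observation that the faulting address is not even a valid Sv39 address can be checked mechanically: Sv39 virtual addresses are 39 bits, and the RISC-V privileged spec requires bits 63:39 to equal bit 38 (sign extension), so every valid address sits in the canonical low or high half. A small Python check (the function name is my own, not from any kernel source):

```python
# Check whether a 64-bit value is a canonical (valid) Sv39 virtual address:
# bits 63:39 must all equal bit 38 (sign extension of the 39-bit address).

def is_canonical_sv39(va: int) -> bool:
    bit38 = (va >> 38) & 1
    upper = va >> 39                          # bits 63:39 (25 bits)
    expected = (1 << 25) - 1 if bit38 else 0  # all-ones or all-zeros
    return upper == expected

stval = 0x00100000000008          # faulting address from the kernel panic
print(is_canonical_sv39(stval))              # False: bit 52 set, bit 38 clear
print(is_canonical_sv39(0xFFFFFFC0004728F0)) # True: high-half kernel address
```

So the faulting address could not have come from any legitimate Sv39 translation, which points at corruption somewhere (e.g. a bad load from the memory subsystem) rather than an ordinary wild pointer.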
@rsnikhil how can I help you with the debugging here? Would it be helpful if I run the simulation as well?
@podhrmic Let's hold off for the moment; your help would be most useful (I think) when I'm debugging Virtio on the setup, and I've not reached that point yet.
Since Monday Sep 14 I've been hammering at Flute's new memory system (WB_L1_L2, write-back with an L2 cache coherent with the L1 caches) using my Carnyx memory stress-test tool. I have encountered and fixed 4 bugs so far. They were all concurrency bugs in L1, due to a request from L2 that arrives "asynchronously" at L1 at certain delicate moments in L1's normal activity. The request from L2 must be serviced at higher priority to avoid possible deadlock, requiring some of L1's state to be saved and restored properly after the service.
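The hazard class described above can be illustrated with a toy model (this is a hypothetical Python sketch, not the actual BSV code; `L1Sketch` and its fields are invented for illustration): an L1 that is mid-miss must still service an incoming L2 downgrade first, saving and restoring its in-flight state around the service.

```python
# Toy model of the hazard: an L2-to-L1 downgrade request arriving while
# L1 has a miss in flight must win, with the miss state saved and restored.

from collections import deque

class L1Sketch:
    def __init__(self):
        self.lines = {}          # line addr -> MESI-like state ('M','S','I')
        self.inflight = None     # an interrupted miss, if any
        self.l2_reqs = deque()   # downgrade requests arriving from L2

    def start_miss(self, addr):
        self.inflight = ('MISS', addr)

    def step(self):
        # L2-to-L1 requests are serviced at higher priority to avoid
        # deadlock, even when they touch the line of the in-flight miss.
        if self.l2_reqs:
            addr, target = self.l2_reqs.popleft()
            saved = self.inflight         # save the interrupted work
            self.lines[addr] = target     # perform the downgrade
            self.inflight = saved         # restore; the miss resumes later
            return ('SERVICED_L2', addr)
        if self.inflight:
            _, addr = self.inflight
            self.lines[addr] = 'S'        # refill completes
            self.inflight = None
            return ('MISS_DONE', addr)
        return None

l1 = L1Sketch()
l1.lines[0x80] = 'M'
l1.start_miss(0x100)
l1.l2_reqs.append((0x80, 'I'))    # downgrade arrives mid-miss
print(l1.step())   # ('SERVICED_L2', 0x80) -- the downgrade wins
print(l1.step())   # ('MISS_DONE', 0x100) -- the miss resumes afterwards
```

The bugs in this class tend to be exactly the save/restore step: forgetting some piece of in-flight state (or restoring it at the wrong moment) works fine until the two requests collide on the same line.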
Carnyx fits into the GFE just like a normal CPU, except it is not a CPU (it does not execute any RISC-V instructions). It just generates (controlled) random memory requests into Flute's memory system, records requests and responses, and checks them against a memory model. Failures are deterministically reproducible, and it quickly uncovered the above-mentioned 4 bugs, within a few thousand to tens of thousands of requests, taking a few seconds to a few minutes.
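The Carnyx approach can be sketched in a few lines of Python (this is an illustrative model of the idea, not the actual tool; `stress`, `dut_read`, and `dut_write` are invented names): fire seeded-random reads and writes at the system under test, mirror the writes in a trivial golden memory model, and flag the first response that disagrees.

```python
# Sketch of a Carnyx-style stress test: controlled-random memory requests,
# checked against a golden memory model. A fixed seed makes every failure
# deterministically reproducible.

import random

def stress(dut_read, dut_write, n_reqs, seed=1):
    """Run n_reqs random reads/writes; return the first mismatch or None."""
    rng = random.Random(seed)        # deterministic -> reproducible failures
    model = {}                       # golden model: addr -> last data written
    for i in range(n_reqs):
        addr = rng.randrange(0, 64) * 8       # small footprint forces conflicts
        if rng.random() < 0.5:
            data = rng.randrange(0, 2**64)
            dut_write(addr, data)
            model[addr] = data
        else:
            got = dut_read(addr)
            want = model.get(addr, 0)          # unwritten memory reads as 0
            if got != want:
                return (i, addr, want, got)    # deterministic repro point
    return None

# Self-check against a trivially correct "DUT":
mem = {}
result = stress(lambda a: mem.get(a, 0), mem.__setitem__, 10_000)
print(result)   # None: no mismatch for a correct memory
```

Keeping the address footprint tiny is deliberate: it maximizes the chance that concurrent requests collide on the same cache line, which is exactly the corner case that uncovered the four L1 bugs.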
(By comparison, FreeBSD booting reached about 190 million instructions, simulating for 10+ hours, before it failed, and it's not clear what happened. Multiple such runs produced different failures after 10-hour runs. Those specific assertion failures have since been seen in the Carnyx experiments and have been fixed.)
I'm continuing with Carnyx until I can get it to execute a billion requests without error (it gets to about 123K requests so far), and will be retrying the FreeBSD boot continuously as the memory system quality improves.
Retrying FreeBSD boot, we no longer see the assertion failures seen earlier (possibly fixed by the Carnyx-based debugging described in previous comment). It now gets stuck, possibly a deadlock.
Testing the memory system with Carnyx, we also encountered a 'stuck' situation after 123K transactions, which we were able to shrink to about 15K transactions.
The issue seems to be inside the L2 (a.k.a. LLC, last-level cache):
Neither request misses in L2 (both requested lines have been loaded from mem earlier, and were not written back)
Investigating (and have also asked Sizhuo Zhang, author of the L2 code, for his opinion).
Still no joy on retrying boot of FreeBSD (without virtio), in Bluesim simulation, on AWSteria ('vanilla' version, i.e., non-CHERI).
Simulates successfully for 39 hours, 393 million instructions.
We see 80 lines of expected console output, with only 4 more lines expected before the kernel prompt. Tail of console output:
---- 67 lines of expected console output before this ----
start_init: trying /sbin/init
Setting up sysctls
sysctl: unknown oid 'kern.polling.user_frac' at line 2
sysctl: unknown oid 'machdep.unaligned_log_pps_limit' at line 5
kern.coredump: 1 -> 0
kern.random.harvest.mask: 991 -> 735
mount / rw
entropy read from /boot/entropy
entropy read from /var/db/entropy/entropy.0
entropy read from /var/db/entropy/entropy.1
create 500m TMPFS at /tmp
set up loopback
lo0: link state changed to UP
---- REACHED HERE ----
generate host keys
start sshd
random: unblocking device.
exec /bin/sh
and then we get stuck: the CPU stops executing instructions, suggesting that a fetch, load, or store is stuck.
Pondering next move ... (including whether I should abandon 2-day simulations and switch to FPGA execution).
Some expected sub-tasks:
(a) Adapt Flute L1 write-back cache and Toooba L2 cache into an L1/L2 cache system for Flute and AWS DMA access, in vanilla Flute.
(b) Redo SoC structure: AWS DMA PCIS connects to DMA port of (a); likely reduce double AXI4 fabric to one; I/O overlay network, etc.
(c) Track (a) and (b) with a CHERI-Flute version (tags, Cambridge AXI)