GaloisInc / BESSPIN-CloudGFE

The AWS cloud deployment of the BESSPIN GFE platform.
Apache License 2.0

AWSteria: Execute implementation plan adding L2 with coherence support for L1 and host-to-FPGA DMA #118

Open · rsnikhil opened this issue 4 years ago

rsnikhil commented 4 years ago

Some expected sub-tasks:

(a) Adapt Flute's L1 write-back cache and Toooba's L2 cache into an L1/L2 cache system serving both the CPU and AWS DMA access, in vanilla (non-CHERI) Flute.

(b) Redo SoC structure: AWS DMA PCIS connects to the DMA port of (a); likely reduce the double AXI4 fabric to one; I/O overlay network, etc.

(c) Track (a) and (b) with a CHERI-Flute version (tags, Cambridge AXI)

rsnikhil commented 4 years ago

I've added the new L1 cache with write-back policy to the GitHub Flute repo, while retaining the older write-through L1 cache as an option. @gameboo will study this to plan and make enhancements for CHERI tags/capabilities. Meanwhile, I'm moving on to integrating the RISCY-OOO L2 cache behind this write-back L1, making mods to the L1 to support coherence with the L2. I will likely start with the version of the RISCY-OOO L2 that U.Camb has already modified (from the Toooba work) to support tags/capabilities.

rsnikhil commented 4 years ago

Progress report

kiniry commented 4 years ago

Thank you for the update, @rsnikhil.

rsnikhil commented 3 years ago

Progress report:

kiniry commented 3 years ago

Thanks for the update.

rsnikhil commented 3 years ago

Progress report:

rsnikhil commented 3 years ago

Progress report:

Next:

At that point, I can hand off to Alexandre to do CHERI fixups, while I proceed to AWSteria to connect DMA and complete the Virtio work.

rsnikhil commented 3 years ago

Progress Report:

rsnikhil commented 3 years ago

Merged in some changes from @gameboo (Alexandre Jouannou) to the cache setup to abstract over 'cache words' so that it's easy to redefine them to include CHERI tags; cleaned up; handed off to Alexandre for 'CHERI-fication' of the new integrated L1-coherent-L2 system. I am moving on to integrating these new changes into AWSteria, including the coherent DMA.

rsnikhil commented 3 years ago

Summary of progress since end of July

rsnikhil commented 3 years ago

Currently debugging FreeBSD boot (without Virtio) in simulation, using the same image that booted in late June, before restructuring AWSteria for the coherent L2 cache. That simulation took > 10 hours.

Currently encountering an assertion failure in the I-Cache at 3h40m; investigating.

rsnikhil commented 3 years ago

Results of several attempts to boot FreeBSD (without Virtio, for now) in AWSteria Bluesim simulation: each run reaches about 190 million instructions and then fails. The failures vary across runs: two are (different) assertion failures in the cache; one is a FreeBSD kernel panic ("Fatal page fault at 0xffffffc0004728f0: 0x00100000000008") dropping it into KDB. The final console message before failure is 'start_init: trying /sbin/init'. Debugging continues.

rwatson commented 3 years ago

I'm not sure if the FreeBSD kernel you are working with has internal assertions enabled -- e.g., INVARIANTS, WITNESS -- but we've found that while those hugely slow down kernel boot, they are excellent for catching memory subsystem bugs, as they do self checks on numerous data structures, atomic operations, etc. The kernel will print out a message on boot warning about the performance hit, if they are configured. If you don't get those messages (one for each debugging feature), we should be able to provide kernels with them enabled.
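For concreteness, these features live in the kernel configuration file; a typical fragment looks like the following (these are standard FreeBSD option names, but whether the kernels in use were built with exactly this set is not confirmed here):

```
# Kernel self-checking options commonly enabled in FreeBSD debug kernels
options INVARIANTS        # run consistency self-checks on kernel data structures
options INVARIANT_SUPPORT # support code required by INVARIANTS
options WITNESS           # lock-order verification; catches locking bugs
options WITNESS_SKIPSPIN  # skip witness checks on spin mutexes (less overhead)
```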

rsnikhil commented 3 years ago

I have a kernel from June (or earlier) that does have WITNESS enabled. My most recent runs were with a kernel Jessica sent me on Sep 3 where WITNESS is not enabled (I had asked for this to improve simulation speed, but I'm not sure it improves sim speed that much). One suspicion is that my L1 cache is not interacting properly with MIT's L2 cache in the corner case where an L1-to-L2 request and an L2-to-L1 request may refer to the same cache line.

jrtc27 commented 3 years ago

For completeness I've now added debug (i.e. INVARIANTS + WITNESS) versions alongside the existing kernels (just drop the -NODEBUG from the name).

jrtc27 commented 3 years ago

Also, in case it helps you in your debugging quest: start_init: trying /sbin/init is the point at which FreeBSD starts the first userspace process, which itself will fork and exec additional ones. In that error message, 0xffffffc0004728f0 is satp and 0x00100000000008 is stval (i.e. the virtual address for which a dereference was attempted). That address itself looks very wrong; we're deep in the kernel (and not in the few copyin/copyout-like functions), and so should always be accessing kernel-space virtual addresses, which for FreeBSD are always in the negative/top half of the address space (though that address is not even a valid Sv39 address).
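To make the Sv39 point concrete, here's a quick sanity check (illustrative Python; sv39_canonical is just a helper written for this comment, encoding the privileged-spec rule that bits 63..39 of a valid Sv39 virtual address must be a sign-extension of bit 38):

```python
def sv39_canonical(addr: int) -> bool:
    """True iff bits 63..39 of addr sign-extend bit 38 (valid Sv39 VA)."""
    bit38 = (addr >> 38) & 1
    top25 = addr >> 39                 # bits 63..39 (25 bits)
    return top25 == (0x1FFFFFF if bit38 else 0)

print(sv39_canonical(0x00100000000008))   # False: the faulting stval
print(sv39_canonical(0xffffffc0004728f0)) # True: a top-half kernel address
```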

podhrmic commented 3 years ago

@rsnikhil how can I help you with the debugging here? Would it be helpful if I run the simulation as well?

rsnikhil commented 3 years ago

@podhrmic Let's hold off for the moment; your help would be most useful (I think) when I'm debugging Virtio on the setup, and I've not reached that point yet.

Since Monday Sep 14 I've been hammering at Flute's new memory system (WB_L1_L2: write-back, with the L2 cache coherent with the L1 caches) using my Carnyx memory stress-test tool. I have encountered and fixed 4 bugs so far. They were all concurrency bugs in L1, due to a request from L2 that arrives "asynchronously" at L1 at certain delicate moments in L1's normal activity. The request from L2 must be serviced at higher priority to avoid possible deadlock, requiring some of L1's state to be saved and restored properly after the service.
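(To illustrate the shape of these bugs, here is a toy model in Python, not the actual BSV, with invented state names; the point is just the arbitration and the save/restore discipline:)

```python
# Toy model of the L1 scheduling hazard described above (names are
# illustrative; the real design is BSV inside Flute's WB_L1_L2 system).

class L1Controller:
    def __init__(self):
        self.state = "READY"   # current step of L1's normal activity
        self.saved = None      # state stashed while serving L2

    def step(self, l2_to_l1_req, cpu_req):
        # A request from L2 (e.g. a downgrade for a line L2 is evicting)
        # can arrive at ANY point in L1's normal flow.  It must win
        # arbitration: otherwise L2 blocks waiting on L1 while L1 blocks
        # waiting on L2 -- a deadlock cycle.
        if l2_to_l1_req is not None:
            if self.state != "READY":
                self.saved = self.state    # suspend in-flight work ...
            self.serve_l2(l2_to_l1_req)
            if self.saved is not None:
                self.state = self.saved    # ... and resume it exactly
                self.saved = None          # where it left off
            return
        if cpu_req is not None:
            self.serve_cpu(cpu_req)

    def serve_l2(self, req):
        pass  # downgrade the line; write back data if dirty

    def serve_cpu(self, req):
        pass  # normal hit/miss handling; may issue L1-to-L2 requests
```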

Carnyx fits into the GFE just like a normal CPU, except it is not a CPU (it does not execute any RISC-V instructions). It just generates (controlled) random memory requests into Flute's memory system, records requests and responses, and checks them against a memory model. Failures are deterministically reproducible, and it quickly uncovered the above-mentioned 4 bugs, within a few thousand to tens of thousands of requests, taking a few seconds to a few minutes.
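A rough sketch of the idea (illustrative Python, not Carnyx itself; dut stands for the memory system under test, and its read/write methods are placeholders):

```python
import random

def stress(dut, n_reqs, seed):
    """Drive n_reqs random reads/writes into memory system 'dut' and
    check every read against a simple reference model (a dict).  The
    fixed seed makes failures deterministically reproducible."""
    rng = random.Random(seed)
    model = {}                                   # reference memory model
    for i in range(n_reqs):
        addr = rng.randrange(0, 1 << 20) & ~0x7  # aligned, small footprint
        if rng.random() < 0.5:
            data = rng.getrandbits(64)
            dut.write(addr, data)
            model[addr] = data
        else:
            got = dut.read(addr)
            want = model.get(addr, 0)
            assert got == want, (
                f"mismatch at req {i}: addr={addr:#x} "
                f"got={got:#x} want={want:#x}")
```

(The real tool presumably also has to handle outstanding/overlapping transactions, which is exactly where the interesting L1/L2 corner cases live.)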

(By comparison, FreeBSD booting reached about 190 million instructions, simulating for 10+ hours, before it failed, and it's not clear what happened. Multiple such runs produced different failures after 10-hour runs. Those specific assertion failures have since been seen in the Carnyx experiments and have been fixed.)

I'm continuing with Carnyx until I can get it to execute a billion requests without error (so far it gets to about 123K requests), and will be retrying the FreeBSD boot continually as the memory-system quality improves.

rsnikhil commented 3 years ago

Retrying FreeBSD boot, we no longer see the assertion failures seen earlier (possibly fixed by the Carnyx-based debugging described in the previous comment). It now gets stuck, possibly in a deadlock.

Testing the memory system with Carnyx, we also encountered a 'stuck' situation after 123K transactions, which we were able to shrink to about 15K transactions.
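(In case it's useful to others: one mechanical way to do this kind of shrinking, assuming the recorded trace can be replayed deterministically, is a delta-debugging-style reduction loop; fails() here is a hypothetical replay harness, not part of Carnyx:)

```python
def shrink(trace, fails):
    """Greedily remove chunks of a failing request trace while the
    failure still reproduces.  fails(t) replays trace t and returns
    True if the bug still occurs."""
    chunk = len(trace) // 2
    while chunk >= 1:
        i = 0
        while i < len(trace):
            candidate = trace[:i] + trace[i + chunk:]
            if fails(candidate):
                trace = candidate   # chunk was irrelevant; drop it
            else:
                i += chunk          # chunk is needed; keep it, move on
        chunk //= 2
    return trace
```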

The issue seems to be inside the L2 (a.k.a. LLC, last-level cache):

Neither request misses in L2 (both requested lines were loaded from memory earlier, and were not written back).

Investigating (and have also asked Sizhuo Zhang, author of the L2 code, for his opinion).

rsnikhil commented 3 years ago

Still no joy on retrying the boot of FreeBSD (without Virtio), in Bluesim simulation, on AWSteria ('vanilla' version, i.e., non-CHERI).

Pondering next move ... (including whether I should abandon 2-day simulations and switch to FPGA execution).