why doesn't FireSim Just Work?

kiniry commented 4 years ago

DARPA asked in our meeting today, why doesn't FireSim work in just providing a vehicle for dropping in a CPU? In particular, might this be a way to get UMich and Lockheed going in the cloud quickly? We need an authentic characterization of why FireSim works or does not work for our Wednesday check-in with DARPA.

lolsborn commented 4 years ago

FireSim has a novel clock architecture specifically designed for research projects needing to simulate complex network systems with end-to-end clock-cycle simulation. The intention is to be able to simulate entire server farms of RiscV machines running at several gigahertz using much slower simulated devices.

This unique "multi-clock" system requires that all of the included peripherals are implemented in Chisel and are implemented to be compatible with this type of clock architecture. Unfortunately, this means that none of the peripherals are portable to other SoCs.

lolsborn commented 4 years ago

@kiniry @rfoot is there a meeting notes doc that this can go in?

rfoot commented 4 years ago

@lolsborn @kiniry - there isn't - we need to put it in an email and send it back to DARPA as the above is true - what would keep us from using the firesim SoC with the SSITH Processor and having that be a viable platform with which to test it?

lolsborn commented 4 years ago

Comments from Dylan Hand:

Using FireSim as the top-level was something I considered but initially dismissed when I thought we’d continue the connectal work. Now that we are looking at starting from “scratch” anyway, it is worth some consideration. As @Steven Osborn said, it is not the same as just building hardware in a FPGA - everything goes through a translation layer where the hardware is manipulated to act as if all timing was cycle accurate for arbitrary latencies between blocks. The relative latencies are controllable though. For example, you could set 1:1 ratios between the CPU and DDR clocks and it would act mostly as if there were no translation layer (but still not quite).

As far as I can tell, using FireSim could immediately get us UART, Ethernet, a Block device, memory peek/poke, and linux support. So there is a lot of potential there, but we should carefully consider the possible pitfalls. Here are a few I can think of now:

It is under active development and has only recently gotten good enough to even consider for a project like this. That is both good and bad. Good in that there are smart people working on this and fixing problems (they have resolved issues for us in less than a day and are very responsive to questions), but things can change quickly. Some pieces are more WIP than others.
- The top-level must be chisel and furthermore the top-level bus must be TileLink for all the latency insensitive clocking magic to happen. Our GFE as provided would work fine with a TileLink interconnect, but it alters the development challenges. In house (and at Bluespec), we'd need people comfortable working in Chisel and have a good understanding of the chipyard/firesim/rocket-chip infrastructure. It's possible much of this could be hidden from TA1 teams as long as they don't need to make system-level changes.
Along those lines. with a TileLink top-level bus we’d need to better understand any memory tagging or system-level changes TA1 teams have already made to make sure it’d be compatible with this new approach. If all of their changes are happening within the “core” boundary, that’ll make things significantly easier. If they are using custom AXI components, those could potentially be show-stoppers.
The setup process is extremely automated, almost to a fault. It can be difficult to deviate from the prescribed process. If you can stay within their flow, it is very quick to get from github to emulated Linux on an F1 instance though (build times excluded).
There is obviously great support for the Rocket processors. I would not claim “drop-in” as our version of Rocket in the GFE is a bit old, but it should port nicely. Bluespec processors will need to go through FireSim's “BlackBox” process, which is a recent addition. This should work, but it is one of the potentially big complications with using FireSim. I’m waiting to hear back from another HW eng on our team about the stability of this approach in the current version of FireSim. We are currently using this flow for simulating an accelerator core (NVDLA) written in Verilog along with a Rocket processor.
Full GDB debugging would still be a challenge.. it may even be more difficult due to the multi-cycle clocking infrastructure, but have not dug in on this yet.

rfoot commented 4 years ago

@charlie-bluespec @darius-bluespec - your input would be valuable here

kiniry commented 4 years ago

SRI/Cambridge did introduce their own AXI component, so depending upon its shape and nature, that may be a problem. They seem to shrug when it comes to having to update it again wrt CloudGFE work, as they did in this morning's check-in meeting.

kiniry commented 4 years ago

If SRI/Cambridge are happy with the Connectal approach, then the only teams we have left to worry about are LHM and, in the margins, UMich. Both are Chisel houses.

kiniry commented 4 years ago

If we had a snapshot of either team's "Rocket replacement", we could try to just drop it in and do a build. Unfortunately, to my knowledge, neither has ever pushed a code drop to us via GitLab-ext private repositories.

kiniry commented 4 years ago

Full GDB support is a desired, but not mandatory, requirement for CloudGFE.

rwatson commented 4 years ago

Tagging @swm11.

rwatson commented 4 years ago

@lolsborn: And I guess we would likely need FreeBSD drivers (and possibly FreeRTOS drivers) for Firesim’s peripherals? Do you have a handy pointer to both specs for the devices, and the Linux drivers, that we could glance at to assess the level of effort likely required on the FreeBSD side? (Assuming they’ve not decided to knock off an existing well-supported device, anyway...)

jrtc27 commented 4 years ago

Regarding AXI, we replace the internal AXI parts in the core. We present a normal AXI interface to the outside world (in the Xilinx Block Diagram sense) that uses the existing Xilinx interconnect IP just like the GFE, though we have wider ID fields (which the Xilinx tools automatically account for). We also alter the interconnect's address map to allow only the core to write to the cached half of DRAM, as the tag controller lives inside the core and external devices should not be able to bypass it and alter memory under its control. I believe the tag cache inside the tag controller is also writeback, so any wstrb concerns should be limited to the same as the baseline.

lolsborn commented 4 years ago

@rwatson we would need to port all of the drivers for BSD assuming the BSD folks aren't using a Connectal solution. Here are a couple of the drivers NIC: https://github.com/firesim/icenet-driver Block device: https://github.com/firesim/iceblk-driver

charlie-bluespec commented 4 years ago

I agree with all of Dylan's comments, and have this to add:

For Firesim to "just work" we need the GFE processors to "just work" in Chipyard. The P1 and P2 Rocket processors shouldn't be too far off from just working.

But the deltas for Piccolo and Flute are significant, due to Chisel/BSV language differences and other processor implementation differences. We just recently evaluated the integration of Flute into Chipyard for another project, and it's pretty clear to us that Firesim would be much more effort and risk than either using Connectal or porting the existing Xilinx block design to AWS.

rwatson commented 4 years ago

On 14 Apr 2020, at 17:20, Joseph Kiniry notifications@github.com wrote:

Full GDB support is a desired, but not mandatory, requirement for CloudGFE.

NB: If we (collectively) need to write new network and storage device drivers, having access to a debugger will make a big difference.

kiniry commented 4 years ago

Given our discussion with LHM just now, it sounds like they are open to the idea of using FireSim only for them and Connectal for SRI/Cambridge+MIT. I’m going to investigate this option. You’ll see some new issues get filed on such this afternoon.

On Apr 14, 2020, at 11:41, charlie-bluespec notifications@github.com wrote:

I agree with all of Dylan's comments, and have this to add:

For Firesim to "just work" we need the GFE processors to "just work" in Chipyard. The P1 and P2 Rocket processors shouldn't be too far off from just working.

But the deltas for Piccolo and Flute are significant, due to Chisel/BSV language differences and other processor implementation differences. We just recently evaluated the integration of Flute into Chipyard for another project, and it's pretty clear to us that Firesim would be much more effort and risk than either using Connectal or porting the existing Xilinx block design to AWS.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/26#issuecomment-613613680, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADU5P2N6TR7VQXSUZOYFWTRMSU6RANCNFSM4MHHHCCQ.

dhand-galois commented 4 years ago

@jrtc27 Just to clarify, are all of your AXI modifications within what we call the ssith_processor_0 module in the GFE block diagram? If so, that may work in FireSim.

@charlie-bluespec Could you provide some more details on what you viewed as the issues with integrating Flute into chipyard?

I am warming up to the idea of creating a chipyard/firesim system with a BlackBox processor connection via AXI - in other words, replace our top-level block design using the FireSim and keep the ssith_processor_0 module as the boundary between what we provide and what the TA1 teams have modified. As long as ssith_processor_0 is a single clocked verilog module, it should be somewhat straightforward to drop in any of the teams' code. The tough part of this experiment will be replacing the rocket tile with our custom blackbox tile.

dhand-galois commented 4 years ago

@kiniry I'll just say I am concerned with using two different approaches. It can easily double the amount of work required on this project.

kiniry commented 4 years ago

I'm concerned about that too. But what DARPA is emphasizing to us is time-to-FETT launch, not effort on our part. Thus, if we have the performers and it gets us to the finish line earlier, but it costs a bit more $, that's where they'd aim us.

brooksdavis commented 4 years ago

@rwatson we would need to port all of the drivers for BSD assuming the BSD folks aren't using a Connectal solution. Here are a couple of the drivers NIC: https://github.com/firesim/icenet-driver Block device: https://github.com/firesim/iceblk-driver

At a glance, theses are very small drivers with an appropriate license (choice of BSD-3-Clause or GPLv2) so that's promising.

jrtc27 commented 4 years ago

@jrtc27 Just to clarify, are all of your AXI modifications within what we call the ssith_processor_0 module in the GFE block diagram? If so, that may work in FireSim.

Yes, other than changing the widths of the AXI ID signals coming out of ssith_processor_0, so that would need to be parameterisable.

@charlie-bluespec Could you provide some more details on what you viewed as the issues with integrating Flute into chipyard?

I am warming up to the idea of creating a chipyard/firesim system with a BlackBox processor connection via AXI - in other words, replace our top-level block design using the FireSim and keep the ssith_processor_0 module as the boundary between what we provide and what the TA1 teams have modified. As long as ssith_processor_0 is a single clocked verilog module, it should be somewhat straightforward to drop in any of the teams' code. The tough part of this experiment will be replacing the rocket tile with our custom blackbox tile.

The current JtagTap.bsv in the BSV cores has a second clock (tck), but I assume that part of things would need rewriting for FireSim anyway.

kiniry commented 4 years ago

I'm currently doing some code spelunking on LMCO's example pipelines. For TA-2 team members with appropriate ACLs, see https://gitlab-ext.galois.com/ssith/ta1-lmco

rwatson commented 4 years ago

@rwatson we would need to port all of the drivers for BSD assuming the BSD folks aren't using a Connectal solution. Here are a couple of the drivers NIC: https://github.com/firesim/icenet-driver Block device: https://github.com/firesim/iceblk-driver

At a glance, theses are very small drivers with an appropriate license (choice of BSD-3-Clause or GPLv2) so that's promising.

I wonder if these peripherals work usefully in simulation, or only in AWS F1, from our perspective. This could quite affect debugging efficiency. As would access to JTAG / GDB.

dhand-galois commented 4 years ago

After a couple hours of mucking around in chipyard I have a simulating system using the GFE Bluespec P2 as its core running and passing at a few basic ISA tests. It would be interesting to see if this actually works in FireSim as-is, but my guess is it would require actually fixing some of the issues I currently solved by repeatedly pummeling things until it worked.

An early assessment of some of the issues we'd have to address if we go with chipyard/FireSim:

The CLINT is now used to kick-off the processor after the host loads the memory. We could either move the CLINT external to the core or add a new AXI slave port to allow access to the internal one. Alternatively, we could forgo the standard assisted boot process but that may cause more trouble.
The PLIC is also external, but this seems less difficult to solve as the host does not interact with it.
We'd still need to expose the debug module interface ourselves for full gdb access.
the some of the other issues we already mentioned.

kiniry commented 4 years ago

As we just discussed in our standup, @dhand-galois is going to push this experiment to a branch of our fork of chipyard, I'll review it, and then we'll sync about (a) doing an experimental build of LMCO's smoketest CPU in chipyard, and then (b) spinning up either/both GFE and LMCO's smoketest in FireSim.

dhand-galois commented 4 years ago

I cleaned up my code a bit to make it easier to reproduce and checked it in to a branch on our chipyard repo here: https://gitlab-ext.galois.com/ssith/chipyard/-/tree/ssith-core

As I mentioned briefly at standup, I created a replacement blackbox core (based on the Ariane core) that matches our SSITH core I/O - two AXI master ports, the tandem verification port, clk, reset, etc. Then I augmented the resources that get loaded into the simulator to pull in the bluespec P2's verilog. To work around the multiple CLINT/PLIC issue, I have the chipyard CLINT feeding into a normal interrupt input of the SSITH PLIC, and then modified the bootrom to enable the SSITH PLIC. When the test environment triggers the chipyard CLINT to issue an interrupt the SSITH PLIC interrupts the core, which claims and clears both interrupts and jumps to DRAM. I also moved the bootrom and DRAM address to match our mapping from the GFE.

To recreate it, check out my branch and run:

./scripts/init-submodules-no-riscv-tools-nolog.sh
cd sims/verilator
~~ Update SSITH_PROC_DIR in Makefile to point to a checkout of the Flute repo's hdl folder ~~
make CONFIG=SSITHConfig
./simulator-chipyard-SSITHConfig $RISCV/riscv64-unknown-elf/share/riscv-tests/isa/rv64ui-p-sub

Your riscv-tests should be compiled against the GFE version, which puts the tests at 0xC0000000.

dhand-galois commented 4 years ago

Another update - I was successful in getting a FireSim system built with the BSV P2 core. I have a simulation running locally (not in AWS) passing ISA tests, which simulates the FPGA-available I/O and includes the MIDAS/Golden Gate transformation. So that's a pretty good sign a build on the FPGA will also work. I'll plan on kicking off an AFI build sometime today if there isn't too much additional work involved with adding the BSVP2 resources.

Waiting on AWS approval for actually running tests on an F1 instance.

kiniry commented 4 years ago

Cool. I have an AWS F1 spun up (with only 8 vCPUs, I have a pending request to bump) and I have build and configured FireSim there. I am about to build some example instances/bitstreams now to better understand the AWS AMI flow and tooling.

kiniry commented 4 years ago

If you want to use my or @lolsborn's creds to spin up an instance of your own, we can do that. I expect I'll get my quota bump today.

kiniry commented 4 years ago

FYI, concurrent with these experiments: We are doing a first code review of MIT's Connectal-based work tomorrow from 11:00-13:00 PT (I think). An invite is going out and you will be an optional invite.

dhand-galois commented 4 years ago

Unfortunately I have an overlap 11-12:30. Would have liked to see what they're doing.

I expect AWS will approve my request sometime today, but I'll reach out if they don't whenever my AFI build is ready.

kiniry commented 4 years ago

We can record it, if you think it'd be useful. Otherwise we'll just take notes carefully and debrief you.

kiniry commented 4 years ago

I've built Chipyard and FireSim on my AWS node and am now building the example 1 target node Linux image.

dhand-galois commented 4 years ago

I replicated my local sim build on AWS (still running in a simulator). Luckily it was not much trouble to make the changes compatible with FireSim's buildafi process, so that is running now. We'll see what happens..

kiniry commented 4 years ago

Per a discussion I had moments ago with @dhand-galois, I'll continue this line of exploration as it will let me attempt to bring into FireSim the older version of Rocket we snapshotted in GFE in order to explore how well this might work for LMCO and UMich. Concurrently @dhand-galois is exploring the Bluespec-wrapped-in-Chipyard-wrapped-in-FireSim approach and he is synthesizing our P2 Flute bitstream now to see if/how that works with full Linux.

kiniry commented 4 years ago

CC @jameyhicks on all of the above, so that he can track this line of exploration.

dhand-galois commented 4 years ago

To continue with updates:

The AFI built successfully, but my limit increase with AWS is still pending (they did respond just to say it's taking more time than usual). I can share the AFI image with whoever would like I give it a spin - just need your AWS user id.

I should be able to attend the sync up tomorrow as my conflict was rescheduled.

kiniry commented 4 years ago

My AFI nearly finished last night before failing due to a mysterious message.

I’ll share my ID with you this morning.

What are your current limits and what did you request? Did you explain you are a part of the Galois team working on SSITH?

On Apr 16, 2020, at 8:18 PM, dhand-galois notifications@github.com wrote:

To continue with updates:

The AFI built successfully, but my limit increase with AWS is still pending (they did respond just to say it's taking more time than usual). I can share the AFI image with whoever would like I give it a spin - just need your AWS user id.

I should be able to attend the sync up tomorrow as my conflict was rescheduled.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

lolsborn commented 4 years ago

You need to request provisioning based on the number of virtual host CPUs you need so a base F1 instance has 8vcpus. They change the way the provisioned instances a while back from being host number based to vcpus so you have to be clear you need 8vcpus on F1 and not just 1 "f1 instance" or they may not provide what you need.

On Fri, Apr 17, 2020, at 8:14 AM, Joseph Kiniry wrote:

My AFI nearly finished last night before failing due to a mysterious message.

I’ll share my ID with you this morning.

What are your current limits and what did you request? Did you explain you are a part of the Galois team working on SSITH?

On Apr 16, 2020, at 8:18 PM, dhand-galois notifications@github.com wrote:

To continue with updates:

The AFI built successfully, but my limit increase with AWS is still pending (they did respond just to say it's taking more time than usual). I can share the AFI image with whoever would like I give it a spin - just need your AWS user id.

I should be able to attend the sync up tomorrow as my conflict was rescheduled.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/26#issuecomment-615301405, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOZBDBDDPJACYNGC556BDRNBW5ZANCNFSM4MHHHCCQ.

kiniry commented 4 years ago

Per the stand-up this morning: My failure was due to bitstream being sent for encryption/obfuscation, I was in a queue, and my ssh connection failed. I'm ok on the vCPU front. At the moment I have 8 vCPUs (= 1 F1 FPGA) and I have requested 64 vCPUs (= 8 F1s at most).

lolsborn commented 4 years ago

Okay. Sorry I missed most of CloudGFE standup as CyberPhys ran long. Where you able to work this out? Are you building locally? The AWS FPGA developer image has Vivado with encryption needed installed, but it is a CentOS image. Otherwise you will have to contact Xilinx to get a feature license for your local machine.

On Fri, Apr 17, 2020, at 11:33 AM, Joseph Kiniry wrote:

Per the stand-up this morning: My failure was due to bitstream being sent for encryption/obfuscation, I was in a queue, and my ssh connection failed. I'm ok on the vCPU front. At the moment I have 8 vCPUs (= 1 F1 FPGA) and I have requested 64 vCPUs (= 8 F1s at most).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/26#issuecomment-615399755, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOZBEQ3QDPXMO6WBMAS2TRNCOHZANCNFSM4MHHHCCQ.

kiniry commented 4 years ago

I'm building remotely. Locally I have a snapshot of Chipyard and FireSim just so I can read stuff offline, but I have everything checked out and built in a small AWS EC2 instance, per the Chipyard and Firesim documentation examples. I'm building bitstreams directly via that instance, not locally.

kiniry commented 4 years ago

I'm not going to go down the path of getting a feature license for my machine. If we do decide to do that, we should target it against a MAC (or whatever) in the Docker image we have now that contains Vivado, shared recently by @podhrmic.

kiniry commented 4 years ago

CC @rsnikhil @charlie-bluespec to keep Bluespec in the loop on these experiments.

kiniry commented 4 years ago

As we are continuing to experiment with FireSim this week to support Chisel-based TA-1 performers, I'm moving this issue to Sprint #2. @dhand and I will attempt to make a determination about supporting LMCO in particular this week.

kiniry commented 4 years ago

I'm removing @lolsborn as he no longer has point on this task but can track via CC.

dhand-galois commented 4 years ago

To continue updating this ticket with general FireSim developments...

I have been getting more successful at building and running customized FireSim images. So far I have build vanilla single-core Rocket, LMCO-hardened Rocket from their smoke test example, and a modified vanilla Rocket with a memory map that more closely matches the GFE. Linux is booting on all but the last image. Running smaller bare-metal ELF examples is also working, although the process is a bit cumbersome at the moment.

Remaining issues:

The networking stack is in worse shape than it seemed at first. We may want to try adopting some of the VirtIO work, at least for ethernet. Their current approach does not expose a full networking interface to the host, but only a tap device capable of passing SSH packets. And that is not currently functional for me.
Loading different programs is time consuming. The program load itself is very quick but it appears that a test ending puts the FPGA into an unusable state. To put it back into a known good state, I am running firesim kill && firesim infrasetup && firesim runworkload which is about a 2-3 min turnaround.
Currently building at 50MHz, which is even slower in practice as the "simulated" core speed is some fraction of the FPGA speed.

dhand-galois commented 4 years ago

I think we've sufficiently answered the original question with "turns out, it can work". Any objections to closing this issue?

dhand-galois commented 4 years ago

Per last comment and not seeing any objections, going to close this for now. Happy to re-open it or add additional issues for further discussion.

GaloisInc / BESSPIN-CloudGFE

why doesn't FireSim Just Work? #26