GaloisInc / BESSPIN-CloudGFE

The AWS cloud deployment of the BESSPIN GFE platform.
Apache License 2.0

Package FireSim Rocket-based AFI + SW #86

Closed dhand-galois closed 4 years ago

dhand-galois commented 4 years ago

Build a GFE-mapped Rocket AFI and the necessary host-side software so the FETT team can quickly run tests on an F1 instance without using FireSim.

dhand-galois commented 4 years ago

Needs both Linux and FreeBSD images.

dhand-galois commented 4 years ago

First version is up and running. Will add FreeBSD soon, but that OS is still taking >15 mins to boot.

https://gist.github.com/dhand-galois/9c41af3c10cb9cea2daf2ae1c9e2deed
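
For anyone following along, the flow in the gist boils down to something like this (a sketch only - the module, script, and image names below are illustrative and the AGFI ID is a placeholder; see the gist for the real steps):

    # on an f1.2xlarge with the AWS FPGA management tools installed
    $ tar xzf minimal_cloudgfe.tgz && cd minimal_cloudgfe
    $ sudo fpga-load-local-image -S 0 -I agfi-0123456789abcdef0  # load the Rocket AFI into FPGA slot 0
    $ sudo insmod xdma.ko                                        # host<->FPGA DMA driver (assumed module name)
    $ sudo insmod nbd.ko nbds_max=128                            # block-device support for the OS image (assumed)
    $ ./run_sim.sh linux.img                                     # illustrative wrapper that starts the host-side binary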

dhand-galois commented 4 years ago

Continually updating the same gist document linked above. Now running FreeBSD and Linux at a fairly reasonable speed. A few people have reproduced this, so I'm going to close out this ticket as complete.

kiniry commented 4 years ago

The gist document will eventually land as a real document in the docs repo and here, correct?

dhand-galois commented 4 years ago

If that's where it makes sense to put it, I can certainly check it in to those places. But this is not a long-term solution for a deployment.

I assume someone on the FETT side of all this is working on integrating it into their flows.

kiniry commented 4 years ago

Yes, that's @EthanJamesLew working with @rtadros-Galois.

rtadros125 commented 4 years ago

I am confused again and need someone to connect the dots. We said on MM that there are 3 platform variants: FireSim, Connectal, and AWSteria, and that we're starting with FireSim, which requires two instances.

Here's what Joe has said: CloudGFE has three variants: Connectal (MIT and perhaps SRI/Cambridge), AWSteria (SRI/Cambridge), and FireSim (LMCO and UMich). All three variants will be spun up with a combination of an AMI+AFI for use by Target. Current FireSim requires two instances (manager+host+F1); AWSteria and Connectal require only one instance (host+F1). As I understand it, Dylan Hand's work is aiming to be able to do away with the FireSim manager eventually, but I do not feel like he has guaranteed that is possible/reasonable, so for the moment with Target development we have to assume two instances for FireSim.

The description in this ticket says "without using FireSim". Will this FireSim-free flow be supported by any of the TA-1 teams? Do we need to consider this a 4th variant that we need to support during the bug bounty? Or does this 4th variant replace the FireSim variant? Is this ticket related to Joe's last sentence about the not-guaranteed flow?

rtadros125 commented 4 years ago

If Dylan meant to say "without using firesim to build the AFI", then why does it have to be on an f1.2xlarge? @EthanJamesLew uses a c5.4xlarge instance and it runs fine.

dhand-galois commented 4 years ago

The description in this ticket says "without using FireSim". Will this FireSim-free flow be supported by any of the TA-1 teams?

I'm not really sure what this is asking. The goal of this ticket was to produce a distributable package (i.e. a tgz file) that can be extracted onto an F1 instance to quickly run a pre-made Rocket P2 AFI w/ sample software (Linux, FreeBSD, etc.).

Whether or not TA1 teams want to distribute their final versions in the same way is a separate concern. If we're responsible for spinning up the F1 instances, then I suppose that is up to us more than it is up to them.

Do we need to consider this a 4th variant that we need to support during the bug bounty? Or does this 4th variant replace the FireSim variant?

I would say it is hopefully an "either-or". Either we ride on top of the normal FireSim manager+cluster model, or we do something similar to what I've done here, where the AFI can be loaded and used without FireSim (neither the FireSim repo nor a FireSim manager). FireSim is always required to build the AFI, as it needs to convert Chisel to Verilog and generate the collateral for Vivado.

Is this ticket related to Joe's last sentence about the not-guaranteed flow?

This ticket was about the request to have Linux running on an AFI ASAP. Part of "ASAP" seemed to me to imply "while avoiding spending multiple days learning how to use firesim", so I created these minimal_cloudgfe.tgz packages that can run standalone.

If Dylan meant to say "without using firesim to build the AFI", then why does it have to be on an f1.2xlarge? @EthanJamesLew uses a c5.4xlarge instance and it runs fine.

I don't know what this is in reference to, but to run an AFI you need to use an f1.2xlarge. Are you talking about something else?

EthanJamesLew commented 4 years ago

@dhand-galois I use a c5.4xlarge as the firesim manager node. The run farm is f1.2xlarge.

rtadros125 commented 4 years ago

Now I understand. The MM convo was so confusing to me. Let me summarize one last time to ensure we ARE on the same page:

There are four platform variants of doing this:
V1. Using the FireSim manager on an instance that spins up an F1 instance. This is the 2-instance approach, and the way the FireSim manager is supposed to be used.
V2. Without using a FireSim manager. This is what you describe in this ticket.
V3. AWSteria.
V4. Connectal.

My questions for @dhand-galois are:
QD1. Are there any differences between V1 and V2 (other than the obvious 2 instances vs. 1 instance)? Like restrictions on drivers, OS builds, AFI builds, etc.
QD2. How hacky is V2? In other words, if I am completely able to support V1, would you still recommend doing V2?

My questions for @kiniry are:
QJ1. When you mentioned that LMCO and UMich are using FireSim, did you mean exclusively V1, either V1 or V2 (they don't care), or that they are not sure yet?
QJ2. My action plan is to choose a platform variant and work on it until the entire flow is working, then start to integrate the other ones. Which one should we pick? This depends, of course, on the answers to the previous questions; I am just laying out my thoughts explicitly.

dhand-galois commented 4 years ago

There are four platform variants of doing this:

I'd still say we should pick V1 or V2, but for the purposes of this discussion - yes, there are 4 possible ways to load and run AFIs right now.

QD1. Are there any differences between V1 and V2 (other than the obvious 2 instances vs. 1 instance)? Like restrictions on drivers, OS builds, AFI builds, etc.

I feel like we should separate "building" the AFI from "running" it when talking about FireSim, as conflating the two complicates the discussion.

For running AFIs, V1 requires a very specific (imo complex) instance setup, as FireSim has lots of dependencies to run its python scripts and the chisel build flow. It is mostly well scripted, but just getting it set up once takes a few hours. I'm not sure if it can easily be repackaged into a new AMI afterwards, as it is derived from the FPGA Developer AMI.
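
To make the "few hours" concrete, the manager setup being described is roughly the following (paraphrasing the FireSim setup docs; exact steps vary by version):

    $ git clone https://github.com/firesim/firesim && cd firesim
    $ ./build-setup.sh fast           # fetch submodules, toolchains, and deps - the long step
    $ source sourceme-f1-manager.sh   # put the firesim manager on your PATH
    $ firesim managerinit             # one-time AWS credential/bucket configuration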

V2 is a significantly lighter lift to spin up F1 instances: it's 2 pre-compiled kernel modules, 2 pre-compiled x86-64 binaries (<10MB), some bash scripts, and then whatever block image + ELF you want to load. We could even make it more AMI-agnostic by building the kernel modules from source. The downside is that it may require some more changes to firesim to create these "setup"/support packages with the 2 kernel modules and 2 x86-64 binaries. Currently, the firesim manager dynamically creates these during the infrasetup task and rsyncs them to all the F1 instances. We just need to produce these when building the AFI and package them into a tgz - possibly put them up in S3 somewhere. A couple hours of python work.
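
As a sketch, the packaging step would be little more than the following (file and bucket names here are hypothetical):

    # after the AFI build, gather the collateral infrasetup would normally rsync
    $ tar czf minimal_cloudgfe.tgz *.ko host-binaries/ scripts/
    $ aws s3 cp minimal_cloudgfe.tgz s3://some-cloudgfe-bucket/   # bucket TBD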

QD2. How hacky is V2? In other words, if I am completely able to support V1, would you still recommend doing V2?

IMO, figuring out how to fully script firesim, which itself is already a fairly complex collection of scripts, is more hacky than the solution I'm envisioning for V2. In short form, V2 is "use firesim locally to build an AFI + support package, copy the support package to an F1 instance of your choosing, and run it". V1 is "spin up a FireSim manager instance, set it up, make sure the active chipyard commit matches the one previously used to build the AFI you want to run, then launchrunfarm, infrasetup, and runworkload" to get to the same point.
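
Spelled out, the V1 run flow is the standard manager task sequence (task names as used in this thread, plus terminaterunfarm from the FireSim docs):

    $ firesim launchrunfarm     # spin up the F1 run-farm instance(s)
    $ firesim infrasetup        # build and rsync kernel modules + host binaries to the run farm
    $ firesim runworkload       # load the AFI and boot the workload
    $ firesim terminaterunfarm  # tear the run farm down when finished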

EthanJamesLew commented 4 years ago

We just need to produce these when building the AFI and package them into a tgz - possibly put them up in S3 somewhere.

Something like this is what I have in mind when planning the FETT Target interaction: use launchrunfarm and part of infrasetup, and have the target project do the rest. This separation makes sense with respect to approaches we've taken in the past - we do most build-related tasks in an environment project (like a nix shell) and then have our apps use it.

@dhand-galois @rtadros-Galois
Are the binaries reusable, so that we can manage them the way we're planning to for FETT Environment?

dhand-galois commented 4 years ago

Something like this is what I have in mind when planning the FETT Target interaction: use launchrunfarm and part of infrasetup, and have the target project do the rest. This separation makes sense with respect to approaches we've taken in the past - we do most build-related tasks in an environment project (like a nix shell) and then have our apps use it.

I'm suggesting using firesim only for buildafi - everything else we could handle ourselves, since you'll have to write the scripting to launch F1 instances for the other two approaches anyway. It seems like less work to me and a more consistent experience.
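
Concretely, the split I'm suggesting (buildafi is the manager task named above; the rest is our own tooling):

    # locally, once per AFI: chisel -> verilog -> vivado -> AFI
    $ firesim buildafi
    # then package the host-side collateral into a tgz by hand (see the sketch above)
    # and launch/drive the F1 instances with our own scripts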

Are the binaries reusable, so that we can manage them the way we're planning to for FETT Environment?

I don't know all the context behind that question, so I can't answer fully. One thing to note (I mentioned this on MM): the x86-64 host-side binaries FireSim produces are only reusable across AFIs with exactly the same interface. Right now the interface varies slightly between the three processors, but I am working on standardizing it and think I can get there. The main complication would be if we later need to change that interface for one or more of the processors; then we would need to accommodate multiple versions of the host-side software. I imagine the same is true for AWSteria and Connectal, but I haven't looked.

By interface I mean the number and type of devices on the FPGA that the host needs to communicate with, such as UARTs, Ethernet, Block, Tracers, Debug, etc.