determine Sprint 5 FireSim Tasks

dhand-galois commented 4 years ago

FireSim is up and running - major remaining work is on the driver side to support FreeRTOS and (possibly) FreeBSD.

Looking forward, there are a number of tasks that could be done but have unclear benefit or importance to the program. I'd like feedback from others on what they view as worthwhile to tackle and in what priority. I've tried to make some initial assessment of the difficulty as well.

- [x] Build up to date versions of Chisel P1, Chisel P2, and Bluespec P2 AFIs (In Progress)
- [x] Add building host-side software to buildlocalafi flow in FireSim (Planned)
- [ ] Make buildlocalafi work when Vivado is installed on the host OS, not in docker (Planned)
- [x] Add basic tracing support for all processors (In Progress)
- [x] #108 Add debug / gdb support (Medium - High)
- [x] #109 Add random number interface (Medium - High)
- ~~Alternatively, add a hardware random number generator (High)~~
- ~~Integrate Galois HSM - AES, SHA, HMAC (HW: Low, SW: Medium for OS-level support)~~
- [ ] #79 Backport changes to classic GFE (Low - Medium)
- ~~Set up automated FireSim AFI generation for LMCO (and others)~~

Any others I missed? Feedback?

rfoot commented 4 years ago

Just got a call from Keith vis-à-vis dates for FreeRTOS availability. DDS/Synack is concerned that UofM won't make their submission dates. I committed us to 6/1 for this work. Let me know if that isn't reasonable.

dhand-galois commented 4 years ago

How much of FreeRTOS needs to be working for them to get started? @podhrmic already has the UART interface live. I can try to help with the network and block device driver development if that is required for June 1st.

austinharris commented 4 years ago

We are hoping to deliver the full demo app with the web server by June 1st. We can do without the block device by using an in-memory database.

dhand-galois commented 4 years ago

Got it, so networking is top priority. I think we can make that work.

rfoot commented 4 years ago

Awesome! I'll let Keith know.

dhand-galois commented 4 years ago

Three of these tasks are now complete (or will be once #97 is merged). More details on running on-premise FireSim will be in #89.

dhand-galois commented 4 years ago

Seeing as Sprint 4 is mostly over and we have Sprint 5 planning on Monday, I updated this to be a Sprint 5 topic.

dhand-galois commented 4 years ago

Adding a task for building the automated flow to generate AFIs

kiniry commented 4 years ago

Are any of these remaining open tasks necessary for FETT launch?

podhrmic commented 4 years ago

I am not sure how much we need an RNG - @rtadros-Galois ? Otherwise I think the remaining tasks are nice-to-have.

dhand-galois commented 4 years ago

The CI-based build flow (for at least LMCO, although hopefully useful for others) needs a performer. I can help guide that process but wrangling CI runners is not my specialty.

Out of the others on the list, a random number source seems like the only one we need a definitive "yes/no" answer on. Possibly also debug support, although the TA1 teams using FireSim-based approaches have not emphasized that as a necessity to date. The rest (HSM and backporting) can be pushed if there's no need.

rtadros125 commented 4 years ago

> I am not sure how much we need an RNG - @rtadros-Galois ? Otherwise I think the remaining tasks are nice-to-have.

Here's the situation: both Debian and FreeBSD act weird on QEMU because of the entropy situation. We have a webserver hosting a webpage, and the voting thing. The webserver accepts HTTPS requests, and sometimes the lack of entropy gets in the way of processing them. On FPGA we see it less, but we see it. On AWS, I don't think we saw it, but we haven't used it enough. My only worry is: I don't want to receive bug reports that the webserver is not responding unless we send keystrokes through the SSH connection.

jrtc27 commented 4 years ago

- FreeBSD's /dev/random is non-blocking, unlike Linux's, but you probably want to use /dev/urandom instead if that's your entropy source
- getrandom/getentropy are non-blocking on both OSes
- QEMU can provide a HWRNG via -device virtio-rng-pci

Unless you haven't yet sufficiently seeded your PRNG since boot, FreeBSD should never "act weird". So long as you don't use /dev/random, Linux also shouldn't do so.
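A minimal sketch of how the "seeded since boot" condition above could be probed from user space, assuming a Linux guest where Python exposes `os.getrandom`. The function name `try_get_random` is hypothetical and not part of any project code. With `GRND_NONBLOCK`, the call fails immediately instead of waiting when the kernel pool has not been initialized yet, which makes the unseeded state visible rather than showing up as a hung HTTPS request.

```python
import os


def try_get_random(n=16):
    """Return (data, seeded).

    seeded is False only when the kernel reports that its random pool
    has not been initialized yet. This is a probe, not a fix.
    """
    if hasattr(os, "getrandom"):
        try:
            # GRND_NONBLOCK: raise instead of waiting for the initial seed.
            return os.getrandom(n, os.GRND_NONBLOCK), True
        except BlockingIOError:
            return b"", False
    # Assumption: on platforms without os.getrandom, fall back to
    # os.urandom, which may itself wait for the initial seed.
    return os.urandom(n), True
```

A boot script could call this once and log a warning when `seeded` is `False`, before the webserver starts accepting connections.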

jrtc27 commented 4 years ago

Also, on FreeBSD, if you're on QEMU/Connectal/AWSteria and thus want to make use of the VirtIO entropy device, you will need the driver. If your rootfs has kernel modules then it should be picked up from there, but otherwise you will need device virtio_random in your kernel config to compile it into the kernel itself (by default the only VirtIO drivers compiled into the kernel are the block and network drivers due to their importance in very early boot, the rest are left as modules to dynamically load).
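For the QEMU case, a minimal sketch of adding the VirtIO entropy device to an existing command line. The helper name `with_virtio_rng` and the base command are hypothetical; the explicit `rng-random` backend fed from the host's `/dev/urandom` is an assumption here, since the comment above only names `-device virtio-rng-pci`.

```python
def with_virtio_rng(qemu_args, host_source="/dev/urandom", rng_id="rng0"):
    """Return a copy of qemu_args with a VirtIO RNG device appended."""
    extra = [
        # Host-side entropy backend (assumed: read from the host's urandom).
        "-object", f"rng-random,id={rng_id},filename={host_source}",
        # Guest-visible device, wired to the backend above.
        "-device", f"virtio-rng-pci,rng={rng_id}",
    ]
    return list(qemu_args) + extra


# Hypothetical base command line, for illustration only.
base = ["qemu-system-riscv64", "-machine", "virt", "-nographic"]
print(" ".join(with_virtio_rng(base)))
```

The guest still needs the matching driver, as described above, for the device to feed its entropy pool.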

rtadros125 commented 4 years ago

> Unless you haven't yet sufficiently seeded your PRNG since boot, FreeBSD should never "act weird".

How do we (1) do that, and (2) guarantee that (1) was done?

jrtc27 commented 4 years ago

You can query the state of the random device via sysctl kern.random (or any of the subnodes); see https://www.freebsd.org/cgi/man.cgi?query=random&sektion=4. On platforms with no HWRNG, you can seed it these days by writing to /dev/random, as on Linux.
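A minimal sketch of both steps, assuming a FreeBSD guest with the `sysctl` binary on the path and sufficient privileges to write to the device. All function names are hypothetical. Which `kern.random` subnodes exist and what they mean is not assumed here; the parser only splits the `name: value` lines that `sysctl` prints. Where the seed bytes come from is left open.

```python
import subprocess


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            result[name.strip()] = value.strip()
    return result


def read_random_state():
    """Query the random device state via 'sysctl kern.random'."""
    out = subprocess.run(
        ["sysctl", "kern.random"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sysctl(out)


def seed_random_device(data, device="/dev/random"):
    """Write seed bytes to the random device; return the byte count."""
    with open(device, "wb") as f:
        f.write(data)
    return len(data)
```

Calling `read_random_state()` before and after `seed_random_device(...)` gives a simple record that the seeding step actually ran.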

rtadros125 commented 4 years ago

@waylon531 See the comment above. I am assuming you were doing something similar. Waiting for your PR ;). Thanks.

austinhroach commented 4 years ago

Todd Austin had expressed to DARPA that GDB support for the Michigan FireSim P1 would be helpful, though that may not have been relayed to Galois? They're apparently currently debugging with printf, so if real debug support were available before they're done debugging, I think that they would appreciate it. Might be worth verifying with the UoM folks.

rtadros125 commented 4 years ago

@austinhroach A better alternative to printf is the FireSim trace. The output is basically which instruction was being executed at each cycle. This produces huge amounts of data. However, if you have the *.asm file of the binary, you can use grep (and maybe split, if you want to read a certain part) to work out which instruction trapped, or in which kernel function the program got lost.
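The grep-based workflow above can also be scripted. A minimal sketch, assuming the `*.asm` file is objdump-style disassembly with function labels of the form `0000000080000000 <name>:`. The trace file format is not assumed: extracting program counters from it is left to the caller, and every function name here is hypothetical.

```python
import bisect
import re

# Matches objdump-style function labels such as "0000000080000000 <_start>:".
LABEL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:$")


def load_symbols(asm_text):
    """Return (addresses, names), sorted by address, from disassembly text."""
    pairs = []
    for line in asm_text.splitlines():
        m = LABEL_RE.match(line.strip())
        if m:
            pairs.append((int(m.group(1), 16), m.group(2)))
    pairs.sort()
    return [a for a, _ in pairs], [n for _, n in pairs]


def function_at(addrs, names, pc):
    """Name of the last label at or below pc, or None if pc precedes all."""
    i = bisect.bisect_right(addrs, pc) - 1
    return names[i] if i >= 0 else None


def summarize(trace_pcs, addrs, names):
    """Collapse a per-instruction PC stream into the functions visited."""
    path = []
    for pc in trace_pcs:
        name = function_at(addrs, names, pc)
        if not path or path[-1] != name:
            path.append(name)
    return path
```

The last few entries returned by `summarize` point at the function where execution stopped making progress, which is usually where to start reading the full trace.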

austinhroach commented 4 years ago

@rtadros-Galois Good suggestion, thanks. I relayed your suggestion to Todd Austin, and suggested that Michigan get in contact with their P1 counterparts at Galois if full-featured debug support is a need of theirs for the foreseeable future.

austinharris commented 4 years ago

We have been using the trace, but it is still much more difficult to debug than having gdb support since we can’t peer into register values, etc. GDB support should be a high priority in my opinion.

kiniry commented 4 years ago

What are the challenges inherent in getting gdb operational with FireSim, @dhand-galois?

dhand-galois commented 4 years ago

The bulk of the work is in adding a new 'bridge' for FireSim that will shuttle the necessary data between the host and FPGA. I've looked at how the existing bridges work and generally understand them, but have not yet implemented one myself, so there's an unknown aspect to that. One other possible complication: if the simulator cannot be used for some reason, the debug cycle of building AFIs and testing on AWS becomes the long pole.

On the software side, we should be able to leverage existing code to get something running. I am expecting we could make use of the existing simulator hooks to connect OpenOCD, such as sim_dmi.c used in the GFE Verilator simulator, or the more widely used remote_bitbang JTAG interface. Bluespec also has their gdbstub approach that bypasses OpenOCD, but it does not currently support the debug module used in our Chisel processors.
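For context on the remote_bitbang option, a minimal sketch of decoding the single-character commands OpenOCD sends over the socket, written from the protocol description in OpenOCD's documentation as understood here. It is an illustration, not the bridge implementation; the function name `decode_bitbang` is hypothetical, and the character map should be checked against the OpenOCD version in use.

```python
def decode_bitbang(ch):
    """Decode one remote_bitbang command character into (kind, fields)."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in "01234567":
        # '0'..'7' carry three pin values: tck (bit 2), tms (bit 1), tdi (bit 0).
        v = int(ch)
        return "write", {"tck": (v >> 2) & 1, "tms": (v >> 1) & 1, "tdi": v & 1}
    if ch in "rstu":
        # 'r'..'u' carry two reset lines: trst (bit 1), srst (bit 0).
        v = "rstu".index(ch)
        return "reset", {"trst": (v >> 1) & 1, "srst": v & 1}
    if ch == "R":
        # The server answers a read request with '0' or '1' for TDO.
        return "read", {}
    if ch == "Q":
        return "quit", {}
    if ch in "Bb":
        return "blink", {"on": ch == "B"}
    raise ValueError(f"unknown remote_bitbang command: {ch!r}")
```

A bridge built this way would forward each decoded `write` to the JTAG pins of the debug module and send back one character per `read`, which also explains why throughput over this path is modest.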

I can take a shot at writing the bridge module and bringing up one of the software interfaces with an ETA somewhere in the mid-to-late next week range, at least for an initial version. Is that too long of a delay to be useful?

austinharris commented 4 years ago

I think it would still be useful.

kiniry commented 4 years ago

Given recent work on the AWSteria and Connectal platform variants, we may be able to expedite this. See https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/49

@dhand-galois and @podhrmic, if we do put a performer on leveraging that work, who is the best person that could tackle it? CC @rfoot @Abivin12

podhrmic commented 4 years ago

I can assist with testing or anything SW related, but the HW design/FireSim implementation is not in my wheelhouse, so I will defer to @dhand-galois's recommendation.

dhand-galois commented 4 years ago

I have (seemingly) full gdb support working with FireSim as of today. I say seemingly because I've tested some functionality, but certainly not all. I've tested loading, breakpoints, memory inspection, etc.

The main outstanding issue is that the transfer rate is capped at 30 KB/sec, which may be good enough for now.

I haven't looked closely at the gdbstub work, but my understanding is it does not support Rocket's debug module, so that would be up to Bluespec to determine effort/performers.

kiniry commented 4 years ago

Well, that was quick @dhand-galois. ;) How do you want others to test out what you have?

jrtc27 commented 4 years ago

Yes, gdbstub currently relies on system bus access (among other things) and doesn't have an alternative program buffer-based implementation. It's probably not particularly difficult to add, but it is a chunk of busy work. Does the transfer rate matter, though? Surely you can just load binaries directly in FireSim, and GDB only needs to be good enough to debug interactively (for which 30 KiB/sec is more than adequate).

dhand-galois commented 4 years ago

@jrtc27 Yep, that's exactly what I was thinking. FireSim can load large binaries in a second or two, so I have that completing just before starting to process gdb/openocd connections. The memory is preloaded by the time gdb connects to the core. We can leave performance improvements as a 'backlog / low priority' task. Interactive use is very responsive.
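To put the 30 KiB/sec cap in perspective, a small back-of-the-envelope sketch. The 30 MiB image size is a hypothetical example, not a measured binary, and the helper name is illustrative only.

```python
def load_seconds(size_bytes, rate_bytes_per_sec=30 * 1024):
    """Time to push size_bytes through a link capped at the given rate."""
    return size_bytes / rate_bytes_per_sec


# Hypothetical 30 MiB image loaded through GDB at 30 KiB/sec.
image = 30 * 1024 * 1024
print(f"{load_seconds(image) / 60:.1f} minutes")
```

At that rate a hypothetical 30 MiB image would take roughly 17 minutes through GDB, compared with the second or two quoted above for FireSim's own loader, while a typical interactive memory read of a few hundred bytes completes in a small fraction of a second.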

dhand-galois commented 4 years ago

Tracking progress on debug support in #108 and on a simple RNG device in #109.

GaloisInc / BESSPIN-CloudGFE

determine Sprint 5 FireSim Tasks #94