Connectal GFE AWS release 1.0 - bring up FreeBSD built with LLVM

kiniry commented 4 years ago

Is this happening this week? If so, please add Sprint #2 to the Milestone. I'll only mention this on this particular issue, @charlie-bluespec. As you can see, I'm labeling all of these recent label-less issues. Please try to label issues when they get filed, as that helps me and others who are managing/tracking issues across all projects keep up with matters.

charlie-bluespec commented 4 years ago

@joestoy, do we need another FreeBSD image from Cambridge with all the virtio drivers, or does the image you have now contain them all?

rwatson commented 4 years ago

@jrtc27 @bsdjhb @brooksdavis

I believe that the image you have from @jrtc27 likely has exactly what is needed, but tagging to confirm.

joestoy commented 4 years ago

@jrtc27 referred me to the following

https://www.cl.cam.ac.uk/~jrtc4/cheribsd-minimal-riscv64.img and https://www.cl.cam.ac.uk/~jrtc4/cheribsd-riscv64.img are the rootfs images I've been using

https://www.cl.cam.ac.uk/~jrtc4/bbl-riscv64.FETT has a working virtio kernel
https://www.cl.cam.ac.uk/~jrtc4/bbl-riscv64.GFE is a working mfsroot kernel, but I don't think I updated it with virtio fixes

Thus far I've been using the GFE one; I guess the FETT one is the image @rwatson is referring to above. Does that satisfy the "built with LLVM" criterion in the title of this issue? I'm not sure how the two .img files fit into the picture. I haven't used them yet -- please advise.

jrtc27 commented 4 years ago

The .img files are intended to be used with the FETT kernel. The GFE kernel has its root filesystem embedded in it, but the FETT kernel is configured to boot from a VirtIO block device, which is provided by the ssith_aws_fpga tool if you provide -B cheribsd-riscv64.img or -B cheribsd-minimal-riscv64.img. Both disk images work, but the latter is a minimal busybox-like image, whereas the former is a full normal image.

joestoy commented 4 years ago

Many thanks. I'll be trying the FETT kernel as soon as today's synthesis terminates, so that's very useful.

kiniry commented 4 years ago

CC to @podhrmic and @immindich as they are tracking FreeBSD-related matters here at Galois (at least, that is how I understood it from my last conversation with @rfoot).

joestoy commented 4 years ago

I have just booted the FETT kernel with the full image file, and got as far as the login prompt. I can't get any further until someone gives me a username and password on the system I have instantiated. That was with the standard Flute (and, at the time of synthesis, the latest ssith-aws-fpga). I'll try the CHERI-Flute next.

jrtc27 commented 4 years ago

Username root with no password should work.

joestoy commented 4 years ago

Thanks -- that's for tomorrow morning (and you're working very late!).

kiniry commented 4 years ago

CC @dhand-galois so he can observe progress. The above kernel is as we discussed earlier today with the minimal system image.

rwatson commented 4 years ago

(Just so that it isn’t lost amidst so much stuff: This kernel supports VirtIO as long as DMA is coherent, so it works with debug-unit-injected VirtIO, but will not work with direct DMA to FPGA DRAM unless the coherency issue is addressed. We believe that the primary impact of the current model will be a substantial performance impact, but we’ve not attempted to characterise that. My concern is that it probably pushes us over the threshold into “unusably low”, but if not, that could be a viable minimum configuration. If it’s too slow, we need to have sorted out DMA + the coherency issue before we reach viability.)

rwatson commented 4 years ago

With regard to the LLVM question: Yes, all of our userspace + kernels are built only using LLVM, and so the LLVM threshold is met if we think we are at a viable FreeBSD.

I think ideally the “is it working” threshold would include validating block storage (sounds like @joestoy has achieved this, although logging in and running some stuff would definitely be good), but also networking. I think I’ve not yet seen any notes indicating that we think that is OK yet. Logging in is a good first step towards confirming that.

jrtc27 commented 4 years ago

For VirtIO networking, the non-legacy device gained a field that was previously optionally negotiated for the legacy device, and FreeBSD does not think it’s there. @bukinr (who doesn’t appear to be taggable, I guess he’s not a member of this organisation?) observed this as every packet being corrupted in TinyEMU, and worked around it by deleting the field in TinyEMU. That is, however, an incorrect fix that makes TinyEMU strictly non-spec-conforming, and the correct fix would be to implement non-legacy support in the network driver in FreeBSD. I think it would also be useful for him to go through the spec and update all our drivers in general, since I worry there may be other things like this lurking, but the block device, entropy and console (not that we intend to use it) drivers do seem to work so maybe that’s unfounded...
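To make the failure mode concrete, here is a minimal sketch, assuming the field in question is the 16-bit `num_buffers` that ends the VirtIO network header (always present for non-legacy devices, present for legacy devices only when mergeable receive buffers are negotiated). The helper name `split_frame` and the payload bytes are hypothetical; this is not the FreeBSD driver or TinyEMU code, only an illustration of how a two-byte disagreement about header length shifts every packet.

```python
import struct

# Fields common to both layouts: flags, gso_type, hdr_len, gso_size,
# csum_start, csum_offset. All little-endian, no padding.
BASE_HDR = struct.Struct("<BBHHHH")    # 10 bytes
# Non-legacy layout: the same fields followed by a 16-bit num_buffers.
FULL_HDR = struct.Struct("<BBHHHHH")   # 12 bytes


def split_frame(buf: bytes, has_num_buffers: bool) -> bytes:
    """Return the bytes the driver would treat as the Ethernet frame."""
    hdr = FULL_HDR if has_num_buffers else BASE_HDR
    return buf[hdr.size:]


# Illustrative Ethernet bytes: broadcast destination, source MAC, EtherType.
payload = bytes.fromhex("ffffffffffff020000000000") + b"\x08\x00"
# A device that always emits the 12-byte header, with num_buffers = 1.
frame = FULL_HDR.pack(0, 0, 0, 0, 0, 0, 1) + payload

# Driver and device agree on the header: the payload comes out intact.
assert split_frame(frame, has_num_buffers=True) == payload

# Driver assumes the 10-byte header: num_buffers is misread as the first
# two bytes of the packet, so the whole frame is shifted by two bytes.
misparsed = split_frame(frame, has_num_buffers=False)
assert misparsed == struct.pack("<H", 1) + payload
```

Under these assumptions the driver sees two stray bytes ahead of every destination MAC address, which would be consistent with the "every packet corrupted" symptom described above.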

joestoy commented 4 years ago

Trying a FreeBSD boot again, on the standard Flute with the full image file. I notice that it's doing a full fsck and finding errors:

CONSOLE: No suitable dump device was found.
CONSOLE: Starting file system checks:
CONSOLE: /dev/vtbd0: CYLINDER GROUP 0: BAD MAGIC NUMBER
CONSOLE: /dev/vtbd0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
CONSOLE: File system preen failed, trying fsck -y -T ffs:-R -T ufs:-R
CONSOLE: ** /dev/vtbd0
CONSOLE: ** Last Mounted on /
CONSOLE: ** Root file system
CONSOLE: ** Phase 1 - Check Blocks and Sizes
CONSOLE: CYLINDER GROUP 0: BAD MAGIC NUMBER
CONSOLE: REBUILD CYLINDER GROUP? yes
CONSOLE: 
CONSOLE: random: unblocking device.
CONSOLE: UNKNOWN FILE TYPE I=6656
CONSOLE: CLEAR? yes
CONSOLE: 
CONSOLE: UNKNOWN FILE TYPE I=7680
CONSOLE: CLEAR? yes
CONSOLE: 
CONSOLE: CYLINDER GROUP 1: BAD MAGIC NUMBER
CONSOLE: REBUILD CYLINDER GROUP? yes
CONSOLE:

is that expected?

jrtc27 commented 4 years ago

It depends how you shut down. If you Ctrl-C to abruptly terminate the host side of VirtIO and then reflash the FPGA, then yes, the disk got yanked out from under the OS, the OS got killed and it had no chance to cleanly flush its buffers and unmount the disk (this is entirely equivalent to cutting the power to your local machine as it's running). This doesn't happen with a kernel that uses an mfsroot, since by loading the kernel on you also load the original root filesystem and have no persistent state.

You should use poweroff (shutdown -h now etc also work, as does halt on a non-minimal image) and wait until FreeBSD tells you:

The operating system has halted.
Please press any key to reboot.

at which point you should be able to safely kill the ssith_aws_fpga process.
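The "wait for the halt message before killing the host process" rule could be automated along these lines. This is only a sketch: it assumes the console output is available as an iterable of text lines and that lines carry the `CONSOLE: ` prefix seen in the logs in this thread; the function name `wait_for_halt` is hypothetical and not part of ssith_aws_fpga.

```python
import io
from typing import Iterable

# Banner FreeBSD prints once it is safe to stop the host side.
HALT_MARKER = "The operating system has halted."


def wait_for_halt(console_lines: Iterable[str]) -> bool:
    """Consume console output; return True once the halt banner is seen."""
    for line in console_lines:
        if HALT_MARKER in line:
            return True
    return False  # the stream ended without the banner: not a clean halt


# Canned console output standing in for a live UART stream.
clean = io.StringIO(
    "CONSOLE: The operating system has halted.\n"
    "CONSOLE: Please press any key to reboot.\n"
)
abrupt = io.StringIO("CONSOLE: Starting file system checks:\n")

assert wait_for_halt(clean) is True
assert wait_for_halt(abrupt) is False
```

A wrapper script would only terminate the host process after `wait_for_halt` returns True, and would treat False as a sign that the disk image may need an fsck or a fresh copy.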

joestoy commented 4 years ago

OK. This was before I knew the username/password (or, probably, that I'd failed to retrieve from the recesses of my memory how Linuxes behave out of the box). Was there a way I could have restarted with the filestore image you gave me? I did give the same -B flag again -- had the previous run altered that file?

jrtc27 commented 4 years ago

Yes, -B will use that file as the live image, so any writes from the guest will modify that. If you want to restore the image, download it again (and keep a local copy that's always clean).
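One way to follow the "keep a local copy that's always clean" advice is to boot from a scratch copy every time and compare hashes afterwards to see whether the guest wrote to it. The sketch below uses hypothetical file names and helper names (`fresh_working_copy`, `was_modified`), and a small dummy file stands in for the real disk image.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def fresh_working_copy(pristine: Path, working: Path) -> Path:
    """Overwrite the working image with the pristine one before a run."""
    shutil.copyfile(pristine, working)
    return working


def was_modified(pristine: Path, working: Path) -> bool:
    """True if the working image no longer matches the pristine image."""
    return sha256_of(pristine) != sha256_of(working)


with tempfile.TemporaryDirectory() as tmp:
    pristine = Path(tmp) / "pristine.img"   # hypothetical names
    working = Path(tmp) / "working.img"
    pristine.write_bytes(b"\x00" * 4096)    # dummy stand-in for a disk image

    fresh_working_copy(pristine, working)
    unchanged = was_modified(pristine, working)

    with working.open("r+b") as f:          # simulate a guest write
        f.write(b"\x01")
    changed = was_modified(pristine, working)

assert unchanged is False
assert changed is True
```

The working copy, not the pristine file, is then the one passed to `-B`.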

joestoy commented 4 years ago

Great! I've now booted FreeBSD (again with standard Flute), logged in as root, and run a few commands. I assume that since it's manipulating a live filestore image in a file on the AWS filestore, the virtio block device is working. And ifconfig says

CONSOLE: root@qemu-cheri-jrtc4:~ # ifconfig
ifconfig
CONSOLE: vtnet0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
CONSOLE:         options=28<VLAN_MTU,JUMBO_MTU>
CONSOLE:         ether 02:00:00:00:00:00
CONSOLE:         inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
CONSOLE:         media: Ethernet 10Gbase-T <full-duplex>
CONSOLE:         status: active
CONSOLE:         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
CONSOLE: lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
CONSOLE:         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
CONSOLE:         inet6 ::1 prefixlen 128
CONSOLE:         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
CONSOLE:         inet 127.0.0.1 netmask 0xff000000
CONSOLE:         groups: lo
CONSOLE:         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
CONSOLE: root@qemu-cheri-jrtc4:~ #

(I also noticed that during the boot it was trying to find a DHCP server, not surprisingly unsuccessfully.) So does that imply that the virtio net device is also working? (I'm using the UART console.)

joestoy commented 4 years ago

It is, though, very slow. Is that because Jamey's current implementation of virtio transfers stops and restarts the CPU each time?

jrtc27 commented 4 years ago

Yes, it is slow because we stop and start the CPU and go via the debug module for every memory access the host wants to perform (and without even any batching). This is why we need a faster solution that's still coherent with the L1.

jrtc27 commented 4 years ago

Re vtnet0 not working, see https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/50#issuecomment-625758743, I assume this is at least partially responsible.

jameyhicks commented 4 years ago

If we add a direct path to DRAM and wrap transfers in halt/resume, it will be much faster and continue to be coherent with the RISC-V processor.

jrtc27 commented 4 years ago

It will, however, completely bypass our tag controller, and I'm not sure you want to halt the core the entire time you're doing VirtIO as that will be a significant fraction of time, and if you do it on a per-request basis that will add a lot of halt/resume overhead given the sheer number of read{16,32}/write{16,32} made by TinyEMU when parsing the queue descriptors. It will be faster but I don't think it will be as fast as one might reasonably expect.
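A rough illustration of why the number of individual accesses matters: a split-virtqueue descriptor is 16 bytes (64-bit address, 32-bit length, 16-bit flags, 16-bit next). The sketch below counts host-initiated memory accesses for one descriptor when it is read field by field versus in a single batched read. The field-by-field access pattern and the `CountingMemory` class are assumptions made for illustration, not a transcription of the TinyEMU or ssith_aws_fpga code.

```python
import struct

# Split-virtqueue descriptor: le64 addr, le32 len, le16 flags, le16 next.
DESC = struct.Struct("<QIHH")  # 16 bytes


class CountingMemory:
    """Guest-memory stand-in that counts host-initiated accesses."""

    def __init__(self, data: bytes):
        self.data = data
        self.accesses = 0

    def read(self, addr: int, size: int) -> bytes:
        self.accesses += 1
        return self.data[addr:addr + size]


def read_desc_fieldwise(mem: CountingMemory, base: int):
    """Read one descriptor using only 16- and 32-bit accesses."""
    lo, = struct.unpack("<I", mem.read(base, 4))
    hi, = struct.unpack("<I", mem.read(base + 4, 4))
    length, = struct.unpack("<I", mem.read(base + 8, 4))
    flags, = struct.unpack("<H", mem.read(base + 12, 2))
    nxt, = struct.unpack("<H", mem.read(base + 14, 2))
    return (hi << 32) | lo, length, flags, nxt


def read_desc_batched(mem: CountingMemory, base: int):
    """Read the same descriptor with a single 16-byte access."""
    return DESC.unpack(mem.read(base, DESC.size))


raw = DESC.pack(0x8000_1000, 512, 1, 3)   # illustrative descriptor values
fieldwise_mem = CountingMemory(raw)
batched_mem = CountingMemory(raw)

# Both strategies decode the same descriptor...
assert read_desc_fieldwise(fieldwise_mem, 0) == read_desc_batched(batched_mem, 0)
# ...but the field-by-field version needs five accesses instead of one.
assert fieldwise_mem.accesses == 5
assert batched_mem.accesses == 1
```

If each access carries its own halt/resume or debug-module round trip, the per-access cost is multiplied by the access count, which is the overhead being discussed here.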

rwatson commented 4 years ago

The above fsck messages leave me slightly worried -- normally, with the filesystem in journaled, soft updates, or synchronous mode, you should only get a specific subset of possible filesystem corruption types, as stores are ordered -- e.g., unlinked files that need GC’ing, etc, if running without journaling. The above errors don’t look like they should be in that set.

That leaves me worried that there is some other source of corruption and/or disordering arising. It might be useful to take a clean image, boot up, run for a bit, shutdown and reboot to single user, and then fsck the filesystem just to make sure there’s not been unnoticed corruption. It makes me wonder if ordering is not being properly preserved end-to-end, or whether there’s some other source of data corruption on the path across VirtIO/etc.

(I wouldn’t preclude some filesystem race conditions, etc., manifesting in a slightly unusual environment, but .. the above is a bit worrying.)

jrtc27 commented 4 years ago

That happens on QEMU every time I forget to shut down cleanly, MIPS or RISC-V. It's UFS, you only get journaling via GEOM, which we don't do for our disk images.

(It's also rare that I boot a disk image multiple times, as normally it's either working and I'm done or I need to fix something and rebuild, so I develop bad habits...)

rwatson commented 4 years ago

UFS supports native journalling of metadata without using GEOM, if properly configured. It should now be the default in installs, although possibly we are configuring images without it for some reason.

rwatson commented 4 years ago

(But default synchronous operation should still not trigger that sort of thing. I wonder if this actually means there is a bug in FreeBSD’s VirtIO block parts. Maybe @bsdjhb will feel moved to opine?)

jrtc27 commented 4 years ago

QEMU MIPS isn't using VirtIO, but RISC-V is. I see softupdates isn't new, but most documentation is stale and fails to acknowledge it. makefs also unhelpfully has it default to false (much like newfs). We should tell cheribuild to set it to true.

rwatson commented 4 years ago

> It will, however, completely bypass our tag controller, and I'm not sure you want to halt the core the entire time you're doing VirtIO as that will be a significant fraction of time, and if you do it on a per-request basis that will add a lot of halt/resume overhead given the sheer number of read{16,32}/write{16,32} made by TinyEMU when parsing the queue descriptors. It will be faster but I don't think it will be as fast as one might reasonably expect.

It seems like the only sensible hack short of DMA entering via a coherent path (e.g., another port on a shared L2 cache, much like another processor would, or at least with automatic invalidation) is having DMA enter via the same L1 the processor itself is using -- as another master on the L1. I don’t take a view as to whether this is a good idea, but .. it is appealingly simple sounding.

jrtc27 commented 4 years ago

That has been my thinking: putting an arbiter in front of the L1 that provides two instances of the interface (or, one instance and one cut-down instance), ideally one with a fast enough bypass path that it adds no latency to the core except when necessary under contention.

rwatson commented 4 years ago

This seems sensible to me, as long as the plumbing doesn’t leave everyone and everything unhappy. We’d get the same coherence properties as today, and hopefully vastly better performance. DMA would starve the main CPU a bit when blasting away .. but still much better than manual software copying to and from an uncached region of dedicated memory.

GaloisInc / BESSPIN-CloudGFE

Connectal GFE AWS release 1.0 - bring up FreeBSD built with LLVM #50