GaloisInc / BESSPIN-CloudGFE

The AWS cloud deployment of the BESSPIN GFE platform.
Apache License 2.0
2 stars 2 forks source link

AWSteria load ELF files into DRAM #115

Closed charlie-bluespec closed 4 years ago

charlie-bluespec commented 4 years ago

Plug in host-side C code to load an ELF instead of a Mem.hex.

kiniry commented 4 years ago

Hasn't this been added to AWSteria, @joestoy and @rsnikhil?

rsnikhil commented 4 years ago

Not yet, but it's a very small bit of work (an hour?) on the todo list. We're loading from hex files at the moment, but we have all the C code available loading ELF files. It's an entirely host-side, C code issue.

joestoy commented 4 years ago

Actually yes; it was done by me this afternoon. Commit d05ad37 on branch twoclock_vanilla. Not sure whether it needs to reach develop before it can be declared done, and what other merges should happen on its way there.

joestoy commented 4 years ago

(That's both in verilator simulation, and for the AWS, where it's booted FreeBSD.)

jrtc27 commented 4 years ago

Your ELF "loader" trips up on the various pitfalls I mentioned elsewhere. Namely, you look at sections not segments and so break when loading a binary with virtual addresses that differ from physical addresses. You also as a result ignore NOBITS sections rather than zero-initialising them. Both "work" for BBL because GDB also does this and I added an upstream workaround, but break when loading a FreeBSD kernel directly that's not embedded inside BBL, as we are doing due to the far easier (and less error-prone) development workflow. Please see Connectal's for how it's meant to be done, as well as https://github.com/DARPA-SSITH-Demonstrators/BESSPIN-CloudGFE/issues/4#issuecomment-641502517 which gives the three points that I feared would not be done properly (and indeed turned out to be valid).

jrtc27 commented 4 years ago

There's a also a fourth issue I didn't anticipate: you hard-code "_start" as the entry point symbol. This is not specified to be true, and you must instead look at the entry point specified in the ELF header. This gives a virtual address which must then be translated to a physical address by finding the segment within which it resides and adding the segment's p_paddr - p_vaddr to the virtual address (strictly should be p_paddr + (entry_point - p_vaddr) in the world of provenance, but everything here is an address so you can get away with entry_point + (p_paddr - p_vaddr)).

jrtc27 commented 4 years ago

And there is no need to allocate a 4G buffer, memcpy into it and then DMA that to the host. Just DMA the segments directly.

kiniry commented 4 years ago

Thanks for all of that feedback, @jrtc27 !

joestoy commented 4 years ago

Commit d05ad37 on branch twoclock_vanilla is withdrawn. Pace @rsnikhi, it's clearly not a one-hour job.

rsnikhil commented 4 years ago
   it's clearly not a one-hour job.

Agreed, to accommodate all the fixes that @jrtc27 raises (about which, thanks for the elucidation). My "one hour" assessment was for our existing code, which merely substitutes the (admittedly limited functionality) ELF reader for the mem-hex reader (the DMA into memory is the same). That code has existed for years, and we've incorporated it a number of times into a number of systems and used it successfully (perhaps we just got lucky that we didn't run into the issues raised by Jessica).

rsnikhil commented 4 years ago

When @jcrt27 says "Your ELF "loader" trips up..." I am not sure which ELF loader she is referring to. The one I have in mind is the one in our DSharp debugger, which is not in a public repo, and it already contains many of the suggested fixes (iterate over sections, don't ignore NOBITS sections, ...).

jrtc27 commented 4 years ago

It "works" because you don't notice for bare-metal code, which includes BBL embedding a kernel, and because boot loaders after often loaded by shoddy firmware that don't zero out BSS so have to do it manually. As soon as you want to split the two apart (or even load a bare-metal binary that does make use of virtual addressing), or use something that expects an actual standards-compliant loader, it breaks. The silly thing is that it's actually easier to use the program headers rather than the section headers to load a file; they're a deliberately-simplified form that coalesce sections together and give you the bare minimum needed to load a file (where in the file the data is, where in memory to put the data (physically and virtually), how much data to copy and how much to zero out, that's basically it, no need to look at complex sets of flags), to the extent that you can easily write a loader in assembly.

rsnikhil commented 4 years ago

Thanks for all the technical points, which I will address when I get around to this task. There seems to have been some misunderstanding about who's going to address this task, and when. My impression was: it's on my plate; I haven't gotten to it yet (will do after virtio task); the Mem.hex loader there is a mere temporary placeholder (including the 4GB buffer allocation, which was a temporary hack while first learning about AWS DMA). It seems that, without my noticing, it got assigned to @joestoy who was not aware of my plans, etc. etc. etc.

joestoy commented 4 years ago

Closing this issue: current state is at f7206c3 (origin/twoclock_vanilla). This work could be carried over to CHERI Flute; but it's probably better to wait until @rsnikhil's virtio work is merged, as that will significantly change the host side too. See also #113.