WebServer unikernel shuts down after "Starting server on device ..."

runeksvendsen commented 8 years ago

So, I finally managed to get everything working wrt. permissions for ec2-unikernel, and I have an AMI for the WebServer example, but when I boot this AMI in an EC2 instance, it shuts down by itself, leaving the following in the log:

Xen Minimal OS!
  start_info: 0xae2000(VA)
    nr_pages: 0x26700
  shared_inf: 0x7df0b000(MA)
     pt_base: 0xae5000(VA)
nr_pt_frames: 0x9
    mfn_list: 0x9ae000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: root=/dev/sda1 ro 4
  stack:      0x96d840-0x98d840
MM: Init
      _text: 0x0(VA)
     _etext: 0x7dc7d(VA)
   _erodata: 0x9a000(VA)
     _edata: 0x9fce0(VA)
stack start: 0x96d840(VA)
       _end: 0x9ade40(VA)
  start_pfn: af1
    max_pfn: 26700
Mapping memory range 0xc00000 - 0x26700000
setting 0x0-0x9a000 readonly
skipped 0x1000
MM: Initialise page allocator for c1f000(c1f000)-26700000(26700000)
MM: done
Demand map pfns at 26701000-2026701000.
Heap resides at 2026702000-4026702000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x26701000.
Initialising scheduler
Thread "Idle": pointer: 0x2026702050, stack: 0x26660000
Thread "xenstore": pointer: 0x2026702800, stack: 0x26670000
xenbus initialised on irq 1 mfn 0x80b737
Thread "shutdown": pointer: 0x2026702fb0, stack: 0x26680000
Dummy main: start_info=0x98d940
Thread "main": pointer: 0x2026703760, stack: 0x26690000
"main" "root=/dev/sda1" "ro" "4" 
vbd 2049 is hd0
******************* BLKFRONT for device/vbd/2049 **********

backend at /local/domain/0/backend/vbd/11407/2049
Failed to read /local/domain/0/backend/vbd/11407/2049/feature-barrier.
Failed to read /local/domain/0/backend/vbd/11407/2049/feature-flush-cache.
2097152 sectors of 512 bytes
**************************

    GNU GRUB  version 0.97  (629760K lower / 0K upper memory)

[ ... garbage output removed ... ]

    Use the ^ and v keys to select which entry is highlighted.

    Press enter to boot the selected OS, 'e' to edit the

    commands before booting, or 'c' for a command-line.

    unikernel_boot

[ ... garbage output removed ... ]

    The highlighted entry will be booted automatically in 1 seconds.
Booting 'unikernel_boot'

root (hd0,0)

 Filesystem type is ext2fs, partition type 0x83

kernel /WebServer

============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/11407/2049 node=device/vbd/2049
Found 1 NIC
  22:00:0A:E7:96:CB
Starting server on device 22:00:0A:E7:96:CB

The unikernel was built using the fedora23 Vagrant environment. The AMI ID in EC2 is ami-7726ed17, and I have made it public.

acw commented 8 years ago

Hmmm! Can you tell me which version of the HaLVM was installed in the F23 environment?

runeksvendsen commented 8 years ago

I'm not used to Fedora, but as I recall it HaLVM was already installed in /usr/bin (halvm-ghc, halvm-cabal, etc.) when it booted up. As far as I can see, these binaries are provided by the HaLVM package, which dnf says is at version 2.1.0.

Side note: I built the WebServer example using this guide: https://github.com/GaloisInc/HaLVM/wiki/HaLVM-Web-Server-Quick-Start. So HaNS and network-hans were installed from Git master, and your fork of the HTTP package was used (https://github.com/acw/HTTP). For the WebServer example I cloned git master of HaLVM, configured it, and just ran make inside the examples/HighLevel/WebServer directory.

runeksvendsen commented 8 years ago

Actually, looking through my bash history, it appears that I installed HaNS and network-hans using cabal (from Hackage), if that makes a difference.

runeksvendsen commented 8 years ago

In trying to figure out this issue, I've discovered two things, without actually making it work:

The "Failed to read" errors disappear when I use an m3 instance rather than a t1 instance
The "TPM Front" errors are seemingly unrelated to this issue, as they appear in many logs from images that actually boot

I still haven't figured out why the kernel exits after finding a network device, but before getting an IP address via DHCP. Any clues on how to debug this would be greatly appreciated.

abailly commented 8 years ago

Hi Rune, I'm not sure this helps, but when I tried to do the same, I had to assign a fixed IP to make the web server boot. However, my setup was different: I ran unikernels on a local Xen VM on my laptop.

HTH Arnaud

On 31 August 2016 at 07:31, "Rune K. Svendsen" notifications@github.com wrote:

In trying to figure out this issue, I've discovered two things, without actually making it work:

The "Failed to read" errors disappear when I use an m3 instance rather than a t1 instance

The "TPM Front" errors are seemingly unrelated to this issue, as they appear in many logs from images that actually boot

I still haven't figured out why the kernel exits after finding a network device, but before getting an IP address via DHCP. Any clues on how to debug this would be greatly appreciated.


acw commented 8 years ago

Well, I'll tell you first, debugging on AWS is a pain. :)

So, question #1: it sounds like it's exiting / crashing, not just running forever?

One step would be to make sure you can see console messages (via writeConsole). Send those out as soon as you have a console available, and make sure you can see them. After that, add a top-level exception handler that catches everything and prints it out to said console, and then extend this to every point that creates a thread. Just to make sure that there isn't an exception flowing around that we're missing.
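For anyone following along, the exception-handler idea above could be sketched roughly like this. It is only an illustration in plain GHC Haskell: the names `Logger`, `guarded` and `forkGuarded` are made up for this example, and `putStr` stands in for whatever console writer the unikernel actually uses (on the HaLVM that would presumably be something built on writeConsole).

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Exception (SomeException, displayException, handle, throwIO)

-- Hypothetical logger type: any function that writes a string somewhere visible.
type Logger = String -> IO ()

-- Run an action; if any exception escapes it, log the exception with a label
-- and then re-throw it so the original behaviour is otherwise unchanged.
guarded :: Logger -> String -> IO a -> IO a
guarded logMsg label = handle report
  where
    report :: SomeException -> IO b
    report e = do
      logMsg (label ++ ": uncaught exception: " ++ displayException e ++ "\n")
      throwIO e

-- Use this in place of forkIO so every new thread gets the same handler.
forkGuarded :: Logger -> String -> IO () -> IO ThreadId
forkGuarded logMsg label = forkIO . guarded logMsg label

main :: IO ()
main = do
  let logMsg = putStr -- assumption: stand-in for the real console writer
  guarded logMsg "main" $ do
    logMsg "console is up\n" -- first sign of life, sent as early as possible
    _ <- forkGuarded logMsg "worker" (throwIO (userError "simulated failure"))
    threadDelay 100000 -- give the worker time to fail and report
```

Because the handler catches SomeException it will also report asynchronous exceptions such as a thread being killed, which is probably what you want while hunting for a silent exit.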

I'll note that I'm a little slow because I'm working on a HaLVM web server myself, and running into other EC2 problems. I'll update this ticket and you as I have news on my side, but I'd love to get the base version working, too.

acw commented 8 years ago

OK, turns out there was at least one bug in the network driver code that could cause hangs when sending data on the network. This has been fixed in head, and should roll out to the various repos over the course of the day. In addition, I have an updated HaLVM web server that I'm putting together; I'll update this ticket with news as it gets finalized over the next day or two.

acw commented 8 years ago

OK, that took longer than planned. Cabal and Docker threw some wrenches in my plan. But, here's the new, improved web server. I should update the examples directory to point to it:

https://github.com/GaloisInc/halvm-web

runeksvendsen commented 8 years ago

This looks great! I will definitely get around to trying it out at a later time.

GaloisInc / HaLVM

WebServer unikernel shuts down after "Starting server on device ..." #77