cneira / firecracker-task-driver

nomad task driver that uses firecracker to start micro-vms
Apache License 2.0
145 stars, 18 forks

Request: Propagate Firecracker Task Driver errors to Nomad UI #19

Closed wimax-grapl closed 2 years ago

wimax-grapl commented 2 years ago

So I have a task start failing with the following, not-very-useful info:

rpc error: code = Unknown desc = task with ID "8ee3098b-7420-cb04-2892-fedaa3c730ba/tenant-plugin/339ec6bd" failed


However, the Nomad agent logs show a much more intelligible error: failure when invoking CNI: failed to load CNI configuration from dir "/etc/cni/conf.d" for network "default": no net configurations found in /etc/cni/conf.d

    2022-04-04T13:23:32.274-0400 [INFO]  client.driver_mgr.firecracker-task-driver: starting firecracker task: driver=firecracker-task-driver driver_cfg="{KernelImage: BootOptions: BootDisk: Disks:[] Network:default Nic:{Ip: Gateway: Interface: Nameservers:[]} Vcpus:1 Cputype: Mem:128 Firecracker:/usr/bin/firecracker Log: DisableHt:false}" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
    2022-04-04T13:23:32.274-0400 [INFO]  client.driver_mgr.firecracker-task-driver: Starting firecracker: driver=firecracker-task-driver driver_initialize_container="&{/usr/bin/firecracker /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/vmlinux  console=ttyS0 reboot=k panic=1 pci=off nomodules /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/rootfs.ext4  [] default {   []} []    false 1  300    false false [] <nil> 0xc384c0}+" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
    2022-04-04T13:23:32.275-0400 [INFO]  client.driver_mgr.firecracker-task-driver: Error starting firecracker vm: driver=firecracker-task-driver @module=firecracker-task-driver driver_cfg="Failed to start machine: failure when invoking CNI: failed to load CNI configuration from dir \"/etc/cni/conf.d\" for network \"default\": no net configurations found in /etc/cni/conf.d" timestamp=2022-04-04T13:23:32.275-0400
    2022-04-04T13:23:32.275-0400 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin error="rpc error: code = Unknown desc = task with ID \"3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/0e1713e6\" failed"
    2022-04-04T13:23:32.275-0400 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin reason="Error was unrecoverable"

I was wondering if it'd be possible to propagate that error up to the UI? Thanks!

wimax-grapl commented 2 years ago

(You'll note that the alloc_id is different: I accidentally captured a retry, but the same error shows up for 8ee3098b.)

ValentaTomas commented 2 years ago

I would also appreciate this. I'm doing some custom changes to the task driver and even just letting the errors propagate as they are was really helpful.

Do you think just propagating the error at these two places would be alright? https://github.com/cneira/firecracker-task-driver/blob/master/driver/driver.go#L297 and https://github.com/cneira/firecracker-task-driver/blob/master/driver/driver.go#L258

wimax-grapl commented 2 years ago

I'm not the author of this plugin, but based on how other officially supported Nomad drivers work, I think it'd be totally reasonable.

The vast majority of the StartTask return statements include the err in, say, the Docker driver: https://github.com/hashicorp/nomad/blob/52faa167dd0e18685440de5b6613f397b5fa0aa8/drivers/docker/driver.go#L300

ValentaTomas commented 2 years ago

I made these changes in https://github.com/cneira/firecracker-task-driver/pull/21

ValentaTomas commented 2 years ago

@cneira I think we can close this issue.