fermyon / installer

Fermyon Installer
Apache License 2.0

Spin jobs not working #56

Closed: itowlson closed this issue 2 years ago

itowlson commented 2 years ago

I can now bring up Nomad and Hippo, but when I do a spin deploy, the application (the Spin job) goes into an Unhealthy state.

The status information for a typical Spin job is:

ivan@hecate:~$ nomad status
ID                                    Type     Priority  Status   Submit Date
a0e306fa-46e1-40ac-b42f-0033e284e102  service  50        dead     2022-06-16T08:55:59+12:00
bindle                                service  50        running  2022-06-16T07:53:53+12:00
hippo                                 service  50        running  2022-06-16T07:54:23+12:00
traefik                               service  50        running  2022-06-16T07:53:27+12:00

ivan@hecate:~$ nomad status a0e306fa-46e1-40ac-b42f-0033e284e102
ID            = a0e306fa-46e1-40ac-b42f-0033e284e102
Name          = a0e306fa-46e1-40ac-b42f-0033e284e102
Submit Date   = 2022-06-16T08:55:59+12:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Task Group                            Queued  Starting  Running  Failed  Complete  Lost  Unknown
a0e306fa-46e1-40ac-b42f-0033e284e102  0       0         0        2       0         0     0

Future Rescheduling Attempts
Task Group                            Eval ID   Eval Time
a0e306fa-46e1-40ac-b42f-0033e284e102  ddfbadc1  49s from now

Latest Deployment
ID          = fedcd3c5
Status      = running
Description = Deployment is running

Task Group                            Desired  Placed  Healthy  Unhealthy  Progress Deadline
a0e306fa-46e1-40ac-b42f-0033e284e102  1        2       0        2          2022-06-16T09:05:59+12:00

ID        Node ID   Task Group                            Version  Desired  Status  Created    Modified
bf3b0d08  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        run      failed  41s ago    7s ago
6c48681e  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        stop     failed  1m58s ago  41s ago
ivan@hecate:~$ nomad status bf3b0d08
ID                     = bf3b0d08-28e0-bfd8-441d-823f3b613e74
Eval ID                = 37a1748d
Name                   = a0e306fa-46e1-40ac-b42f-0033e284e102.a0e306fa-46e1-40ac-b42f-0033e284e102[0]
Node ID                = f2e113e0
Node Name              = hecate
Job ID                 = a0e306fa-46e1-40ac-b42f-0033e284e102
Job Version            = 0
Client Status          = failed
Client Description     = Failed tasks
Desired Status         = run
Desired Description    = <none>
Created                = 59s ago
Modified               = 25s ago
Deployment ID          = fedcd3c5
Deployment Health      = unhealthy
Reschedule Eligibility = 31s from now

Allocation Addresses
Label  Dynamic  Address
*http  yes

Task "spin" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB

Task Events:
Started At     = N/A
Finished At    = 2022-06-15T20:57:45Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type             Description
2022-06-16T08:57:47+12:00  Killing          Sent interrupt. Waiting 5s before force killing
2022-06-16T08:57:45+12:00  Alloc Unhealthy  Unhealthy because of failed task
2022-06-16T08:57:45+12:00  Not Restarting   Error was unrecoverable
2022-06-16T08:57:45+12:00  Driver Failure   failed to launch command with executor: rpc error: code = Unknown desc = file spin not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/bf3b0d08-28e0-bfd8-441d-823f3b613e74/spin
2022-06-16T08:57:15+12:00  Task Setup       Building Task Directory
2022-06-16T08:57:15+12:00  Received         Task received by client

vdice commented 2 years ago

Not sure if this will help at all, but it may at least be informative if the behavior changes: it looks like Hippo supports specifying a particular Spin binary path.

Could we try adding something like Spin__BinaryPath = "<path to spin on host>" to the Hippo job env, re-running start.sh, and seeing whether the behavior changes?
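For reference, here is a minimal sketch of where that setting might go, assuming the Hippo job file uses an ordinary Nomad task env stanza. The job name hippo and type service match the nomad status output above. The group and task names are hypothetical, and the path is left as a placeholder. The double underscore follows the .NET environment-variable convention, in which Spin__BinaryPath is read as the configuration key Spin:BinaryPath.

```hcl
# Sketch only: group/task names are hypothetical; match them to the real
# Hippo job file that start.sh submits.
job "hippo" {
  datacenters = ["dc1"]
  type        = "service"

  group "hippo" {
    task "hippo" {
      # ... existing driver, config, and resources stanzas unchanged ...

      env {
        # Read by Hippo as Spin:BinaryPath (.NET maps "__" to ":").
        Spin__BinaryPath = "<path to spin on host>"
      }
    }
  }
}
```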

itowlson commented 2 years ago

@vdice Hmm, interesting! That fails with:

2022-06-16T10:24:43+12:00  Driver Failure   failed to launch command with executor:
rpc error: code = Unknown desc = file /home/ivan/github/spin/target/debug/spin
not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/8a8d6670-ba70-c433-f00f-92d1d84fc4d2/spin

vdice commented 2 years ago

Thanks @itowlson.

It appears that setting is meant to be a path relative to the allocation directory (cc @bacongobbler to check my understanding), so it is probably not helpful here.

My only other idea is to try overriding the Nomad:Driver value. I'm not sure whether it resolves to exec or raw_exec for you (it depends on OperatingSystem.IsLinux()). Does setting Nomad__Driver = "raw_exec" help?

itowlson commented 2 years ago

I'm not sure how to test that, given that Hippo is downloaded rather than taken from a local copy. Is there something I can set in the installer to force it?

itowlson commented 2 years ago

> System.OperatingSystem.IsLinux();;
val it: bool = true

itowlson commented 2 years ago

I'm not sure if exec or raw_exec makes a big difference. I tried these two jobs:

job "spin-raw-exec" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-raw-exec" {
    task "spin-raw-exec" {
      driver = "raw_exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}

job "spin-exec" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-exec" {
    task "spin-exec" {
      driver = "exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}

Both failed, although with different statuses: exec gave me the "file spin not found" message, while raw_exec gave me "Terminated: exit code 2".

itowlson commented 2 years ago

Oh! Running spin without arguments seems to return exit code 2. Maybe raw_exec worked and I just looked for logs in the wrong place!
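One way to check that theory is a variant of the raw_exec test job that gives Spin an argument it can succeed with. This sketch assumes that the installed Spin CLI supports spin --version and that spin is on the PATH of the Nomad client. The job name is hypothetical. The version string goes to the task's stdout, which can be read with nomad alloc logs <alloc-id> rather than from the job status.

```hcl
# Hypothetical variant of the spin-raw-exec test job: with --version, spin
# should print its version and exit 0 instead of printing usage and exiting 2.
job "spin-raw-exec-version" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-raw-exec" {
    task "spin-raw-exec" {
      driver = "raw_exec"
      config {
        command = "spin"
        args    = ["--version"]
      }
    }
  }
}
```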

itowlson commented 2 years ago

@vdice YES! raw_exec works for my demo case; I just got confused by the output. But it looks like Hippo is scheduling jobs with exec. Is there a way to override the Hippo setting so I can test this with a real Spin app?

vdice commented 2 years ago

🎉 Excellent! I wonder whether raw_exec is a prerequisite on WSL, and if so, whether we can conditionalize things so that the installer just works in this case (or, actually, maybe the conditional in Hippo is a better fit 🤔).
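If the installer were to default to raw_exec (on WSL or elsewhere), the Nomad client it starts would need that driver enabled, because Nomad ships with raw_exec disabled outside of -dev mode. Since the raw_exec test job ran here, it is presumably already enabled in this local setup. The sketch below is the standard agent-configuration stanza. Where it belongs in the installer's own Nomad config is not shown in this thread. Keep in mind that raw_exec runs tasks as the Nomad agent's user with no filesystem isolation, which is why it is off by default.

```hcl
# Nomad agent (client) configuration fragment, not a job specification.
# Enables the raw_exec task driver, which is off by default outside -dev mode.
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```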

Anyway, to the task at hand: yes, it should be an env setting on the Hippo job, similar to the Spin binary path we tried above.

Try adding Nomad__Driver = "raw_exec" to the Hippo job env.
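Here is a minimal sketch of that change. It assumes that Hippo reads the setting through the .NET convention, in which Nomad__Driver maps to Nomad:Driver. The task name is hypothetical. Only the env stanza is shown, and the rest of the Hippo task would stay as it is.

```hcl
# Fragment of the Hippo task; the task name is hypothetical and the other
# stanzas (driver, config, resources) are omitted here.
task "hippo" {
  env {
    # Read by Hippo as Nomad:Driver; intended to replace the exec/raw_exec
    # choice Hippo otherwise makes from OperatingSystem.IsLinux().
    Nomad__Driver = "raw_exec"
  }
}
```

After the edit, re-running start.sh and repeating spin deploy should show whether the generated Spin job now uses raw_exec.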

itowlson commented 2 years ago