Smithx10 / nomad-driver-triton

HashiCorp Nomad Triton driver plugin
unable to list packages #24

Open myfreeware opened 3 years ago

myfreeware commented 3 years ago

A package block is specified in the job specification as follows: package { name = "sample-128M" }. When I run the job, it stays in pending status.

The nomad log contains the error "unable to list packages":

2020-12-15T05:29:44.445Z [INFO]  client.driver_mgr.triton: Inside tth GetPackage: driver=triton @module=triton timestamp=2020-12-15T05:29:44.445Z
2020-12-15T05:29:46.646Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=5bc46717-7027-1f73-0a97-eeffa8e55bbb task=nginx error="rpc error: code = Unknown desc = unable to list packages: unable to execute HTTP request: Get "https://10.88.88.5/demouser/packages?name=sample-128M": x509: cannot validate certificate for 10.88.88.5 because it doesn't contain any IP SANs"
2020-12-15T05:29:46.647Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=5bc46717-7027-1f73-0a97-eeffa8e55bbb task=nginx reason="Error was unrecoverable"
2020-12-15T05:29:46.647Z [INFO]  client.gc: marking allocation for GC: alloc_id=5bc46717-7027-1f73-0a97-eeffa8e55bbb

teutat3s commented 3 years ago

The problem is related to your setup. For testing, maybe you could add a local DNS entry like cloudapi.test for that IP in /etc/hosts and create a self-signed cert for that hostname?

What cloudapi endpoint URL do you set when using triton profile create?
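Both x509 errors in this thread come down to the names listed in the certificate: current Go versions match a host against the certificate's Subject Alternative Names (SANs), not its Common Name. The following is a minimal sketch, not code from the driver, that builds two throwaway self-signed certificates in memory and checks them with VerifyHostname. The helper name newSelfSignedCert is made up for illustration; cloudapi.test and 10.88.88.5 are the values discussed above.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newSelfSignedCert (illustrative helper) creates a throwaway self-signed
// certificate listing the given DNS names and IP addresses as SANs.
func newSelfSignedCert(dnsNames, ipStrings []string) (*x509.Certificate, error) {
	var ips []net.IP
	for _, s := range ipStrings {
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP address %q", s)
		}
		ips = append(ips, ip)
	}

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}

	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "cloudapi.test"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:     dnsNames,
		IPAddresses:  ips,
	}

	// Self-signed: the template is its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// A certificate with only a Common Name and no SANs.
	bare, err := newSelfSignedCert(nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("no SANs, by IP:    ", bare.VerifyHostname("10.88.88.5"))
	fmt.Println("no SANs, by name:  ", bare.VerifyHostname("cloudapi.test"))

	// The same certificate with a DNS SAN and an IP SAN added.
	withSANs, err := newSelfSignedCert([]string{"cloudapi.test"}, []string{"10.88.88.5"})
	if err != nil {
		panic(err)
	}
	fmt.Println("with SANs, by IP:  ", withSANs.VerifyHostname("10.88.88.5"))
	fmt.Println("with SANs, by name:", withSANs.VerifyHostname("cloudapi.test"))
}
```

The first two checks return errors and the last two return nil. A self-signed certificate would still need to be trusted by the client; this sketch only covers the name-matching step.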

teutat3s commented 3 years ago

By the way, we have moved development to https://github.com/teutat3s/nomad-triton-driver-plugin if you'd like to follow future nomad triton driver releases

myfreeware commented 3 years ago

I am very glad to hear about the continued development of the nomad triton driver! I'm using coal to study triton. First, I used the IP address as the cloudapi endpoint URL:

bash-5.0# triton profile list
NAME  CURR  ACCOUNT   USER  URL
env   *     demouser  -     https://10.88.88.5
bash-5.0#
bash-5.0# triton profile get env
name: env
account: demouser
curr: true
insecure: true
keyId: SHA256:4ynp+jKidHP/Tg1k1a3YX8+eEIlpvIYS59FKrlO4A/g
url: https://10.88.88.5

In this case, nomad reports the error:

2020-12-15T05:29:46.646Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=5bc46717-7027-1f73-0a97-eeffa8e55bbb task=nginx error="rpc error: code = Unknown desc = unable to list packages: unable to execute HTTP request: Get "https://10.88.88.5/demouser/packages?name=sample-128M": x509: cannot validate certificate for 10.88.88.5 because it doesn't contain any IP SANs"

After this, I changed the env profile to use the host name as the cloudapi endpoint URL:

bash-5.0# triton profile list
NAME  CURR  ACCOUNT   USER  URL
env   *     demouser  -     https://cloudapi.coal-1.cns.cloudeasy.cloud
bash-5.0#
bash-5.0# triton profile get env
name: env
account: demouser
curr: true
insecure: true
keyId: SHA256:4ynp+jKidHP/Tg1k1a3YX8+eEIlpvIYS59FKrlO4A/g
url: https://cloudapi.coal-1.cns.cloudeasy.cloud

Then I recreated the docker private key (with the triton profile docker-setup command) and added the host names to the /etc/hosts file:

10.88.88.5 cloudapi.coal-1.cns.cloudeasy.cloud cloudapi
10.88.88.6 docker.coal-1.cns.cloudeasy.cloud docker

Now nomad reports the error:

2020-12-15T06:05:55.909Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=131a96e1-77a8-7905-f19a-247d1cf01687 task=nginx error="rpc error: code = Unknown desc = unable to list packages: unable to execute HTTP request: Get "https://cloudapi.coal-1.cns.cloudeasy.cloud/demouser/packages?name=sample-128M": x509: certificate is not valid for any names, but wanted to match cloudapi.coal-1.cns.cloudeasy.cloud"
2020-12-15T06:05:55.909Z [INFO] client.alloc_runner.task_runner: not restarting task: alloc_id=131a96e1-77a8-7905-f19a-247d1cf01687 task=nginx reason="Error was unrecoverable"

myfreeware commented 3 years ago

In both cases, the "triton" and "triton-docker" commands can run normally.

teutat3s commented 3 years ago

Maybe setting the following nomad environment variable could help:

export NOMAD_SKIP_VERIFY=true

See: https://www.nomadproject.io/docs/commands/job/run#tls-skip-verify

Another useful variable seems to be

export TRITON_SKIP_TLS_VERIFY=true

From: https://github.com/joyent/triton-go
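For context, here is a rough sketch of what honoring such a variable typically looks like in a Go HTTP client. It is not the driver's or triton-go's actual code: the helper names are made up, and treating any non-empty value as "skip" is an assumption. A local httptest TLS server, whose certificate the default trust store does not know, stands in for CloudAPI.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
)

// newHTTPClient (illustrative helper) returns a client that skips TLS
// certificate verification when skipVerify is true.
func newHTTPClient(skipVerify bool) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: skipVerify},
		},
	}
}

// startDemoServer starts a local TLS server with an untrusted certificate,
// standing in for a CloudAPI endpoint with a self-signed certificate.
func startDemoServer() (url string, stop func()) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "[]")
	}))
	return srv.URL, srv.Close
}

// statusOf performs a GET request and returns the HTTP status code.
func statusOf(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	url, stop := startDemoServer()
	defer stop()

	// Assumption: any non-empty value enables the skip. The value is read
	// once, in the process that builds the client.
	skip := os.Getenv("TRITON_SKIP_TLS_VERIFY") != ""

	code, err := statusOf(newHTTPClient(skip), url)
	fmt.Printf("skip=%v status=%d err=%v\n", skip, code, err)
}
```

In this sketch the variable is read by the process that constructs the client, so exporting it in a different shell or after that process has started has no effect.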

myfreeware commented 3 years ago

By default, Triton uses self-signed certificates for CloudAPI. I think the self-signed certificate does not list the primary hostname (domain name) as the Common Name in the subject field of the certificate.

"A TLS server may be configured with a self-signed certificate. When that is the case, clients will generally be unable to verify the certificate, and will terminate the connection unless certificate checking is disabled." (from Wikipedia: https://en.wikipedia.org/wiki/Public_key_certificate#TLS/SSL_server_certificate)

I set the two variables:

export NOMAD_SKIP_VERIFY=true
export TRITON_SKIP_TLS_VERIFY=true

or

export NOMAD_SKIP_VERIFY=1
export TRITON_SKIP_TLS_VERIFY=1

and set TRITON_TLS_INSECURE=1 for the triton CLI, then ran a job. These variables do not stop the driver from validating the CloudAPI SSL certificate; the same error still appears in the nomad log.
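One way to confirm which names a certificate actually carries is to connect without verification and print the leaf certificate's Common Name and SANs. This is a small diagnostic sketch, not part of the driver. The helper names are made up, and the local httptest server is only a stand-in so the example runs on its own; against a real endpoint one would pass an address such as 10.88.88.5:443.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// certNames holds the identity fields of a server's leaf certificate.
type certNames struct {
	CommonName  string
	DNSNames    []string
	IPAddresses []net.IP
}

// fetchCertNames (illustrative helper) dials addr ("host:port") without
// verifying the certificate and returns the names found in the leaf cert.
func fetchCertNames(addr string) (certNames, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return certNames{}, err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return certNames{}, fmt.Errorf("no certificate presented by %s", addr)
	}
	leaf := certs[0]
	return certNames{
		CommonName:  leaf.Subject.CommonName,
		DNSNames:    leaf.DNSNames,
		IPAddresses: leaf.IPAddresses,
	}, nil
}

// startDemoServer starts a local TLS server so the sketch is runnable
// without access to a Triton installation.
func startDemoServer() (addr string, stop func()) {
	srv := httptest.NewTLSServer(http.NotFoundHandler())
	return srv.Listener.Addr().String(), srv.Close
}

func main() {
	addr, stop := startDemoServer()
	defer stop()

	names, err := fetchCertNames(addr)
	if err != nil {
		panic(err)
	}
	fmt.Println("Common Name:", names.CommonName)
	fmt.Println("DNS SANs:   ", names.DNSNames)
	fmt.Println("IP SANs:    ", names.IPAddresses)
}
```

If both SAN lists come back empty for the CloudAPI endpoint, that would match the "doesn't contain any IP SANs" and "not valid for any names" errors quoted earlier.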

myfreeware commented 3 years ago

I stopped the nomad agent and triton driver, set TRITON_SKIP_TLS_VERIFY=1 and NOMAD_SKIP_VERIFY=1, then started nomad and the triton driver. They start normally. When I run the same job as before, the nomad log now shows new errors, "client.driver_mgr.triton: panic" and "plugin is shut down":

2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: panic: runtime error: invalid memory address or nil pointer dereference: driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xd7d200]: driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: : driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: goroutine 36 [running]:: driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: github.com/Smithx10/nomad-driver-triton/plugin.(*TritonTaskHandler).CreateInstance(0xc00007a980, 0x113e920, 0xc00007bd00, 0xc000204c00, 0xc000263e30, 0xa, 0x0, 0x0, 0x0, 0x0, ...): driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton:  /root/tmp/nomad-driver-triton-master/plugin/triton.go:263 +0x1840: driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: github.com/Smithx10/nomad-driver-triton/plugin.(*TritonTaskHandler).NewTritonTask(0xc00007a980, 0xc000204c00, 0xc000263e30, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...): driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton:  /root/tmp/nomad-driver-triton-master/plugin/triton.go:65 +0x158: driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton: github.com/Smithx10/nomad-driver-triton/plugin.(*Driver).StartTask(0xc000266780, 0xc000204c00, 0xc000392580, 0x535, 0x535, 0xf59620): driver=triton
2020-12-15T05:46:41.406Z [DEBUG] client.driver_mgr.triton:  /root/tmp/nomad-driver-triton-master/plugin/plugin.go:356 +0x498: driver=triton
2020-12-15T05:46:41.407Z [DEBUG] client.driver_mgr.triton: github.com/hashicorp/nomad/plugins/drivers.(*driverPluginServer).StartTask(0xc00000f640, 0x113e9e0, 0xc000267f50, 0xc000267f80, 0xc00000f640, 0xc000267f50, 0xc000062ba0): driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/github.com/hashicorp/nomad@v0.12.3/plugins/drivers/server.go:105 +0x62: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton: github.com/hashicorp/nomad/plugins/drivers/proto._Driver_StartTask_Handler(0xfa1720, 0xc00000f640, 0x113e9e0, 0xc000267f50, 0xc00038a780, 0x0, 0x113e9e0, 0xc000267f50, 0xc000392580, 0x535): driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/github.com/hashicorp/nomad@v0.12.3/plugins/drivers/proto/driver.pb.go:4337 +0x214: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton: google.golang.org/grpc.(*Server).processUnaryRPC(0xc00020c9c0, 0x114e280, 0xc00008d380, 0xc0001a2e00, 0xc000266c90, 0x1679b88, 0x0, 0x0, 0x0): driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1082 +0x522: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton: google.golang.org/grpc.(*Server).handleStream(0xc00020c9c0, 0x114e280, 0xc00008d380, 0xc0001a2e00, 0x0): driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405 +0xcc5: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton: google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0001fb770, 0xc00020c9c0, 0x114e280, 0xc00008d380, 0xc0001a2e00): driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xa5: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton: created by google.golang.org/grpc.(*Server).serveStreams.func1: driver=triton
2020-12-15T05:46:41.408Z [DEBUG] client.driver_mgr.triton:  /root/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xa5: driver=triton
2020-12-15T05:46:41.415Z [DEBUG] client.driver_mgr: plugin process exited: driver=triton path=/root/nomad/plugins/triton pid=76882 error="exit status 2"
2020-12-15T05:46:41.416Z [INFO]  client.alloc_runner.task_runner: failed to start task because plugin shutdown unexpectedly; attempting to recover: alloc_id=6bc26062-b3a4-ad84-606a-c0fa7ba6fcb8 task=nginx
2020-12-15T05:46:41.416Z [WARN]  client.driver_mgr: failed to reattach to plugin, starting new instance: driver=triton err="singleton plugin exited"
2020-12-15T05:46:41.416Z [DEBUG] client.driver_mgr: starting plugin: driver=triton path=/root/nomad/plugins/triton args=[/root/nomad/plugins/triton]
2020-12-15T05:46:41.422Z [DEBUG] client.driver_mgr: plugin started: driver=triton path=/root/nomad/plugins/triton pid=77185
2020-12-15T05:46:41.423Z [DEBUG] client.driver_mgr: waiting for RPC address: driver=triton path=/root/nomad/plugins/triton
2020-12-15T05:46:41.416Z [WARN]  client.driver_mgr: received fingerprint error from driver: driver=triton error="plugin is shut down"

Also please refer to the nomad.log
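The trace shows a nil pointer dereference inside CreateInstance, but not which value was nil at triton.go:263. As a general illustration only, and not the actual fix, the sketch below shows the usual Go guard: check a lookup result for nil and return an error, so a missing value surfaces as a task failure instead of crashing the plugin process. All types and names here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// pkg is a hypothetical stand-in for a package record.
type pkg struct {
	ID   string
	Name string
}

var errPackageNotFound = errors.New("package not found")

// findPackage returns the first package with the given name, or nil.
func findPackage(catalog []*pkg, name string) *pkg {
	for _, p := range catalog {
		if p != nil && p.Name == name {
			return p
		}
	}
	return nil
}

// packageID guards against the nil result before dereferencing it.
// Without the check, p.ID would panic when no package matches.
func packageID(catalog []*pkg, name string) (string, error) {
	p := findPackage(catalog, name)
	if p == nil {
		return "", fmt.Errorf("%w: %q", errPackageNotFound, name)
	}
	return p.ID, nil
}

func main() {
	// Hypothetical catalog contents.
	catalog := []*pkg{{ID: "hypothetical-id-1", Name: "sample-128M"}}

	for _, name := range []string{"sample-128M", "sample-256M"} {
		id, err := packageID(catalog, name)
		fmt.Printf("%s -> id=%q err=%v\n", name, id, err)
	}
}
```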

teutat3s commented 3 years ago

That looks like progress, nice! There's a fix for that error in the new driver repo that didn't make it here yet. I've created a PR https://github.com/Smithx10/nomad-driver-triton/pull/25 to integrate that change here, too. With that change everything should work fine.

myfreeware commented 3 years ago

I repeated the process as follows: stopped the nomad agent and triton driver, built a new triton driver from https://github.com/teutat3s/nomad-triton-driver-plugin, set TRITON_SKIP_TLS_VERIFY=1 and NOMAD_SKIP_VERIFY=1, and started nomad and the triton driver. They start normally. When I run a job, unfortunately the "x509: cannot validate certificate" error still appears in the nomad log, just like the first time:

2020-12-15T06:25:39.721Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=fb45b539-6c93-2306-7e54-433603dd9e0a task=nginx error="rpc error: code = Unknown desc = unable to list packages: unable to execute HTTP request: Get "https://10.88.88.5/demouser/packages?name=sample-128M": x509: cannot validate certificate for 10.88.88.5 because it doesn't contain any IP SANs"
2020-12-15T06:25:39.721Z [INFO] client.alloc_runner.task_runner: not restarting task: alloc_id=fb45b539-6c93-2306-7e54-433603dd9e0a task=nginx reason="Error was unrecoverable"