cloudfoundry-attic / bosh-init

bosh-init is a tool used to create and update the Director VM
Apache License 2.0

Error messages rather than golang panics #7

Closed drnic closed 9 years ago

drnic commented 9 years ago

I tried again today to deploy a simple redis/redis micro deployment, and on my first attempt I got golang panics while unmarshalling the manifest:

$ bosh-micro deploy assets/light-bosh-stemcell-2905-aws-xen-ubuntu-trusty-go_agent.tgz assets/bosh-aws-cpi-release-5.tgz assets/redis-9.tgz 
Deployment manifest: '/Users/drnic/Projects/bosh-deployments/experiments/redis-micro/redis-micro.yml'
Deployment state: '/Users/drnic/Projects/bosh-deployments/experiments/redis-micro/deployment.json'

Started validating
  Validating stemcell... Finished (00:00:00)
  Validating releases... Finished (00:00:00)
  Validating deployment manifest... Failed (00:00:00)
Failed validating (00:00:00)

Command 'deploy' failed:
  Parsing deployment manifest '/Users/drnic/Projects/bosh-deployments/experiments/redis-micro/redis-micro.yml':
    Unmarshalling BOSH deployment manifest:
      Resolve failed for map
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:97 (0x41758e7)
    recovery: stackTrace := debug.Stack()
/usr/local/go/src/runtime/asm_amd64.s:401 (0x403c705)
    call16: CALLFN(·call16, 16)
/usr/local/go/src/runtime/panic.go:387 (0x40146a8)
    gopanic: reflectcall(unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:147 (0x41766fa)
    (*Decoder).error: panic(err)
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:527 (0x417a815)
    (*Decoder).scalar: d.error(err)
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:217 (0x4177546)
    (*Decoder).parse: d.scalar(rv)
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:500 (0x417a453)
    (*Decoder).mappingStruct: d.parse(subv)
/Users/drnic/Projects/go/src/github.com/cloudfoundry/bosh-micro-cli/Godeps/_workspace/src/github.com/cloudfoundry-incubator/candiedyaml/decode.go:409 (0x4179d57)
    (*Decoder).mapping: d.mappingStruct(v)

In my 3 years of BOSH, I've seen that no one gets a manifest right initially, and they still get them wrong a lot of the time later. I don't think panicking is a good UX :)

Any ideas/plans on how we will solve this?
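
One possible direction, sketched in Go with only the standard library: recover the decoder's panic at the boundary and hand back a plain error, without the stack trace. `decodeManifest` and `safeDecode` are hypothetical names, and `decodeManifest` merely stands in for a decoder that panics; this is not the candiedyaml API.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeManifest is a hypothetical stand-in for a third-party decoder
// that panics on input it cannot handle.
func decodeManifest(data []byte) map[string]interface{} {
	if len(data) == 0 {
		panic(errors.New("Resolve failed for map"))
	}
	return map[string]interface{}{"raw": string(data)}
}

// safeDecode converts a panic raised by decodeManifest into an ordinary
// error, so the CLI can print one readable line instead of a stack trace.
func safeDecode(data []byte) (m map[string]interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			m = nil
			err = fmt.Errorf("unmarshalling deployment manifest: %v", r)
		}
	}()
	return decodeManifest(data), nil
}

func main() {
	if _, err := safeDecode(nil); err != nil {
		fmt.Println("Command 'deploy' failed:", err)
	}
}
```

The named return values are what allow the deferred function to replace the result after the panic has been recovered.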

drnic commented 9 years ago

The manifest I used was https://gist.github.com/drnic/f953407db9fcbb24a050

I'm investigating.

drnic commented 9 years ago

BTW, doing nothing but making bosh-micro deploy errors go away, the following is the minimum manifest I got away with:

---
name: redis-from-scratch

resource_pools:
- name: default
  network: default
  cloud_properties:
    instance_type: m3.medium

networks:
- name: default
  type: manual
  cloud_properties:
    subnet: subnet-37086b40
    range: 10.10.0.0/24
    reserved: [10.10.0.1-10.10.0.3]
    static: [10.10.0.4]

cloud_provider:
  template:
    name: cpi
    release: bosh-aws-cpi

I didn't test smaller networks and resource_pools sections; I copied them from the gist above.

drnic commented 9 years ago

Eventually, after CPI compilation, it continues failing with:

Started installing CPI
  Compiling package 'ruby_aws_cpi/052a28b8976e6d9ad14d3eaec6d3dd237973d800'... Finished (00:01:18)
  Compiling package 'bosh_aws_cpi/deabbf731a4fedc9285324d85af6456cfa74c10c'... Finished (00:00:34)
  Rendering job templates... Failed (00:00:00)
Failed installing CPI (00:01:53)

Command 'deploy' failed:
  Building installation state:
    Rendering job templates for installation:
      Rendering templates for job 'cpi/ca1bbc783f6b12eb1d066ac4b54c75fb8351465e':
        Rendering template src: cpi.yml.erb, dst: config/cpi.yml:
          Rendering template src: /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/bosh-micro-release459033935/extracted_jobs/cpi/templates/cpi.yml.erb, dst: /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/rendered-jobs298615990/config/cpi.yml:
            Running ruby to render templates:
              Running command: 'ruby /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/erb-renderer013178712/erb-render.rb /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/erb-renderer013178712/erb-context.json /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/bosh-micro-release459033935/extracted_jobs/cpi/templates/cpi.yml.erb /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/rendered-jobs298615990/config/cpi.yml', stdout: '', stderr: '/var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/erb-renderer013178712/erb-render.rb:180:in `rescue in render': Error filling in template '/var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/bosh-micro-release459033935/extracted_jobs/cpi/templates/cpi.yml.erb' for cpi/0 (line 5: #<TemplateEvaluationContext::UnknownProperty: Can't find property 'aws.access_key_id'>) (RuntimeError)
        from /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/erb-renderer013178712/erb-render.rb:166:in `render'
        from /var/folders/3x/fch8r6z97ljct1tfzk7kk34w0000gn/T/erb-renderer013178712/erb-render.rb:191:in `<main>'

But now the error message isn't as pretty as it was prior to compiling the CPI.
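
A rough sketch, in Go, of checking for required properties during manifest validation, before any compilation starts. `lookup` and `missingProperties` are hypothetical helpers, and the list of required property names is an assumption made for illustration; nothing here is taken from the bosh-micro code.

```go
package main

import (
	"fmt"
	"strings"
)

// lookup walks a dotted path such as "aws.access_key_id" through nested maps.
func lookup(props map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = props
	for _, part := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[part]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// missingProperties returns every required path that is absent, so all
// problems can be reported together in one short message.
func missingProperties(props map[string]interface{}, required []string) []string {
	var missing []string
	for _, path := range required {
		if _, ok := lookup(props, path); !ok {
			missing = append(missing, path)
		}
	}
	return missing
}

func main() {
	props := map[string]interface{}{
		"aws": map[string]interface{}{"region": "us-east-1"},
	}
	// Hypothetical list of required properties, for illustration only.
	required := []string{"aws.access_key_id", "aws.secret_access_key", "aws.region"}
	for _, path := range missingProperties(props, required) {
		fmt.Printf("Can't find property '%s'\n", path)
	}
}
```

Collecting every missing path in one pass would have surfaced the absent `aws.access_key_id` within the validation stage rather than after nearly two minutes of compilation.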

drnic commented 9 years ago

---
name: redis-from-scratch

resource_pools:
- name: default
  network: default
  cloud_properties:
    instance_type: m3.medium

networks:
- name: default
  type: manual
  cloud_properties:
    subnet: subnet-37086b40
    range: 10.10.0.0/24
    reserved: [10.10.0.1-10.10.0.3]
    static: [10.10.0.4]

cloud_provider:
  template:
    name: cpi
    release: bosh-aws-cpi
  properties:
    aws:
      access_key_id: XXX
      secret_access_key: YYY
      default_key_name: sw-bosh-us-east-1
      default_security_groups: [bastion]
      region: us-east-1
      ec2_private_key: ~/.ssh/sw-bosh-us-east-1.pem
    registry:
      username: admin
      password: admin
      port: 6901
      host: localhost
    blobstore:
      provider: local
      path: /var/vcap/micro_bosh/data/cache
    nats:
      address: 127.0.0.1
      password: nats

This was the minimum manifest before it would try to create a stemcell.

Then it fails with:

Command 'deploy' failed:
  Deploying Microbosh:
    There must only be one job, found 0

drnic commented 9 years ago

I added the jobs section and then it panicked:

jobs:
- name: redis
  instances: 1
  templates:
  - name: redis
    release: redis
  networks:
  - name: default
    static_ips:
    - 10.10.0.4
  properties:
  properties:
    redis:
      address: "127.0.0.1"
      password: "redis"
      port: 25255

drnic commented 9 years ago

properties:
properties:

This was the issue here. But golang panicking cannot be the UX for bad YAML. Hopefully there is something we can do to aid fools like me who write bad YAML.

drnic commented 9 years ago

Perhaps a simple check before unmarshalling: "can this YAML be parsed as YAML?" and, if not, "There's a YAML error on line XYZ".
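
To make the idea concrete, here is a naive sketch in Go that catches the doubled `properties:` key and reports its line number. It is a line-based heuristic, not a YAML parser: it assumes block-style YAML indented with spaces, and `findDuplicateKeys` is a hypothetical name. A real check would lean on a proper YAML library instead.

```go
package main

import (
	"fmt"
	"strings"
)

// scope records the keys seen so far in one mapping, identified by indentation.
type scope struct {
	indent int
	keys   map[string]int // key -> line number where it first appeared
}

// findDuplicateKeys reports keys that repeat within the same mapping.
func findDuplicateKeys(src string) []string {
	var problems []string
	var stack []scope
	for i, line := range strings.Split(src, "\n") {
		content := strings.TrimLeft(line, " ")
		indent := len(line) - len(content)
		if content == "" || content == "---" || strings.HasPrefix(content, "#") {
			continue
		}
		// "- key: value" starts a new mapping nested inside a list item.
		newItem := strings.HasPrefix(content, "- ")
		if newItem {
			indent += 2
			content = content[2:]
		}
		idx := strings.Index(content, ":")
		if idx <= 0 || (idx+1 < len(content) && content[idx+1] != ' ') {
			continue // not a "key:" or "key: value" line
		}
		key := content[:idx]
		// Leave any mappings nested deeper than this line.
		for len(stack) > 0 && stack[len(stack)-1].indent > indent {
			stack = stack[:len(stack)-1]
		}
		// A new list item ends the previous item's mapping at the same depth.
		if newItem && len(stack) > 0 && stack[len(stack)-1].indent == indent {
			stack = stack[:len(stack)-1]
		}
		if len(stack) == 0 || stack[len(stack)-1].indent != indent {
			stack = append(stack, scope{indent: indent, keys: map[string]int{}})
		}
		top := stack[len(stack)-1]
		if first, seen := top.keys[key]; seen {
			problems = append(problems,
				fmt.Sprintf("line %d: key %q already defined on line %d", i+1, key, first))
		} else {
			top.keys[key] = i + 1
		}
	}
	return problems
}

func main() {
	const manifest = `jobs:
- name: redis
  instances: 1
  properties:
  properties:
    redis:
      port: 25255
`
	for _, p := range findDuplicateKeys(manifest) {
		fmt.Println(p)
	}
}
```

For the fragment in `main`, this reports that `properties` on line 5 was already defined on line 4, which is the kind of message that would have pointed straight at the mistake.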

drnic commented 9 years ago

Going to keep using this thread as a log of one person's efforts to build up a manifest.

Was getting this error:

Command 'deploy' failed:
  Deploying Microbosh:
    Creating instance 'redis/0':
      Creating VM:
        Creating vm with stemcell cid 'ami-b88da7d0 light':
          External CPI command for method 'create_vm' returned an error: CmdError{"type":"Unknown","message":"Connection refused - connect(2) (http://localhost:6901)","ok_to_retry":false}

This went away when I added the seemingly redundant cloud_provider.registry:

cloud_provider:
  template:
    name: cpi
    release: bosh-aws-cpi
  ssh_tunnel:
    host: 10.10.0.4
    port: 22
    user: vcap
    private_key: /home/ubuntu/.ssh/sw-bosh-us-east-1.pem
  registry:
    username: admin
    password: admin
    port: 6901
    host: localhost
  properties:
    aws:
      access_key_id: XXX
      secret_access_key: YYY
      default_key_name: sw-bosh-us-east-1
      default_security_groups: [redis-micro-default-vpc]
      region: us-east-1
      ec2_private_key: ~/.ssh/sw-bosh-us-east-1.pem
    registry:
      username: admin
      password: admin
      port: 6901
      host: localhost
    blobstore:
      provider: local
      path: /var/vcap/micro_bosh/data/cache
    nats:
      address: 127.0.0.1
      password: nats

Seems strange to have to enter all this - most of it seems to be defaults. Perhaps they can be optional - would make the manifest much smaller.
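
A sketch of how optional sections could work, in Go: merge whatever the manifest supplies over a table of defaults. `mergeDefaults` is a hypothetical helper, and the default values are simply the ones from the manifest above, assumed to be defaults for illustration; this does not describe what bosh-micro actually does.

```go
package main

import "fmt"

// mergeDefaults returns a new map holding defaults overlaid with overrides.
// Nested maps are merged recursively; any other value in overrides wins.
func mergeDefaults(defaults, overrides map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		dm, dok := out[k].(map[string]interface{})
		om, ook := v.(map[string]interface{})
		if dok && ook {
			out[k] = mergeDefaults(dm, om)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Assumed defaults, copied from the manifest in this thread.
	defaults := map[string]interface{}{
		"registry": map[string]interface{}{
			"username": "admin", "password": "admin", "port": 6901, "host": "localhost",
		},
		"blobstore": map[string]interface{}{
			"provider": "local", "path": "/var/vcap/micro_bosh/data/cache",
		},
	}
	// The user only states what differs.
	manifest := map[string]interface{}{
		"registry": map[string]interface{}{"password": "s3cret"},
	}
	merged := mergeDefaults(defaults, manifest)
	fmt.Println(merged["registry"])
}
```

With this approach the manifest would only need the AWS credentials and whatever differs from the defaults.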

drnic commented 9 years ago

I needed to fix the path to the key in cloud_provider.ssh_tunnel.private_key since I was on a different machine.

Tried ~/.ssh/sw-bosh-us-east-1.pem (like in ec2_private_key) but get:

Command 'deploy' failed:
  Deploying Microbosh:
    Creating instance 'redis/0':
      Waiting until instance is ready:
        Starting SSH tunnel:
          Reading private key file '~/.ssh/sw-bosh-us-east-1.pem':
            open ~/.ssh/sw-bosh-us-east-1.pem: no such file or directory

This issue was already raised in https://github.com/cloudfoundry/bosh-micro-cli/issues/4, but no one has responded.

drnic commented 9 years ago

With the ssh key fixed, the agent wasn't coming up. SSHing into the VM and looking at /var/vcap/bosh/log/current I see:

2015-04-04_04:24:26.38129 [settingsService] 2015/04/04 04:24:26 DEBUG - Loading settings from fetcher
2015-04-04_04:24:26.38186 [registryProvider] 2015/04/04 04:24:26 DEBUG - Using http registry at http://admin:admin@localhost:6901
2015-04-04_04:24:26.39146 [settingsService] 2015/04/04 04:24:26 ERROR - Failed loading settings via fetcher: Getting settings from url: Get http://admin:admin@localhost:6901/instances/i-f50bc508/settings: dial tcp 127.0.0.1:6901: connection refused
2015-04-04_04:24:26.39158 [File System] 2015/04/04 04:24:26 DEBUG - Reading file /var/vcap/bosh/settings.json
2015-04-04_04:24:26.39169 [settingsService] 2015/04/04 04:24:26 ERROR - Failed reading settings from file Opening file /var/vcap/bosh/settings.json: open /var/vcap/bosh/settings.json: no such file or directory
2015-04-04_04:24:26.39180 [main] 2015/04/04 04:24:26 ERROR - App setup Running bootstrap: Fetching settings: Invoking settings fetcher: Getting settings from url: Get http://admin:admin@localhost:6901/instances/i-f50bc508/settings: dial tcp 127.0.0.1:6901: connection refused

At this time, on the host machine there is no other related process running (no registry process) other than bosh-micro deploy.

drnic commented 9 years ago

I'll keep trying tonight if anyone / @cppforlife is around and can suggest something.

I am running this on my OS X machine into us-east-1. I've added an elastic IP in addition to the static IP above.

cppforlife commented 9 years ago

Did you resolve this problem?

karlkfi commented 9 years ago

FYI, I raised the issue of the stack trace and its inscrutability with @fraenkel in this issue: https://github.com/fraenkel/candiedyaml/issues/5

He has since added commits to https://github.com/cloudfoundry-incubator/candiedyaml to improve failure messages when unmarshalling. The dependency will need to be bumped in bosh-init to take advantage of this.

cppforlife commented 9 years ago

We have switched to using the go-yaml library, which provides better error messages for invalid YAML.