Status

This project is no longer actively maintained and the repository has been archived.

stackdriver-tools release for BOSH

This release provides Cloud Foundry and BOSH integration with Google Cloud Platform's Stackdriver Logging and Monitoring.

Functionality is provided by 3 jobs in this release:

A nozzle job for forwarding Cloud Foundry Firehose data to Stackdriver
A Fluentd job for forwarding syslog and template logs to Stackdriver Logging
A Stackdriver Monitoring Agent job for sending VM health metrics to Stackdriver Monitoring

Project Status

The following is generally available:

Stackdriver Host Monitoring Agent (stackdriver-agent)
Stackdriver Host Logging Agent (google-fluentd)
Stackdriver Nozzle (stackdriver-nozzle)
- Stackdriver Logging for Cloud Foundry Log Events (LogMessage, Error, HttpStartStop)
- Stackdriver Monitoring for Cloud Foundry Metric Events (ContainerMetric, ValueMetric, CounterEvent)

The following is in beta:

Stackdriver Nozzle
- Stackdriver Logging for Cloud Foundry Metric Events (ContainerMetric, ValueMetric, CounterEvent)

The project was developed in partnership between Google and Pivotal. It was maintained by Google until the repository was archived.

Getting started

Enable Stackdriver APIs

Ensure the Stackdriver Logging and Stackdriver Monitoring APIs are enabled.

Quotas

Depending on the size of the Cloud Foundry deployment and which events the nozzle forwards, it can be easy to reach the default Stackdriver quotas.

Google quotas can be viewed and managed on the API Quotas Page. An operator can raise the default quota up to a limit; to go beyond that limit, use the contact links on that page to request a higher quota.

Create and configure service accounts

All of the jobs in this release authenticate to Stackdriver Logging and Monitoring via Service Accounts. Follow the GCP documentation to create a service account via gcloud with the following roles:

roles/logging.logWriter
roles/logging.configWriter
roles/monitoring.metricWriter

You can either authenticate the job(s) by specifying the service account in the cloud_properties for the resource pool running the job(s) or by configuring credentials.application_default_credentials in the job spec.

You may also read the access control documentation for more general information about how authentication and authorization work for Stackdriver.
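
Before passing a key to credentials.application_default_credentials or --var-file, it can help to confirm that the file really is a service account key. The following is a minimal sketch, not part of this release: it assumes the standard JSON key layout with type, project_id, and client_email fields, and inspectKey, keyFile, and the sample values are hypothetical names made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// keyFile holds only the fields of a service account JSON key that this check reads.
type keyFile struct {
	Type        string `json:"type"`
	ProjectID   string `json:"project_id"`
	ClientEmail string `json:"client_email"`
}

// inspectKey is a hypothetical pre-flight check: it parses the key and
// rejects anything that is not a service account key.
func inspectKey(raw []byte) (keyFile, error) {
	var k keyFile
	if err := json.Unmarshal(raw, &k); err != nil {
		return keyFile{}, fmt.Errorf("key is not valid JSON: %w", err)
	}
	if k.Type != "service_account" {
		return keyFile{}, fmt.Errorf("key type is %q, want %q", k.Type, "service_account")
	}
	if k.ProjectID == "" || k.ClientEmail == "" {
		return keyFile{}, errors.New("key is missing project_id or client_email")
	}
	return k, nil
}

func main() {
	// Made-up sample; a real key also carries private key material.
	sample := []byte(`{
		"type": "service_account",
		"project_id": "example-project",
		"client_email": "nozzle@example-project.iam.gserviceaccount.com"
	}`)

	k, err := inspectKey(sample)
	if err != nil {
		fmt.Println("key rejected:", err)
		return
	}
	fmt.Printf("service account %s in project %s\n", k.ClientEmail, k.ProjectID)
}
```

The check does not verify that the account holds the three roles listed above; that still has to be confirmed in the project's IAM settings.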

General usage

To use any of the jobs in this BOSH release, first upload it to your BOSH director:

bosh2 upload-release https://storage.googleapis.com/bosh-gcp/beta/stackdriver-tools/latest.tgz

The stackdriver-tools.yml sample BOSH 2.0 manifest illustrates how to use all 3 jobs in this release (nozzle, host logging, and host monitoring). You can deploy the sample with the following commands:

bosh2 upload-stemcell https://bosh.io/d/stemcells/bosh-google-kvm-ubuntu-trusty-go_agent

bosh2 update-cloud-config -n manifests/cloud-config-gcp.yml \
          -v zone=... \
          -v network=... \
          -v subnetwork=... \
          -v "tags=['stackdriver-nozzle']" \
          -v internal_cidr=... \
          -v internal_gw=... \
          -v "reserved=[10....-10....]"

bosh2 deploy manifests/stackdriver-tools.yml \
            -d stackdriver-nozzle \
            --var=firehose_endpoint=https://.. \
            --var=firehose_username=stackdriver_nozzle \
            --var=firehose_password=... \
            --var=skip_ssl=false \
            --var=gcp_project_id=... \
            --var-file=gcp_service_account_json=path/to/service_account.json

This will create a self-contained deployment that sends Cloud Foundry firehose data, host logs, and host metrics to Stackdriver.

Deploying each job individually is described in detail below.

Deploying the nozzle

Create a new deployment manifest for the nozzle. See the example manifest for a full deployment and the jobs.stackdriver-nozzle section for the nozzle.

To reduce message loss, operators should run a minimum of two instances. With two instances, updating stemcells and other destructive BOSH operations will still leave an instance draining logs.

Loggregator round-robins messages across the nozzle instances, so each instance handles only a share of the traffic. If the nozzle can't handle the load, consider scaling to more than two nozzle instances.
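
To make the scaling argument concrete, the sketch below models round-robin delivery in isolation. It is an illustration only, not Loggregator's or the nozzle's code; distribute is a hypothetical function and the message counts are arbitrary.

```go
package main

import "fmt"

// distribute models round-robin delivery: message i goes to instance
// i % instances. It returns how many messages each instance received.
func distribute(messages, instances int) []int {
	counts := make([]int, instances)
	for i := 0; i < messages; i++ {
		counts[i%instances]++
	}
	return counts
}

func main() {
	// Per-instance load shrinks roughly as 1/n as instances are added.
	for _, n := range []int{1, 2, 4} {
		fmt.Println(n, "instance(s):", distribute(1000, n))
	}
}
```

Under this model, adding instances lowers the per-instance message rate proportionally, which is why scaling out is the suggested response to an overloaded nozzle.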

The spec describes all the properties an operator should modify.

Stackdriver Error Reporting

Stackdriver can automatically detect and report errors from stack traces in logs. However, this does not automatically work with Loggregator because it sends each line from app output as a separate log message to the nozzle. To enable this feature of Stackdriver, apps will need to manually encode stacktraces on a single line so that the stackdriver-nozzle can send them as single messages to Stackdriver.

This is accomplished by replacing newlines in stacktraces with a unique character, which is set using the firehose.newline_token template variable in the nozzle so that the nozzle can reconstruct the stacktrace on multiple lines.

For example, if firehose.newline_token is set to ∴, a Go app would need to implement something like the following:

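package main

import (
    "fmt"
    "os"
    "runtime"
    "strings"
)

// newlineToken must match the nozzle's firehose.newline_token property.
const newlineToken = "∴"

func main() {
    // Deferred first so that it runs when a panic unwinds main.
    defer handlePanic()
    // ... application code ...
}

func handlePanic() {
    e := recover()
    if e == nil {
        return
    }

    // Capture the stack traces of all goroutines (up to 64 KiB).
    stack := make([]byte, 1<<16)
    stackSize := runtime.Stack(stack, true)
    out := string(stack[:stackSize])

    // Log the panic on its own line, then the whole stack trace as one line.
    fmt.Fprintf(os.Stderr, "panic: %v\n", e)
    fmt.Fprintln(os.Stderr, strings.Replace(out, "\n", newlineToken, -1))
    os.Exit(1)
}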

This outputs the stacktrace separately from the panic so that the panic remains in the logs and the stacktrace is logged by itself. This allows Stackdriver to detect the stacktrace as an error.

For an example in Java, see this section of the Loggregator documentation.

Deploying host logging

The google-fluentd template uses Fluentd to send both syslog and template logs (assuming that template jobs are writing logs into /var/vcap/sys/log/*/*.log) to Stackdriver Logging.
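
The path pattern above determines which files are picked up. The sketch below checks a few made-up paths against it, assuming shell-style glob semantics in which * does not cross a directory separator; it uses Go's path.Match rather than Fluentd itself, and collected is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// logPattern is the location the template expects job logs to be written to.
const logPattern = "/var/vcap/sys/log/*/*.log"

// collected reports whether a file path matches logPattern.
func collected(file string) bool {
	ok, err := path.Match(logPattern, file)
	return err == nil && ok
}

func main() {
	// Made-up example paths for a job named "nats".
	for _, p := range []string{
		"/var/vcap/sys/log/nats/nats.log",
		"/var/vcap/sys/log/nats/nats.stdout.log",
		"/var/vcap/sys/log/nats.log",
		"/var/vcap/sys/log/nats/archive/old.log",
		"/var/vcap/sys/log/nats/nats.log.1",
	} {
		fmt.Printf("%-42s %v\n", p, collected(p))
	}
}
```

Under these assumptions, only files that sit exactly one directory below /var/vcap/sys/log and end in .log match; jobs that log elsewhere or use a different extension would not be forwarded.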

To forward host logs from BOSH VMs to Stackdriver, co-locate the google-fluentd template with an existing job whose host logs should be forwarded.

Include the stackdriver-tools release in your existing deployment manifest:

releases:
  ...
  - name: stackdriver-tools
    version: latest
  ...

Add the google-fluentd template to your job:

jobs:
  ...
  - name: nats
    templates:
      - name: nats
        release: cf
      - name: metron_agent
        release: cf
      - name: google-fluentd
        release: stackdriver-tools
  ...

Deploying host monitoring

The stackdriver-agent template uses the Stackdriver Monitoring Agent to collect VM metrics to send to Stackdriver Monitoring.

To forward host metrics from BOSH VMs to Stackdriver, co-locate the stackdriver-agent template with an existing job whose host metrics should be forwarded.

Include the stackdriver-tools release in your existing deployment manifest:

releases:
  ...
  - name: stackdriver-tools
    version: latest
  ...

Add the stackdriver-agent template to your job:

jobs:
  ...
  - name: nats
    templates:
      - name: nats
        release: cf
      - name: metron_agent
        release: cf
      - name: stackdriver-agent
        release: stackdriver-tools
  ...

Deploying as a BOSH addon

Specify the jobs as addons in your runtime config to deploy Stackdriver Monitoring and Logging agents on all instances in your deployment. Do not specify the jobs as part of your deployment manifest if you are using the runtime config.

# runtime.yml
---
releases:
  - name: stackdriver-tools
    version: latest

addons:
- name: stackdriver-tools
  jobs:
  - name: google-fluentd
    release: stackdriver-tools
  - name: stackdriver-agent
    release: stackdriver-tools

To update the runtime config:

bosh2 update-runtime-config -d <your deployment> runtime.yml

Then redeploy your manifest:

bosh2 deploy -d <your deployment> path/to/manifest.yml

Development

Updating google-fluentd

google-fluentd is versioned by the Gemfile in src/google-fluentd. To update fluentd:

1. Update the version specifier in the Gemfile (if necessary)
2. Update Gemfile.lock: bundle update
3. Create a vendor cache from the Gemfile.lock: bundle package
4. Tar and compress the vendor folder: tar zvc vendor > google-fluentd-vendor-<VERSION>-plugin-<VERSION>.tgz
5. Update the vendor version in the google-fluentd package packaging and spec
6. Add the vendored cache to the BOSH blobstore: bosh2 add-blob google-fluentd-vendor-<VERSION>-plugin-<VERSION>.tgz google-fluentd-vendor/google-fluentd-vendor-VERSION-NUMBER.tgz
7. Create a dev release and deploy it to verify that all of the above worked
8. Update the BOSH blobstore: bosh2 upload-blobs
9. Commit your changes

bosh-lite

Both the nozzle and the fluentd jobs can run on bosh-lite. To generate a working manifest, start from the bosh-lite-example-manifest. Note the application_default_credentials property, which should be filled in with the contents of a Google service account key.

Contributing

For details on how to contribute to this project - including filing bug reports and contributing code changes - please see CONTRIBUTING.md.
