@sopel - I think this would be a great (and useful) way to get your feet wet with bot development.
However, I wonder if a more generic "S3 bucket watcher" bot might not be a better goal - I'd love to get our AWS costing data into logsearch, and this also gets written to an S3 bucket (daily) in CSV format.
Let's get this onto your backlog, and have a more detailed design discussion closer to the time.
Extending this to the billing reports is a good idea - I think the processing part of the resulting pipeline basically boils down to the same thing, but the ingress mechanism differs:
The former is obviously the more desirable approach architecture-wise (polling is way too prevalent still these days, even within AWS), but I agree that a generic S3 bucket watcher bot has merit for a variety of use cases too (aside from being required for the billing access anyway).
Cloud resources are dynamically provisioned by dozens of service teams within the organization and any static snapshot of resource allocation has limited value. The ability to trend usage patterns on a global scale, yet decompose them down to a region, availability zone, or service team provides incredible flexibility. Ice allows us to quantify our AWS footprint and to make educated decisions regarding reservation purchases and reallocation of resources.
- Being a Grails application, we should be able to run it on Cloud Foundry comparatively easily; however, this would not address the topics at hand and would be more elaborate than required right now.
Ice looks cool, but I think we should remain focused on getting familiar with the boundaries of our bot & Elasticsearch architecture. If you implement an S3 bucket watcher bot and we discover that it doesn't work for AWS billing data, then I think we should re-investigate Ice.
In this "failure" scenario, we still win in that:
:information_source: Loggly has finally enabled the CloudTrail ingestion feature that was accidentally announced before GA - the resulting insight into AWS usage is fairly impressive (though that's primarily down to the audit information now being available, of course), a bit freaky actually - I'll probably summarize/demo this in a post.
Accordingly, the CloudTrail logs are perfect LogSearch demo/test material for anyone using AWS once a respective shipper becomes available, not to mention the operational insights possible this way.
@dpb587, @mrdavidlaing - given that the Loggly ingestion of all past records starting from 2013-11 will take at least a week at the observed pace, I've spiked an ingestion into our meta cluster as follows:
The results are fairly promising, with two major topics/questions remaining:
1) For analysis purposes, it is important to be able to drill down deeply into the logs via filters, with the most obvious candidates being `must_not` regex queries like `(Describe|Get|List).+` to exclude the noisy read-only stuff from the modifying/destructive operations (see the query sketch below).
2) This issue is about running the ingestion on a permanent basis for debugging purposes like the potential security incident at hand - the cloudtrailimporter supports, in principle, connecting to the typical SNS/SQS setup to consume CloudTrail logs as they are delivered (see "Configuring CloudTrail to Send Notifications" in the AWS CloudTrail documentation).
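A permanent run consuming those notifications should then boil down to something like the following; the queue name is a hypothetical placeholder, and the flags are taken from the tool's help output quoted below:
$ python ./runImport.py --es-server localhost:9200 --import-sqs cloudtrail-notifications --sqs-region eu-west-1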
Their import tool disables fancy searches like wildcards or partial segments by mapping the fields as `index: not_analyzed` (see https://bitbucket.org/atlassianlabs/cloudtrailimporter/src/5e6f356684761e2791579a0945404a9afcfe130d/cloudtrailImporter.py?at=master#cl-49). Prefix queries still work, and are what makes the most sense here anyway (in technical/efficiency terms); unfortunately, I don't think Kibana supports them yet.
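To illustrate both points, a minimal query sketch in Python - the host, index pattern (`cloudtrail-*`), and field name (`eventName`) are assumptions rather than the actual spike setup:

```python
# Minimal sketch, assuming an Elasticsearch node on localhost:9200, a
# "cloudtrail-*" index pattern, and the CloudTrail "eventName" field.
import json

import requests

ES = "http://localhost:9200/cloudtrail-*/_search"
HEADERS = {"Content-Type": "application/json"}

# 1) Exclude the noisy read-only calls via a bool must_not regexp clause.
exclude_reads = {
    "query": {
        "bool": {
            "must_not": {"regexp": {"eventName": "(Describe|Get|List).+"}}
        }
    }
}

# 2) Prefix queries remain cheap and exact against not_analyzed fields.
only_deletes = {"query": {"prefix": {"eventName": "Delete"}}}

for query in (exclude_reads, only_deletes):
    print(requests.post(ES, data=json.dumps(query), headers=HEADERS).text)
```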
@sopel - just document how to do a one-off import in this issue; @fandrei is currently working on importing from StatusPage.io and can pick this up when he finishes with that.
The tool provides the usual command-line help:
$ python ./runImport.py -h
usage: runImport.py [-h] [--dry-run] [--import-file SYNCFILENAME]
[--import-folder SYNCFOLDER] [--import-s3-file S3FILE]
[--import-s3-folder S3FOLDER] [--s3-bucket S3BUCKET]
[--es-server ESSERVER [ESSERVER ...]]
[--import-sqs SQSQUEUENAME] [--sqs-region SQSREGION]
[--sqs-number-of-messages NUMBEROFMESSAGES]
optional arguments:
-h, --help show this help message and exit
--dry-run Pretend to perform actions but don't do them
--import-file SYNCFILENAME
Import json.gz file
--import-folder SYNCFOLDER
Import all json.gz files from folder (recursive)
--import-s3-file S3FILE
Perform import from s3 file
--import-s3-folder S3FOLDER
Perform import from s3 folder
--s3-bucket S3BUCKET Bucket containing the file/folder to import from
--es-server ESSERVER [ESSERVER ...]
List of es servers inc port (eg. localhost:9200)
--import-sqs SQSQUEUENAME
Initiate SQS import from queue name
--sqs-region SQSREGION
Region queue is located (Default: us-east-1)
--sqs-number-of-messages NUMBEROFMESSAGES
Number of messages to consume before exiting.
(Default: all)
From there one can nicely do prefix-based imports, e.g.:
$ python ./runImport.py --s3-bucket logs-labs-cityindex-com-eu-west-1 --import-s3-folder AWS/CloudTrail/AWSLogs/860900021006/CloudTrail/eu-west-1/2014/09
Accordingly, one could bulk ingest all former logs of this account via:
$ python ./runImport.py --s3-bucket logs-labs-cityindex-com-eu-west-1 --import-s3-folder AWS/CloudTrail/AWSLogs/860900021006/CloudTrail
That's pretty much it regarding the bulk import, except for the following kludge: the --es-server value had to be adjusted directly in the source (runImport.py?at=master#cl-48), because the parameter just didn't pick up the port as specified.

Closed as Won't Fix due to the project being retired to the CityIndex Attic.
AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you:
Accordingly, a CloudTrail shipper would be a great example/test case for #165, and unlike #226 it might actually be useful for (advanced) cluster monitoring purposes. Still, there are a few arguments to the contrary concerning its immediate business value:
On the flip side, the required bot would have a fairly straightforward design:
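As a rough illustration only, a minimal polling sketch using boto3; the bucket name, polling interval, and ship() stub are placeholder assumptions, not the original design:

```python
# Minimal sketch of an "S3 bucket watcher" bot: poll a bucket for keys newer
# than the last one seen and hand each object to a shipper.
import time

import boto3

BUCKET = "logs-example-bucket"  # placeholder
s3 = boto3.client("s3")


def ship(key, body):
    """Placeholder: forward one log object into the ingestion pipeline."""
    print(key, len(body))


last_seen = ""  # highest key processed so far (S3 lists keys lexicographically)
while True:
    resp = s3.list_objects_v2(Bucket=BUCKET, StartAfter=last_seen)
    for obj in resp.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        ship(obj["Key"], body)
        last_seen = obj["Key"]  # the StartAfter cursor also handles paging
    time.sleep(60)  # the SNS/SQS push variant would avoid this polling loop
```

The SNS/SQS route discussed above would replace the sleep/poll loop with queue consumption, which is the preferable push-style ingress.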