@sopel - I think this would be a great (and useful) way to get your feet wet with bot development.
However, I wonder if a more generic "S3 bucket watcher" bot might not be a better goal - I'd love to get our AWS costing data into logsearch, and this also gets written to an S3 bucket (daily) in CSV format.
Let's get this onto your backlog, and have a more detailed design discussion closer to the time.
Extending this to the billing reports is a good idea - I think the processing part of the resulting pipeline basically boils down to the same thing, but the ingress mechanism differs:
The former is obviously the more desirable approach architecture-wise (polling is way too prevalent still these days, even within AWS), but I agree that a generic S3 bucket watcher bot has merit for a variety of use cases too (aside from being required for the billing access anyway).
Cloud resources are dynamically provisioned by dozens of service teams within the organization and any static snapshot of resource allocation has limited value. The ability to trend usage patterns on a global scale, yet decompose them down to a region, availability zone, or service team provides incredible flexibility. Ice allows us to quantify our AWS footprint and to make educated decisions regarding reservation purchases and reallocation of resources.
- Being a Grails application, we should be able to run it on Cloud Foundry comparatively easily; however, this would not address the topics at hand and would be more elaborate than required right now.
Ice looks cool, but I think we should remain focused on getting familiar with the boundaries of our bot & Elasticsearch architecture. If you implement an S3 bucket watcher bot and we discover that it doesn't work for AWS billing data, then I think we should re-investigate Ice.
In this "failure" scenario, we still win in that:
:information_source: Loggly has finally enabled the CloudTrail ingestion feature that was accidentally announced before GA - the resulting insight into AWS usage is fairly impressive (though that's primarily down to the audit information now being available, of course), a bit freaky actually - I'll probably summarize/demo this in a post.
Accordingly, the CloudTrail logs are perfect LogSearch demo/test material for anyone using AWS once a respective shipper becomes available, not to mention the operational insights possible this way.
@dpb587, @mrdavidlaing - given that the Loggly ingestion of all past records starting from 2013-11 will take at least a week at the observed pace, I've spiked an ingestion into our meta cluster as follows:
The results are fairly promising, with two major topics/questions remaining:
1) For analysis purposes, it is important to be able to drill down deeply into the logs via filters, with the most obvious candidates being `must_not` regex queries like `(Describe|Get|List).+` to exclude the noisy read-only stuff from the modifying/destructive operations (see the query sketch below).
2) This issue is about running the ingestion on a permanent basis for debugging purposes like the potential security incident at hand - the cloudtrailimporter supports, in principle, connecting to the typical SNS/SQS setup to consume CloudTrail logs as they are delivered (see "Configuring CloudTrail to Send Notifications" in the AWS CloudTrail documentation).
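A permanent run consuming those notifications should then boil down to something like the following; the queue name is a hypothetical placeholder, and the flags are taken from the tool's help output quoted below:
$ python ./runImport.py --es-server localhost:9200 --import-sqs cloudtrail-notifications --sqs-region eu-west-1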
Their import tool disables fancy searches like wildcards or partial segments by mapping the fields as `index: not_analyzed` (see https://bitbucket.org/atlassianlabs/cloudtrailimporter/src/5e6f356684761e2791579a0945404a9afcfe130d/cloudtrailImporter.py?at=master#cl-49). Prefix queries still work, and are what makes the most sense here anyway (in technical/efficiency terms); unfortunately, I don't think Kibana supports them yet.
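To illustrate both points, a minimal query sketch in Python - the host, index pattern (`cloudtrail-*`), and field name (`eventName`) are assumptions rather than the actual spike setup:

```python
# Minimal sketch, assuming an Elasticsearch node on localhost:9200, a
# "cloudtrail-*" index pattern, and the CloudTrail "eventName" field.
import json

import requests

ES = "http://localhost:9200/cloudtrail-*/_search"
HEADERS = {"Content-Type": "application/json"}

# 1) Exclude the noisy read-only calls via a bool must_not regexp clause.
exclude_reads = {
    "query": {
        "bool": {
            "must_not": {"regexp": {"eventName": "(Describe|Get|List).+"}}
        }
    }
}

# 2) Prefix queries remain cheap and exact against not_analyzed fields.
only_deletes = {"query": {"prefix": {"eventName": "Delete"}}}

for query in (exclude_reads, only_deletes):
    print(requests.post(ES, data=json.dumps(query), headers=HEADERS).text)
```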
@sopel - just document how to do a one-off import in this issue; @fandrei is currently working on importing from StatusPage.io and can pick this up when he finishes with that.
The tool provides the usual command-line help:
$ python ./runImport.py -h
usage: runImport.py [-h] [--dry-run] [--import-file SYNCFILENAME]
[--import-folder SYNCFOLDER] [--import-s3-file S3FILE]
[--import-s3-folder S3FOLDER] [--s3-bucket S3BUCKET]
[--es-server ESSERVER [ESSERVER ...]]
[--import-sqs SQSQUEUENAME] [--sqs-region SQSREGION]
[--sqs-number-of-messages NUMBEROFMESSAGES]
optional arguments:
-h, --help show this help message and exit
--dry-run Pretend to perform actions but don't do them
--import-file SYNCFILENAME
Import json.gz file
--import-folder SYNCFOLDER
Import all json.gz files from folder (recursive)
--import-s3-file S3FILE
Perform import from s3 file
--import-s3-folder S3FOLDER
Perform import from s3 folder
--s3-bucket S3BUCKET Bucket containing the file/folder to import from
--es-server ESSERVER [ESSERVER ...]
List of es servers inc port (eg. localhost:9200)
--import-sqs SQSQUEUENAME
Initiate SQS import from queue name
--sqs-region SQSREGION
Region queue is located (Default: us-east-1)
--sqs-number-of-messages NUMBEROFMESSAGES
Number of messages to consume before exiting.
(Default: all)
From there one can nicely do prefix-based imports, e.g.:
$ python ./runImport.py --s3-bucket logs-labs-cityindex-com-eu-west-1 --import-s3-folder AWS/CloudTrail/AWSLogs/860900021006/CloudTrail/eu-west-1/2014/09
Accordingly, one could bulk ingest all former logs of this account via:
$ python ./runImport.py --s3-bucket logs-labs-cityindex-com-eu-west-1 --import-s3-folder AWS/CloudTrail/AWSLogs/860900021006/CloudTrail
That's pretty much it regarding the bulk import, except for the following kludge: the --es-server value had to be adjusted directly in the source (runImport.py?at=master#cl-48), because the parameter just didn't pick up the port as specified.

Closed as Won't Fix due to the project being retired to the CityIndex Attic.
AWS CloudTrail is a web service that records AWS API calls for your account and delivers log files to you:
Accordingly, a CloudTrail shipper would be a great example/test case for #165, and unlike #226 it might actually be useful for (advanced) cluster monitoring purposes. Still, there are a few arguments to the contrary concerning its immediate business value:
On the flip side, the required bot would have a fairly straightforward design:
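As a rough illustration only, a minimal polling sketch using boto3; the bucket name, polling interval, and ship() stub are placeholder assumptions, not the original design:

```python
# Minimal sketch of an "S3 bucket watcher" bot: poll a bucket for keys newer
# than the last one seen and hand each object to a shipper.
import time

import boto3

BUCKET = "logs-example-bucket"  # placeholder
s3 = boto3.client("s3")


def ship(key, body):
    """Placeholder: forward one log object into the ingestion pipeline."""
    print(key, len(body))


last_seen = ""  # highest key processed so far (S3 lists keys lexicographically)
while True:
    resp = s3.list_objects_v2(Bucket=BUCKET, StartAfter=last_seen)
    for obj in resp.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        ship(obj["Key"], body)
        last_seen = obj["Key"]  # the StartAfter cursor also handles paging
    time.sleep(60)  # the SNS/SQS push variant would avoid this polling loop
```

The SNS/SQS route discussed above would replace the sleep/poll loop with queue consumption, which is the preferable push-style ingress.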