awslabs / lambda-streams-to-firehose

AWS Lambda function to forward Stream data to Kinesis Firehose
Apache License 2.0

Problem with a Cloudwatch Logs Destination to Kinesis stream? #35

Open jpb-Cloudy-McCloudFace opened 7 years ago

jpb-Cloudy-McCloudFace commented 7 years ago

I am using a CloudWatch Logs (CWL) Destination that sends to a Kinesis stream, and then using your Lambda to forward the data to Firehose and on to S3, with no Firehose compression or encryption. The files that show up in S3 look like some strange Unicode format. Is there an issue with using a Destination as a CWL subscription target for a Kinesis stream here?

Example file contents in S3:

"\u001f�\b\u0000\u0000\u0000\u0000\u0000\u0000\u00005��\n�@\u0014Ee�uD���\u000b1\u0017YB\n-\"bҗ>�\u0019�7\u0016\u0011�{c��p/��7��H\u0014��\u001a�\u001e��}z���.H�u\u0018�\u0011WO\t�O*��Oa�2R\u0005٠RE�U��l��h\u0010���^)��\u0018Tr��\u0001M�;����\u0001�����\u000fu�VÈڎM\u001dw>sW�r�p��_�\u00178F���~z\u001e�K��(\u000bV��L�ԍ�v\t����\u0016%\u0010\u0012��ژw��\u0003_��\u0003�\u0000\u0000\u0000"

IanMeyers commented 7 years ago

If memory serves, the data that CloudWatch Logs delivers to a subscription is gzipped. You will need to add a transformer function in transformer.js which decompresses the data and outputs it as UTF-8 text.
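
A minimal sketch of such a decompressing transformer, using Node's built-in zlib module. The function name `gunzipTransformer` and its `(base64Data, callback)` shape are assumptions for illustration, not the project's confirmed transformer interface. It also assumes the function receives the record data still base64-encoded, before any string conversion.

```javascript
const zlib = require('zlib');

// Hypothetical transformer (name and signature are assumptions):
// takes the base64 string from a Kinesis record and passes the
// decompressed payload to the callback as a UTF-8 Buffer.
function gunzipTransformer(base64Data, callback) {
  let decompressed;
  try {
    // Decode base64 to raw bytes first; the gzip stream is binary,
    // so it must not be turned into a string before decompression.
    const compressed = Buffer.from(base64Data, 'base64');
    decompressed = zlib.gunzipSync(compressed);
  } catch (err) {
    return callback(err);
  }
  // Append a newline so records stay separated once written to S3.
  callback(null, Buffer.from(decompressed.toString('utf8') + '\n', 'utf8'));
}
```

The synchronous `gunzipSync` keeps the sketch short; `zlib.gunzip` with a callback would avoid blocking on large records.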

hurleyit commented 7 years ago

I'm working on this pretty heavily at the moment. Beyond figuring out how to write a custom transformer (I'm new to Node, but I've figured it out), I ran into a problem with this line:

var dataItem = serviceName === KINESIS_SERVICE_NAME ? new Buffer(userRecord.data, 'base64').toString(targetEncoding) : userRecord;

I think converting the data to the targetEncoding (here utf8) before unzipping causes issues, because the gzipped bytes are binary and do not survive being decoded as a UTF-8 string. Since most of the transformers also attempt to do the encoding conversion, might it be possible to remove this one?
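
The effect described here can be reproduced in isolation. The sketch below is not taken from the project; it gzips a sample string, pushes it through the same base64-to-UTF-8-string conversion as the quoted line, and then tries to decompress the result. The helper names are hypothetical.

```javascript
const zlib = require('zlib');

// A gzip stream standing in for a CloudWatch Logs payload.
const original = zlib.gzipSync(Buffer.from('hello from CloudWatch Logs', 'utf8'));

// What the quoted line effectively does for a Kinesis record:
// base64 -> raw bytes -> UTF-8 string.
const asString = Buffer.from(original.toString('base64'), 'base64').toString('utf8');

// Converting the string back to bytes does not restore the gzip stream,
// because invalid UTF-8 sequences were replaced with U+FFFD on decode.
const roundTripped = Buffer.from(asString, 'utf8');

// Hypothetical helper: true only if the bytes came back unchanged.
function survivesRoundTrip() {
  return original.equals(roundTripped);
}

// Hypothetical helper: returns the decompressed text, or null on failure.
function tryGunzip(buf) {
  try {
    return zlib.gunzipSync(buf).toString('utf8');
  } catch (err) {
    return null;
  }
}
```

The second byte of every gzip header is 0x8b, which is not valid on its own in UTF-8, so the stream is damaged from the header onward. Decompressing from the raw Buffer before any string conversion avoids this.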

Would you be open to having a transformer in the code base for CloudWatch logs since I'm guessing it will be a fairly popular use case?

IanMeyers commented 7 years ago

You are correct: since the input data for CWL is gzipped, this will fail. Happy to take a PR, and a transformer just for CWL makes perfect sense.
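
As a rough sketch of what a CWL-specific transformer might look like: once decompressed, a CloudWatch Logs subscription payload is a JSON document with a `messageType`, `logGroup`, `logStream` and a `logEvents` array. The function name, the `(base64Data, callback)` signature, and the convention of passing `null` to skip a record are assumptions for illustration, not the project's confirmed interface.

```javascript
const zlib = require('zlib');

// Hypothetical CWL transformer (name, signature and skip convention
// are assumptions): emits one JSON line per log event.
function cloudWatchLogsTransformer(base64Data, callback) {
  let payload;
  try {
    // Decompress from raw bytes, then parse the JSON document.
    const raw = zlib.gunzipSync(Buffer.from(base64Data, 'base64'));
    payload = JSON.parse(raw.toString('utf8'));
  } catch (err) {
    return callback(err);
  }

  // Control messages carry no log data; assume null means "skip".
  if (payload.messageType !== 'DATA_MESSAGE') {
    return callback(null, null);
  }

  // Flatten each event so every output line is self-describing.
  const lines = payload.logEvents.map((event) => JSON.stringify({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: event.timestamp,
    message: event.message
  }));

  callback(null, Buffer.from(lines.join('\n') + '\n', 'utf8'));
}
```

Writing newline-delimited JSON keeps the S3 objects easy to query line by line; emitting only `event.message` would be a reasonable alternative if the log group and stream names are not needed downstream.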

Nascentes commented 7 years ago

@djjesseb: I have recently done this exact configuration for an environment. +1 to @IanMeyers's comment regarding gzipped source files. Here is a resource that helped me enormously with understanding the data flow so that I could adapt it to suit my needs:

https://mike.lapidak.is/thoughts/exporting-cloudwatch-logs-to-s3-lambda