lucacasonato opened 4 years ago
There's also the possibility of passing the data straight from Firehose to MongoDB by configuring Firehose with an HTTP destination that points at MongoDB's HTTP API. That would save the cost of the S3 bucket and the Lambda execution.
In this case we would be storing each event in MongoDB, though, right? That seems kinda bad given the number of events we receive on a daily basis. I think we should have a single MongoDB document per module (or per module version, or per file) that tracks the download count per day.
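A minimal sketch of the one-document-per-module idea, assuming a hypothetical schema where each document's `_id` is the module name and a `downloads` object maps UTC days (`YYYY-MM-DD`) to counts. Each download then becomes an upsert that increments one counter with MongoDB's `$inc` operator on a dotted field path, instead of inserting a new document. The `DownloadEvent` type, `buildIncrement` helper and field names are illustrative assumptions, not an existing schema.

```typescript
// Hypothetical shape of a single download event (assumption).
interface DownloadEvent {
  module: string; // e.g. "oak"
  timestamp: Date; // when the download happened
}

// Hypothetical per-module document: one per module, counts keyed by UTC day.
interface ModuleDownloads {
  _id: string; // module name
  downloads: Record<string, number>; // "YYYY-MM-DD" -> count
}

// Format a Date as a UTC "YYYY-MM-DD" key.
function dayKey(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Build the filter and update for an upsert that increments that day's counter.
// The day key goes into the dotted path "downloads.<day>"; it contains no dots,
// so it is a valid field name. The module name stays in _id.
function buildIncrement(
  event: DownloadEvent,
  count = 1,
): { filter: { _id: string }; update: { $inc: Record<string, number> } } {
  return {
    filter: { _id: event.module },
    update: { $inc: { [`downloads.${dayKey(event.timestamp)}`]: count } },
  };
}
```

With the official Node.js driver this pair would be passed as `collection.updateOne(filter, update, { upsert: true })`, so the first download of a module creates its document and later ones only bump a counter.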
That's a good point, I don't think Firehose allows for aggregating data on the fly. I believe Kinesis Analytics can do that, but I've never used it and I don't know how easy it would be to integrate with MongoDB.
I agree: SQS seems to be the simplest option here. There's no point in throwing data into an S3 bucket as an intermediate location.
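One advantage of SQS is that a consumer receives messages in batches (an SQS-triggered Lambda gets them in `event.Records`, each with a string `body`), so the batch can be aggregated in memory first: one MongoDB update per module per day per batch, rather than one per download. The sketch below assumes each message body is JSON of the hypothetical form `{"module": "...", "timestamp": "..."}`; the `aggregateBatch` and `parseMessage` names are illustrative, and malformed messages are skipped rather than failing the whole batch.

```typescript
// Minimal subset of the SQS record shape delivered to a Lambda handler.
interface SqsRecord {
  body: string;
}

// Hypothetical message body format (assumption).
interface DownloadMessage {
  module: string;
  timestamp: string; // ISO 8601
}

// Parse one message body; returns null for malformed JSON or missing fields.
function parseMessage(body: string): DownloadMessage | null {
  try {
    const value = JSON.parse(body);
    if (typeof value?.module === "string" && typeof value?.timestamp === "string") {
      return { module: value.module, timestamp: value.timestamp };
    }
  } catch {
    // malformed JSON: fall through to null
  }
  return null;
}

// Aggregate a batch into counts keyed by module, then by UTC day ("YYYY-MM-DD").
function aggregateBatch(records: SqsRecord[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();
  for (const record of records) {
    const msg = parseMessage(record.body);
    if (!msg) continue;
    const date = new Date(msg.timestamp);
    if (Number.isNaN(date.getTime())) continue; // skip unparseable timestamps
    const day = date.toISOString().slice(0, 10);
    let perDay = counts.get(msg.module);
    if (!perDay) {
      perDay = new Map<string, number>();
      counts.set(msg.module, perDay);
    }
    perDay.set(day, (perDay.get(day) ?? 0) + 1);
  }
  return counts;
}
```

Each (module, day, count) entry in the result can then be written with a single `$inc` upsert.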
I feel like we should also set up a Terraform config for AWS that covers all the AWS services we use. Are you open to this, @lucacasonato? That way, if you need to scale, it's just a matter of changing the config, and the infrastructure is defined as code.
I would also look into: https://docs.aws.amazon.com/AmazonS3/latest/dev/analytics-storage-class.html
@narwy We don't use CloudFront or Lambda@Edge at the minute. (Also, I quite dislike Lambda@Edge because of its unreasonable pricing and runtime limitations; I don't want to use Node or Python.) I want to stick with the Cloudflare Worker we have now.
> I feel like we should setup a Terraform config to setup AWS also
We have a CloudFormation config. I don't see the point of moving it to Terraform. I have had enough trouble with CloudFormation (it finally works) and don't really fancy repeating that 'fun' at the minute :-). Maybe in a month or two.
Any updates on the CF analytics for this issue?
> Any updates on the CF analytics for this issue?
Nothing yet.
Goal
We want to have a graph of module download counts like crates.io has. This means that we need a list of download counts per module (or per module version, or per file) per day.
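Which granularity we can store depends on what each request tells us. As a sketch, assuming downloads show up as deno.land/x paths of the form `/x/<module>@<version>/<file>` (with the version optional), a small parser can produce keys for all three granularities at once. The `parseModulePath` helper and `DownloadKeys` type are hypothetical, and other path shapes (such as `/std`) are deliberately not handled here.

```typescript
// Keys at the three granularities discussed: module, module version, and file.
interface DownloadKeys {
  module: string;
  version: string | null; // null when the URL does not pin a version
  file: string; // path inside the module, e.g. "/mod.ts"
}

// Parse a path like "/x/oak@v6.0.0/mod.ts" (hypothetical helper).
// Returns null for anything that is not a /x/ module path.
function parseModulePath(path: string): DownloadKeys | null {
  const match = /^\/x\/([^/@]+)(?:@([^/]+))?(\/.*)?$/.exec(path);
  if (!match) return null;
  const [, name, version, file] = match;
  return { module: name, version: version ?? null, file: file ?? "/" };
}
```

A per-version key could then be `${module}@${version}` and a per-file key `${module}@${version}${file}`, so the decision about granularity only changes which key is incremented.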
How to implement
Through discussions with @wperron on Discord, we came up with two relatively simple solutions:
~I am personally more in favour of solution 1 because I feel it is relatively simple to set up (haven't used Kinesis Firehose before).~
I prefer option 3 if we have access to the Cloudflare Logpull API. You need to be an Enterprise customer to use it, though.
Decisions to make