catalyst-cooperative / pudl-usage-metrics

A dagster ETL for collecting and cleaning PUDL usage metrics.
MIT License

Ship datasette metrics from fly.io using `fly-log-shipper` #148

Open e-belfer opened 1 month ago

e-belfer commented 1 month ago

### Overview

fly.io currently doesn't retain logs for very long, so we need to use `fly-log-shipper` to send logs to S3.

We should spend at most 10 hours on this.

### Success Criteria

How will we know that we're done?

We don't need to mirror the logs into GCS or ETL them into a structured format, since we're likely to deprecate the Datasette soon anyway. This lets us do some baseline analysis without too much investment.
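
Once logs are landing in S3, that baseline analysis could be as simple as the sketch below. The bucket name, key prefix, and the `message` field are placeholders, and the exact record shape (and whether objects are gzipped) should be checked against what the shipper actually writes.

```python
# Rough sketch: count Datasette request log lines shipped to S3.
# Assumes newline-delimited JSON objects; bucket, prefix, and the
# "message" field are placeholders to verify against real output.
import gzip
import json
from collections import Counter

import boto3

BUCKET = "pudl-datasette-fly-logs"  # placeholder bucket name
PREFIX = "datasette/"  # placeholder key prefix

s3 = boto3.client("s3")
counts = Counter()

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        raw = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        if obj["Key"].endswith(".gz"):  # the sink may compress objects
            raw = gzip.decompress(raw)
        for line in raw.decode().splitlines():
            if line.strip():
                record = json.loads(line)
                counts[record.get("message", "")[:120]] += 1

for message, n in counts.most_common(20):
    print(n, message)
```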

### Next steps
- [ ] create new AWS project to plug fly logs into - our existing one is Open Data only
- [ ] create an IAM user + access key pair so that we can plug them into the log shipper configuration (rough sketch below; access keys attach to users rather than roles)
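
For that second item, provisioning the key with boto3 could look roughly like this: a dedicated user with a write-only policy on the log bucket, then an access key. The bucket and user names are placeholders, not settled decisions.

```python
# Rough sketch: create a write-only IAM user for the log shipper and mint
# an access key to set as secrets on the log shipper app.
# Bucket and user names are placeholders.
import json

import boto3

BUCKET = "pudl-datasette-fly-logs"  # placeholder
USER = "fly-log-shipper"  # placeholder

iam = boto3.client("iam")
iam.create_user(UserName=USER)

# Only allow writing objects into the log bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/*"],
        }
    ],
}
iam.put_user_policy(
    UserName=USER,
    PolicyName="fly-log-shipper-write",
    PolicyDocument=json.dumps(policy),
)

# The returned key pair is what gets plugged into the shipper's configuration.
key = iam.create_access_key(UserName=USER)["AccessKey"]
print(key["AccessKeyId"])  # keep key["SecretAccessKey"] in a secret store
```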
jdangerx commented 3 days ago

Now that Cloud Run lets you mount buckets as volumes, it's tempting to migrate back to Cloud Run. That would look like:

And to get logs:
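
Cloud Run request logs land in Cloud Logging, so pulling them could look roughly like this with the Python client (the `datasette` service name is a placeholder):

```python
# Rough sketch: read recent Cloud Run request logs from Cloud Logging.
# The service name "datasette" is a placeholder.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
log_filter = (
    'resource.type="cloud_run_revision" '
    'AND resource.labels.service_name="datasette"'
)

for entry in client.list_entries(filter_=log_filter, page_size=100):
    # http_request is populated on Cloud Run request logs.
    request = entry.http_request or {}
    print(entry.timestamp, request.get("requestUrl"), request.get("status"))
```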

And if we were to stick with fly.io, the plan looks like:

That looks faster - but I'm also suspicious that `fly-log-shipper` doesn't actually do what it purports to, or isn't as easy as it claims to be, since it doesn't look very actively maintained.

Also, the final outcome of moving back into GCP is simpler and easier to take down if we decide to move to Superset.

So, I'm going to spend 30 minutes tomorrow trying to get fly-log-shipper working. If I run into problems I'll move back into GCP.