Turn off New Relic or reduce costs

jeffreyameyer commented 2 years ago

What's your idea for a cool feature that would help you use OHM better. If we aren't using NR on a regular basis for diagnostics or don't forsee using it in the future, let's disconnect it & save a few $$.

I'd recommend we just comment out the code stubs & turn off the account on the NR side, if that's feasible, so we can turn it on later, if necessary. If that's a lame approach, and we should just take out the embeds, that's cool, too.

danrademacher commented 2 years ago

I'm tracking this and agree it's not worth the money base don our level of use. But now that it's going, I'm seeing if we can glean a little insight for #277 before we turn it off

danrademacher commented 2 years ago

I checked today to see if we could slim down into the free tier on New Relic and then let that chug along and scale back up if we need it, but that's a hard no. I thought users were the main cost driver, but it's actually data. They provide 1 user, we have 2. Small upcharge for that and we could slim to just 1. But they also provide 100GB of data for free. Sounds like a lot, but that's less than ONE DAY of data:

Not sure how all that logging adds up to so much (isn't it text?! 4+ TB of text?) but it seems impossible to reduce data by 50X and get this into a free tier.

I think the changes to deploy this were made in OHM-deploy: https://github.com/OpenHistoricalMap/ohm-deploy/pull/87

So we'd need to revert some of those commits and redeploy to drop new relic. I don't think we should just comment out the code, since we have it all in version history and can restore from there, but if @Rub21 prefers the comment-out approach, that's fine with me.

danrademacher commented 2 years ago

cc @batpad

Pulling over from a Slack convo:

130gb a day of logging seems absurd. The 34gb for metrics also seems absurd. Wow. This is a bit wild. Thanks for flagging - lemme use my Sunday to try and process this absurdity and then we can poke around a bit tomorrow. We probably should turn NR off for now :disappointed: - is there a good way to keep things like alerts on, but not do metrics or logging? or not persist any of the logs, or something?

I don't think this issue is storing the data, I think it is the process of taking it in at all, to then put rules against it for alerting, etc. So I don't see a way to keep alerting but not ingesting.

Perhaps there's a way to pare WAY back on what we are alerting. Right now I think we're alerting on both prod and staging, so that would cut by 2X. But then we still have to cut by 25X to get into the free tier.

I suppose another way to look at it would be to git it down to an acceptable cost. If we cut data to 10GB/day from 100GB/day, then the cost of the service would be a lot less and maybe this it's more worth keeping?

danrademacher commented 2 years ago

Though if we monitor on Prod and not on Staging, then that means we're deploying different code in each place, since the deployment required a bunch of chartpress changes. That seems like a road we might not want to go down.

batpad commented 2 years ago

Though if we monitor on Prod and not on Staging, then that means we're deploying different code in each place, since the deployment required a bunch of chartpress changes. That seems like a road we might not want to go down.

This should be manageable via some configuration / environment variables to deploy the NewRelic stuff conditionally. But as you say, right now we also need to figure out a bit why we're generating 130GB of logs and try and prune that down to a more manageable amount.

@Rub21 @geohacker do you have any ideas / can we try and examine this a bit to see if we are doing really wasteful logging anywhere? I don't really want to hold back on logging things, but also 130gb / day seems just wayyy too high. Can we inspect / perhaps it's a couple of containers that we have maybe setup to be wayy too verbose and maybe just scaling down one-two things can get this down to a more realistic amount of logs?

Rub21 commented 2 years ago

@batpad @danrademacher , I have just made some changes in the code to avoid printie to many log on tiler cache cleaner, https://github.com/OpenHistoricalMap/ohm-deploy/pull/114, and push it into staging and production, We will see if that reduce the the number of logs size.

danrademacher commented 2 years ago

For the record, I'm working with New Relic support to get their nonprofit pricing in place on our account. Takes some tech support from them since we were paying regular fare and their flow is for folks starting from scratch. I'd expect that to be sorted in the next day or two, and then we can see how the nonprofit pricing and the changes in logging intersect

danrademacher commented 2 years ago

OK, conclusion is this from NR support:

if you are OK with losing the history, I think spinning up a new Free account, repointing to it, (please don’t enter your or any credit card information, to avoid the glitchy situation you saw previously), and using the TechSoup token seems like it may be the best way to go.

@jeffreyameyer can you confirm you are OK with losing history?

Since the original idea was to shut it down, I think losing history is OK and we just start again under the new account owned by GreenInfo.

danrademacher commented 2 years ago

OK, we now know the changes DevSeed made to logging had the intended effect:

Now we just need to switch over from old NR account owned by Jeff (not a nonprofit) to new NR account owned by GIN (a nonprofit) to take advantage of nonprofit pricing.

Here's what we get in the new account:

5 standard full stack observability users
1 TB per month of data ingestion
Ticketed Support

I invited Ruben to the new org as a full admin so we can eventually switch our settings over to the new account -- I assume we need to switch out some secrets and redeploy

danrademacher commented 2 years ago

We have migrated this to an NGO-owned account (GreenInfo at the moment) that is eligible for free data and 5 core users, so we get free NR from here out, within limits but our new reduced data diet over there is working fine. Ruben and Dan are confirmed users. Sanjay and Jeff have pending invites.

OpenHistoricalMap / issues

Turn off New Relic or reduce costs #333