Optum / dce

Disposable Cloud Environment
Apache License 2.0

CloudWatch - logs from lambda/update_lease_status reporting incorrect cost #340

Open rafabnunes opened 4 years ago

rafabnunes commented 4 years ago

Version information

Describe the bug

When I receive the budget notification, the cost it reports differs from AWS Billing. According to the CloudWatch log below, the user has spent $9.03, but AWS Billing shows a higher value. My question is: can DCE get all of the cost information from AWS Billing? I've deployed services such as EC2 and VPC Endpoints in the leased account.

Another scenario: when a leased account reaches its budget limit, DCE should reset the account. According to the CloudWatch log below, the account reached the limit, but no account reset was performed.

2020/04/15 15:39:27 Principal dcepocuser has spent $14.44 of their current principal budget

I also checked the CodeBuild logs, but they contain no information about a reset of any account on the date mentioned above.

How can I fix this issue?

eschwartz commented 4 years ago

Hi @rafabnunes -- we're aware of some issues in the way cost and usage are reported in the system today. We're in the middle of a big overhaul of cost reporting (see #316), which will hopefully address the issues we're seeing.

I apologize that we haven't better documented some of the known buggy behavior here. I'm still trying to get a handle on all of the moving parts myself. But I intend to document the current situation in more detail for posterity.

Note that we're probably looking at some significant breaking changes with PR #316 (e.g., it will destroy and recreate the usage DB table), so if this is a feature you're relying on, it may be worthwhile to wait for that release to land before investing too heavily in your current setup.

rafabnunes commented 4 years ago

Hey @eschwartz, thanks for all the information.

It looks like PR #316 will solve this issue.

My current setup is a proof of concept, so I might wait a little while for the new version.

eschwartz commented 4 years ago

Sure @rafabnunes, I think that's a wise path.

I will update this ticket when the PR is merged and released. Hopefully it won't be too long now.

rafabnunes commented 4 years ago

Hey @eschwartz

Just to confirm what you mentioned before: will the issue below be fixed in the new DCE version?

"When a leased account reaches its budget limit, DCE should reset the account. According to the CloudWatch logs below, the account reached the limit, but no account reset was performed.

2020/04/15 15:39:27 Principal dcepocuser has spent $14.44 of their current principal budget

2020/04/15 15:39:27 OverBudget. Updating lease as ready to be reclaimed...

I also checked the CodeBuild logs, but they contain no information about a reset of any account on the date mentioned above."
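
For anyone comparing these numbers against AWS Billing, the reported spend can be pulled out of the log lines mechanically. The following is a minimal Python sketch, assuming the message format matches the two lines quoted in this thread; the names `SPEND_RE`, `SpendRecord`, `parse_spend`, and `over_budget_lines` are hypothetical helpers and not part of DCE.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern inferred from the log line quoted in this thread (an assumption);
# other DCE versions may word the message differently.
SPEND_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"Principal (?P<principal>\S+) has spent \$(?P<amount>\d+(?:\.\d+)?)"
)


class SpendRecord(NamedTuple):
    timestamp: str
    principal: str
    amount: float


def parse_spend(line: str) -> Optional[SpendRecord]:
    """Return the spend reported on a log line, or None if it is not a spend line."""
    # Drop stray leading dash/quote characters left over from copy-paste.
    cleaned = line.strip().lstrip('-"')
    match = SPEND_RE.match(cleaned)
    if match is None:
        return None
    return SpendRecord(match["ts"], match["principal"], float(match["amount"]))


def over_budget_lines(lines: List[str]) -> List[str]:
    """Return the log lines that announce an OverBudget transition."""
    return [ln for ln in lines if "OverBudget" in ln]
```

Feeding exported CloudWatch log lines through `parse_spend` gives one record per spend message, which can then be compared by hand against the figure shown in AWS Billing for the same period.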

rafabnunes commented 4 years ago

Hi @eschwartz

I ran a new test by leasing a new account. I deployed a few resources such as EC2 and VPC Endpoints. After a few hours the budget was exceeded and the account was changed to "OverBudget", but CodeBuild did not perform the reset process. It seems that the Lambda "Update lease status" does not push the "OverBudget" event to the SQS "Reset queue".

To force the cleanup of the account, I manually changed the account status to "NotReady" in DynamoDB. After that, the Lambda "Process reset queue" was able to trigger the CodeBuild "Reset AWS codebuild", so the account was cleaned and returned to the pool with status "Ready".
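
The manual workaround described above could be scripted roughly as follows. This is a sketch only: the attribute names `Id` and `AccountStatus`, and whatever table name is passed in, are assumptions about the DCE accounts table and should be checked against the deployed table before use. `build_status_update` is a hypothetical helper that only builds the request; nothing is sent to AWS unless the commented-out boto3 call is run.

```python
from typing import Any, Dict


def build_status_update(
    table_name: str, account_id: str, new_status: str = "NotReady"
) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB UpdateItem call.

    Attribute names are assumptions, not confirmed by this thread.
    """
    return {
        "TableName": table_name,
        "Key": {"Id": {"S": account_id}},  # assumed partition key
        "UpdateExpression": "SET #st = :s",
        # Refuse to create a new item if the account ID is mistyped.
        "ConditionExpression": "attribute_exists(#id)",
        "ExpressionAttributeNames": {"#st": "AccountStatus", "#id": "Id"},
        "ExpressionAttributeValues": {":s": {"S": new_status}},
    }


# Usage (requires AWS credentials and the real table name):
# import boto3
# params = build_status_update("<accounts-table-name>", "123456789012")
# boto3.client("dynamodb").update_item(**params)
```

Keeping the request construction separate from the call makes it easy to print and review the parameters before touching the table.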

I haven't found any errors in the CloudWatch logs. Do you have any idea how to fix this issue?

[Screenshots: "Screen Shot 2020-04-24 at 10 18 18", "Screen Shot 2020-04-24 at 12 39 52"]

rafabnunes commented 4 years ago

Hey @eschwartz, I noticed that version 0.30.1 was released. I've already deployed it, but I'm still facing the issue: when an account becomes "OverBudget", the process to reset or clean up the account is not performed. Do you have any ideas or tips for solving this issue?

[Screenshot: "Screen Shot 2020-04-28 at 14 18 47"]

eschwartz commented 4 years ago

@rafabnunes I want to let you know that I'm moving off the DCE team at Optum this week, so I want to pass this PR off to @robologic to shepherd through.

@robologic can you take the ball on this one, please? I'm hoping this will all be resolved by #302, but it's worth a follow-up.

rafabnunes commented 4 years ago

@eschwartz thanks for everything and good luck!

@robologic it seems that the Lambda "Update Leases", after updating the DynamoDB table, is unable to send information about accounts with status "OverBudget" to SQS.

rafabnunes commented 4 years ago

Hi @robologic,

I've uninstalled DCE version 0.30.1 and installed the version from Feature/usage2.0 #302. After some tests with leased accounts, the Lambda "end_over_budget_lease" ran and triggered SQS, so the account was cleaned and returned to the pool with status "Ready".

Thanks a lot.

Rafael Nunes