NASA-PDS / monitoring

Monitoring configuration for PDS EN system, currently based on AWS CloudWatch
Apache License 2.0

Develop MCP OU Structure, Cost Monitoring, and Cost Management Policies #15

Open jordanpadams opened 3 months ago

jordanpadams commented 3 months ago

💡 Description

We need to define our future OU structure and the cost monitoring policies surrounding it.

⚔️ Sub-tasks

Other TBD tickets

jordanpadams commented 3 weeks ago

📆 05/2024 status: In work. On schedule

jordanpadams commented 1 week ago

Notes from discussions about managing budget with MWAA across different AWS accounts:

Our original plan was for each Node to get its own OU Project in MCP so we could clearly track its budget. However, we realized that doing so introduces a few wrinkles we will need to work out for our Core Data Services (e.g. Nucleus), which will be deployed within our account:


  1. We have not yet tested triggering an MWAA DAG from another AWS account. If the MWAA environment is public, it appears to be possible (https://docs.aws.amazon.com/mwaa/latest/userguide/samples-invoke-dag.html), but I still need to test this.
  2. If point 1 is feasible, the staging buckets, archive buckets, and EFS can live in separate accounts.
  3. ECS tasks should be executed in the same account as MWAA. However, we will not easily be able to tell which PDS Node a given task execution belongs to (unless we track it through task execution IDs). Harvest executions are the exception: each Node has its own Harvest ECS task, so we can attribute those.
  4. If all DAGs run in one MWAA environment, we will not get a per-DAG breakdown of costs. However, we could derive one by:
    • Assigning a weight for each ECS task.
    • Deriving a weight for each DAG execution as the sum of its ECS task weights.
    • Dividing the monthly cost among PDS Nodes based on each Node's DAG execution counts and the weight of each DAG execution.
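The weighting scheme in point 4 could be sketched roughly as follows. Every task name, DAG name, Node name, and weight here is a hypothetical placeholder; how the per-task weights are actually chosen (for example from vCPU/memory and typical runtime) is still an open question and is only assumed.

```python
"""Sketch: split one month's MWAA/ECS bill across PDS Nodes by weighted DAG runs.

All task names, DAG names, Node names, and weights below are hypothetical.
"""
from collections import Counter
from typing import Dict, List


def dag_weight(dag_id: str, dag_tasks: Dict[str, List[str]],
               task_weights: Dict[str, float]) -> float:
    """Weight of one DAG execution = sum of the weights of its ECS tasks."""
    return sum(task_weights[task] for task in dag_tasks[dag_id])


def allocate_monthly_cost(monthly_cost: float,
                          run_counts: Counter,
                          dag_tasks: Dict[str, List[str]],
                          task_weights: Dict[str, float]) -> Dict[str, float]:
    """Split monthly_cost among Nodes in proportion to their weighted DAG runs.

    run_counts maps (node, dag_id) -> number of executions in the month.
    """
    weighted: Dict[str, float] = Counter()
    for (node, dag_id), count in run_counts.items():
        # Each execution contributes the DAG's weight to the Node that ran it.
        weighted[node] += count * dag_weight(dag_id, dag_tasks, task_weights)
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no weighted DAG executions to allocate cost against")
    return {node: monthly_cost * w / total for node, w in weighted.items()}


# Hypothetical example data
TASK_WEIGHTS = {"validate": 1.0, "harvest": 2.0, "package": 4.0}
DAG_TASKS = {"ingest": ["validate", "harvest"], "archive": ["validate", "package"]}
RUNS = Counter({("atm", "ingest"): 2, ("geo", "ingest"): 1, ("geo", "archive"): 1})

if __name__ == "__main__":
    print(allocate_monthly_cost(1400.0, RUNS, DAG_TASKS, TASK_WEIGHTS))
```

With these made-up numbers, "ingest" weighs 3 and "archive" weighs 5, so atm accrues 2 × 3 = 6 units and geo 3 + 5 = 8 units, and a 1,400 monthly bill splits into 600 and 800. Proportional allocation guarantees the Node shares always sum to the actual bill, whatever the absolute scale of the weights.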

If we do decide to go the separate-account route, we'll need new cross-account IAM policies.
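As a starting point for testing point 1 under the separate-account route, here is a rough sketch adapted from the AWS "invoke a DAG" sample linked above. It assumes (untested, hypothetical) a role in the MWAA account that the Node account can assume and that grants `airflow:CreateCliToken`; the role ARN, session name, environment name, and DAG ID are placeholders. The MWAA client is passed in so the request logic can be exercised without AWS access.

```python
"""Sketch: trigger an MWAA DAG from a different AWS account (untested assumption)."""
import base64
import http.client
import json
from typing import Any, Callable, Dict, Tuple


def mwaa_client_for_role(role_arn: str, region: str) -> Any:
    """Build an MWAA client using a role assumed in the MWAA account."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn, RoleSessionName="pds-dag-trigger"  # hypothetical name
    )["Credentials"]
    return boto3.client(
        "mwaa",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )


def build_cli_request(token: Dict[str, str], dag_id: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Return (host, path, body, headers) for the MWAA CLI endpoint."""
    headers = {
        "Authorization": f"Bearer {token['CliToken']}",
        "Content-Type": "text/plain",
    }
    return token["WebServerHostname"], "/aws_mwaa/cli", f"dags trigger {dag_id}", headers


def https_post(host: str, path: str, body: str, headers: Dict[str, str]) -> bytes:
    """POST the CLI command to the (public) MWAA web server."""
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", path, body, headers)
        return conn.getresponse().read()
    finally:
        conn.close()


def trigger_dag(mwaa_client: Any, env_name: str, dag_id: str,
                post: Callable[..., bytes] = https_post) -> str:
    """Request a CLI token, run `dags trigger`, and return decoded stdout."""
    token = mwaa_client.create_cli_token(Name=env_name)
    raw = post(*build_cli_request(token, dag_id))
    result = json.loads(raw)
    # The CLI endpoint returns stdout/stderr base64-encoded.
    return base64.b64decode(result["stdout"]).decode()
```

In practice one would call `trigger_dag(mwaa_client_for_role(role_arn, region), env_name, dag_id)` from the Node account. The matching IAM trust policy on the MWAA-account role is exactly the kind of new cross-account policy mentioned above.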


For point 3, could we have a Lambda function in the account where MWAA will be deployed that is triggered by an event in the ECS account? I think the event could be set up using EventBridge; the function could then read the task execution ID and fetch any relevant logs to update the workflows. The logs would be in S3.
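A minimal sketch of that Lambda, assuming it receives EventBridge "ECS Task State Change" events. The task-definition naming convention used to recover the Node (`<prefix>-<node>`), the log bucket name, and the `<task_id>.log` key layout are all hypothetical. The S3 client is injected so the parsing can be checked without AWS.

```python
"""Sketch: map ECS task executions to PDS Nodes from EventBridge events.

Hypothetical conventions (not decided in this issue): task definition
families are named "<prefix>-<node>", and task logs are stored at
s3://<LOG_BUCKET>/<task_id>.log.
"""
from typing import Any, Dict, Optional

LOG_BUCKET = "pds-ecs-task-logs"  # hypothetical bucket name


def parse_task_event(event: Dict[str, Any]) -> Dict[str, str]:
    """Extract task ID, status, and task definition family from the event."""
    detail = event["detail"]
    # Task ARN ends in .../task/<cluster>/<task_id>
    task_id = detail["taskArn"].rsplit("/", 1)[-1]
    # Task definition ARN ends in .../task-definition/<family>:<revision>
    family = detail["taskDefinitionArn"].rsplit("/", 1)[-1].split(":")[0]
    return {"task_id": task_id, "status": detail["lastStatus"], "family": family}


def node_from_family(family: str) -> str:
    """Hypothetical convention: the Node is the last dash-separated token."""
    return family.rsplit("-", 1)[-1]


def handler(event: Dict[str, Any], context: Any = None,
            s3_client: Optional[Any] = None) -> Dict[str, str]:
    record = parse_task_event(event)
    record["node"] = node_from_family(record["family"])
    # Only fetch logs once the task has stopped and an S3 client is available.
    if s3_client is not None and record["status"] == "STOPPED":
        obj = s3_client.get_object(Bucket=LOG_BUCKET, Key=f"{record['task_id']}.log")
        record["log"] = obj["Body"].read().decode()
    return record
```

The EventBridge rule would filter on `source` `aws.ecs` and `detail-type` `ECS Task State Change` (optionally `lastStatus` `STOPPED`). In a deployed function the S3 client would typically be created once at module level with `boto3.client("s3")` and passed to `handler`.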