sjoshi-jpl closed 9 months ago
- EN - Increase EBS volume to 60 GB
- ATM - No change
- IMG - Increase EBS volume to 60 GB, increase task size to 2 vCPU, 12 GB RAM
- RMS - Increase task size to 1 vCPU, 8 GB RAM
- GEO - Increase task size to 2 vCPU, 12 GB RAM
- NAIF - No change
- PPI - No change
- PSA - No change
- SBNPSI - No change for now (it threw an alert, but increasing the evaluation period should help here)
- SBNUMD - No change
Opened DSIO #4280 for increasing EBS volume size (IMG and EN OpenSearch nodes)
@tloubrieu-jpl @jordanpadams After weighing all available options, it looks like our best bet here is to increase the volume size from 100 to 120 per node. Approval received from Jordan; I will work with the SA team.
That sounds good, thanks @sjoshi-jpl
DSIO-4306 created with the SA team. Once it is completed, I'll need to revise each registry-sweeper task definition to write to its own log group.
All tasks completed. We have individual log groups for each node.
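For reference, a minimal sketch of what a per-node log configuration in a task definition could look like. It assumes the tasks use the ECS `awslogs` log driver; the log group naming scheme (`/ecs/registry-sweeper/<node>`), the region, and the stream prefix are hypothetical placeholders, not the values actually deployed.

```python
# Sketch only: builds the "logConfiguration" fragment of an ECS container
# definition so that each node's registry-sweeper task logs to its own group.
# The group name pattern, region, and stream prefix are assumptions.

NODES = ["EN", "ATM", "IMG", "RMS", "GEO", "NAIF", "PPI", "PSA", "SBNPSI", "SBNUMD"]


def log_configuration(node: str, region: str = "us-west-2") -> dict:
    """Return an awslogs logConfiguration targeting a per-node log group."""
    return {
        "logDriver": "awslogs",
        "options": {
            # Hypothetical naming scheme: one log group per node.
            "awslogs-group": f"/ecs/registry-sweeper/{node.lower()}",
            "awslogs-region": region,  # assumed region
            "awslogs-stream-prefix": "ecs",  # assumed prefix
        },
    }


# One configuration per node, keyed by node name.
LOG_CONFIGS = {node: log_configuration(node) for node in NODES}
```

Each entry in `LOG_CONFIGS` would be placed under the container definition's `logConfiguration` key when the task definition is revised.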
While testing the new registry-sweepers, I am noticing that for all the nodes (domains), when the provenance script reaches the point where it writes files to the db, it takes up a significant amount of FreeStorageSpace for that node's cluster. Example: the IMG provenance task (1 vCPU, 8 GB RAM, which doesn't seem to be enough, as it runs for longer than an hour) brought FreeStorageSpace down from 43 GB to 2 GB. The storage space returns to normal once the task completes.
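To illustrate why a temporary dip like this interacts with the alarm evaluation period, here is a simplified local model, not the actual alarm configuration. It assumes the alarm fires only when the metric stays below a threshold for a given number of consecutive datapoints. The sample series and the 5 GB threshold are made up; only the 43 GB and 2 GB endpoints come from the observation above.

```python
# Simplified model of a "N consecutive breaching datapoints" alarm rule.
# This is an illustration, not a call into CloudWatch.

def would_alarm(free_gb, threshold_gb, evaluation_periods):
    """Return True if free space stays below threshold_gb for at least
    evaluation_periods consecutive datapoints."""
    run = 0  # length of the current streak of breaching datapoints
    for value in free_gb:
        run = run + 1 if value < threshold_gb else 0
        if run >= evaluation_periods:
            return True
    return False


# Hypothetical FreeStorageSpace samples (GB) during one provenance run:
# a drop from 43 GB to 2 GB, then recovery once the task completes.
samples = [43, 30, 12, 2, 2, 20, 43]

short_window = would_alarm(samples, threshold_gb=5, evaluation_periods=1)
long_window = would_alarm(samples, threshold_gb=5, evaluation_periods=3)
```

With these made-up samples, a one-period window flags the dip while a three-period window does not, which is the effect a longer evaluation period is meant to have on transient drops.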
Per discussion with @jordanpadams @tloubrieu-jpl @alexdunnjpl, this is expected behavior for heavy writes. Following are my remediation suggestions.