US-EPA-CAMD / easey-ui

Project Management repo for EPA Clean Air Markets Division (CAMD) Business Suite of applications
MIT License

Investigate and Mitigate DB Instance Storage Issue during Bulk File Generation & Instance Update Failure #6313

Open spetros-do opened 1 week ago

spetros-do commented 1 week ago

Background:

We have encountered a storage issue on the DB instance during the bulk file generation process. The current storage capacity is insufficient, leading to errors such as "No space left on device" and "Update failed - Service broker error: There was an error modifying the instance. Error". This ticket aims to investigate the root cause, optimize the process, and implement a solution to prevent future occurrences.

Michelle requested an increase in the database storage size:

For the following RDS instance, please increase the database storage from utilizing 1 free 1-TB to an additional 1-TB, as soon as possible. Please confirm once this additional 1-TB is available, thanks.

Org: epa-easey
Space: prod
RDS service instance name: camd-pg-db
Current Size: 750 GB

Tasks:

  1. Investigate current storage usage and identify large files.
  2. Analyze and optimize the bulk file generation process.
  3. Evaluate alternatives for generating bulk files off the DB instance.
  4. Request an increase in storage capacity if necessary.
  5. Implement and test the solution.
  6. Monitor storage usage and set up alerts.
  7. Regularly review and optimize the process.
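
For task 6, a minimal sketch of a threshold check, assuming the allocated and free storage figures are already available from whatever monitoring source the team uses. The `StorageStatus` type, the `alert_level` function, and the 20% / 5% thresholds are hypothetical and only illustrate the idea; the 750 GB and roughly 20 GB figures are the ones reported in this issue.

```python
from dataclasses import dataclass


@dataclass
class StorageStatus:
    """Allocated and free storage for a DB instance, in GB."""
    allocated_gb: float
    free_gb: float

    @property
    def free_fraction(self) -> float:
        # Share of the allocated storage that is still free.
        return self.free_gb / self.allocated_gb


def alert_level(status: StorageStatus,
                warn_fraction: float = 0.20,
                critical_fraction: float = 0.05) -> str:
    """Classify free space as 'ok', 'warning', or 'critical'.

    The default thresholds are illustrative assumptions, not agreed values.
    """
    if status.free_fraction <= critical_fraction:
        return "critical"
    if status.free_fraction <= warn_fraction:
        return "warning"
    return "ok"


# Figures from this issue: 750 GB allocated, about 20 GB free.
print(alert_level(StorageStatus(allocated_gb=750, free_gb=20)))
```

With the numbers reported here, less than 3% of the allocated storage is free, so any reasonable threshold would have raised an alert well before writes started failing.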

Questions:

  1. Who has the authority to update the database size?
  2. Can the database be configured to auto extend? (Assumed yes)
  3. Can the database server instance be configured to automatically expand the instance size?

Related Email Information

Read Sam's Email to Michelle and Jonathan

Error Message

[Screenshot: ERROR - emissions-api - could not write to file]

Additional Error Information

<6> 2024-06-25T15:46:46Z ea9e0062-7a6c-48b1-a723-bf1202edc9cc doppler[29]: {"cf_app_id":"c2a45ba1-1892-47e0-b19b-e65dce705896","cf_app_name":"emissions-api","cf_org_id":"c3a0087f-449f-4024-a826-7f79b255795b","cf_org_name":"epa-easey","cf_origin":"firehose","cf_space_id":"5aade0f8-10ad-4262-9ddd-61f366b8c59c","cf_space_name":"prod","deployment":"cf-production","event_type":"LogMessage","ip":"10.10.2.22","job":"diego-cell","job_index":"569c864c-7d32-4f47-8c4b-677b7119c890","level":"info","message_type":"OUT","msg":"{\"errorId\":\"5538deb7-611e-4e14-a02e-512701752f13\",\"level\":\"error\",\"message\":\"could not write to file \\\"base/pgsql_tmp/pgsql_tmp6905.8\\\": No space left on device\",\"stack\":\"LoggingException: could not write to file \\\"base/pgsql_tmp/pgsql_tmp6905.8\\\": No space left on device\\n at HourlyApportionedEmissionsService.getEmissions (/home/vcap/app/dist/apportioned-emissions/hourly/hourly-apportioned-emissions.service.js:37:19)\\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\"}","origin":"rep","source_instance":"1","source_type":"APP/PROC/WEB","time":"2024-06-25T15:46:46Z","timestamp":1719330406191069212}
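
The log entry above is a JSON envelope whose `msg` field is itself a JSON string, so the real error is nested one level down. A minimal sketch of unpacking it, assuming every line follows the same prefix-then-JSON shape as the one above; `parse_doppler_line` and `SAMPLE_LINE` are hypothetical names, and the sample is a shortened copy of the entry above with most envelope fields and the stack trace omitted.

```python
import json

# Shortened copy of the log entry above (fewer envelope fields, no stack trace).
SAMPLE_LINE = r'<6> 2024-06-25T15:46:46Z ea9e0062-7a6c-48b1-a723-bf1202edc9cc doppler[29]: {"cf_app_name":"emissions-api","cf_space_name":"prod","level":"info","msg":"{\"errorId\":\"5538deb7-611e-4e14-a02e-512701752f13\",\"level\":\"error\",\"message\":\"could not write to file \\\"base/pgsql_tmp/pgsql_tmp6905.8\\\": No space left on device\"}","time":"2024-06-25T15:46:46Z"}'


def parse_doppler_line(line: str) -> dict:
    """Extract the application-level error from one log line."""
    # The syslog-style prefix ends where the JSON envelope begins.
    envelope = json.loads(line[line.index("{"):])
    # The application's own log record is a JSON string inside "msg".
    inner = json.loads(envelope["msg"])
    message = inner.get("message", "")
    return {
        "app": envelope.get("cf_app_name"),
        "space": envelope.get("cf_space_name"),
        "time": envelope.get("time"),
        "level": inner.get("level"),
        "message": message,
        "disk_full": "No space left on device" in message,
    }


result = parse_doppler_line(SAMPLE_LINE)
print(result["level"], result["disk_full"], result["message"])
```

Note that the envelope reports `"level":"info"` while the nested record reports `"level":"error"`, so any alert built on these logs would need to inspect the inner record rather than the envelope.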

Dashboard

[Screenshot: ERROR - emissions-api - could not write to file]

djw4erg commented 6 days ago

From Sam's Email to Michelle and Jonathan

Digging further into the issue at hand, our immediate recommendation is as follows:

To address the current storage limitations of the DB instance, we need to take immediate action to increase its size. As of this morning, the available space had been reduced to approximately 20 GB, and at least 5 GB of that has already been used, so the actual available space is likely even less. We recommend that the EPA team submit a formal request to the cloud.gov administrators to increase the storage quota for the DB instance. This request should include a consideration of any associated cost implications.

We need to conduct a thorough evaluation of the bulk file generation process. The provided images indicate that the quarterly files are approximately 2 GB each. Therefore, we must determine whether generating these files on the DB server is the most efficient approach. I have created a ticket [#6313] to document and track this effort.
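
One alternative to evaluate for the bulk file question is streaming rows out of the database in batches and writing the compressed file on the application side, so that the file itself never has to be staged on the DB server. The sketch below is illustrative only: `write_bulk_file` and `batched` are hypothetical helpers, the row source is assumed to be any iterable (for example a server-side cursor), and the column names in the usage line are made up. It does not by itself address temporary files that the query may create under `base/pgsql_tmp`.

```python
import csv
import gzip
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows without loading everything at once."""
    batch: List[Sequence] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_bulk_file(rows: Iterable[Sequence], header: Sequence[str],
                    path: str, batch_size: int = 10_000) -> int:
    """Write rows to a gzip-compressed CSV file and return the row count."""
    written = 0
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for batch in batched(rows, batch_size):
            writer.writerows(batch)
            written += len(batch)
    return written


# Usage with a stand-in row source; a real run would iterate over a DB cursor.
sample_rows = ((i, f"unit-{i}") for i in range(5))
print(write_bulk_file(sample_rows, ["id", "unit"], "sample.csv.gz", batch_size=2))
```

Memory use on the application side stays bounded by `batch_size`, which is the main reason to batch rather than fetch the full result set.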

We need to address the failed [upgrades/updates] on the DB instance that have been occurring since December 2023. Please see details below: [included in issue: #6313]

The following information was retrieved for the service camd-pg-db in the epa-easey organization, prod space:

Last Operation Status:

Bound Applications: All bound applications have the status "create succeeded," including:

Immediate Action Required: Since upgrades are not supported by this broker, it is crucial to allocate more storage to the DB instance immediately. Please initiate the necessary steps to modify the DB instance and increase its storage capacity to resolve the current storage-full state and allow the update to proceed.

Please let me know if you have any questions or would like to discuss further efforts to mitigate this critical production issue.