cds-snc / notification-planning-core

Project planning for GC Notify Core Team
0 stars 0 forks source link

Fix QuickSight Notifications import failure #386

Open sastels opened 1 month ago

sastels commented 1 month ago

Describe the bug

The production notifications dataset in QuickSight is failing during its nightly creation with the error:

ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp12782.30": No space left on device

Bug Severity

See examples in the documentation

SEV-2 Major: QuickSight will only contain notifications data older than July 16.

To Reproduce

View the nightly failures in production at https://ca-central-1.quicksight.aws.amazon.com/sn/data-sets/notifications/view

Expected behavior

Dataset created with up to date data.

Impact

CDS can no longer rely on QuickSight for data analytics related to new notifications.

Additional context

See this Slack thread.

TL;DR the joins of the full notification_history table with services, templates, etc are causing the RDS instances to run out of local storage.

Options for fixing:

  1. Increase the size of the database instances. Note that this may fix the problem currently, but as the history table grows the issue will reoccur.
  2. Move the history table (and perhaps other data) to another database / data lake with a different configuration that would allow the QuickSight import to succeed.
  3. Save notifications to the history table every night rather than as required by data retention. Then we could read just the history table into QuickSight had (hopefully) perform the joins in QuickSight / SPICE. rather than in RDS.
  4. Create a Python task to incrementally build the dataset in RDS and import this new table into QuickSight.
sastels commented 1 month ago

Roughly, we're needing an additional GB every month. Here's how low the local storage has been since we started this dataset~(~'...~'DBInstanceIdentifier~'notification-canada-ca-production-instance-0~(label~'notification-canada-ca-production-instance-0~region~'ca-central-1~visible~false))~(~'...~'notification-canada-ca-production-instance-1~(label~'notification-canada-ca-production-instance-1~region~'ca-central-1~visible~false))~(~'...~'notification-canada-ca-production-instance-2~(label~'notification-canada-ca-production-instance-2~region~'ca-central-1~visible~false)))~period~86400~region~'ca-central-1~stat~'Minimum~title~'FreeLocalStorage~yAxis~(left~(min~0))~start~'2024-04-20T043a003a00.000Z~end~'2024-07-16T033a593a59.000Z~view~'timeSeries~stacked~false))

image.png