debate-map / app

Monorepo for the client, server, etc. of the Debate Map website.
https://debatemap.app
MIT License

Fix pgdump-based backups to work again (even when the result is too long to hold in a single NodeJS string) #325

Closed: Venryx closed this 2 months ago

Venryx commented 3 months ago

Problem

The production cluster's database contents are now large enough that when pg_dump is run to back up the database, the output is too large for the NodeJS script on the caller's computer to receive into a single string (NodeJS/V8 has a hard upper limit on string length).

This causes the backup route to fail with the error: "Got error during execution: Error: Cannot create a string longer than 0x1fffffe8 characters"
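For reference, this limit can be reproduced on its own, independent of the backup code. The sketch below assumes a typical 64-bit NodeJS build, where the maximum string length reported by the buffer module is 0x1fffffe8 (about 537 million) characters; converting a Buffer larger than that into a single string throws the same error the backup hits.

```ts
import {constants} from "buffer";

// On 64-bit NodeJS builds, the maximum string length is 0x1fffffe8 (~537 million) characters.
console.log(constants.MAX_STRING_LENGTH.toString(16)); // "1fffffe8"

// Converting a Buffer larger than that limit into one string throws:
// "Error: Cannot create a string longer than 0x1fffffe8 characters"
Buffer.alloc(constants.MAX_STRING_LENGTH + 1).toString("utf8");
```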

Solution

The GraphQL API will need to be changed so that the pg_dump's contents can be transferred from the server to the NodeJS script in smaller chunks, which the script then simply appends to the backup file it is creating.

The Rust part of the backup process is here: https://github.com/debate-map/app/blob/15163d468b922c7e43eb13c3f3347ea19656cf44/Packages/app-server/src/db/general/backups.rs
The NodeJS script part is here: https://github.com/debate-map/app/blob/19a86cbe6e0de0e5bba1269f05bedb66aabdbba2/Scripts/DBBackups/GQLBackupHelper.js

Probably the easiest way to introduce chunking is to switch get_db_dump from being a GraphQL query to being a GraphQL subscription. You can see an example of using a GraphQL subscription to send data in multiple parts here: https://github.com/debate-map/app/blob/f50a95ff38d7f95f6329baee09ae19e9912d212e/Packages/monitor-backend/src/gql/_general.rs#L476

On the NodeJS side, this will of course complicate the logic: instead of a simple fetch call, a websocket connection will need to be opened, and the resulting data stream processed iteratively as it is appended to an output file. It's unclear to me at the moment whether this can be done with reasonable ease using native NodeJS APIs, or whether a library like @apollo/client will need to be imported into the GQLBackupHelper.js file. (I'll leave that up to you to evaluate/decide.)
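To make the shape of that client-side work concrete, here is a minimal sketch using the graphql-ws library over a websocket. Everything beyond what's stated above is an assumption: the library choice itself, the endpoint URL, the auth header, and the getDbDump subscription with index/isLast/data fields are all hypothetical placeholders for whatever the server-side change ends up exposing.

```ts
// Sketch only: assumes the server exposes a "getDbDump" subscription that streams
// {index, isLast, data} chunks; the URL, auth header, and field names are hypothetical.
import fs from "fs";
import WebSocket from "ws";
import {createClient} from "graphql-ws";

async function backupViaSubscription(url: string, jwt: string, outPath: string) {
	const client = createClient({
		url, // e.g. a ws://.../graphql endpoint (exact address depends on the deployment)
		webSocketImpl: WebSocket,
		connectionParams: {authorization: `Bearer ${jwt}`},
	});

	const out = fs.createWriteStream(outPath);
	await new Promise<void>((resolve, reject) => {
		client.subscribe(
			{query: "subscription { getDbDump { index isLast data } }"},
			{
				// Append each chunk to the output file as it arrives, so the full dump
				// never has to exist as one NodeJS string on the client.
				next: result => {
					const chunk = (result.data as any)?.getDbDump;
					if (chunk == null) return;
					out.write(chunk.data);
					if (chunk.isLast) resolve();
				},
				error: reject,
				complete: () => resolve(),
			},
		);
	});
	out.end();
	await client.dispose();
}
```

Whether graphql-ws is the right dependency here (versus @apollo/client or a hand-rolled websocket handler) is exactly the open question above; the point of the sketch is only that chunks get written to disk incrementally instead of being concatenated in memory.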


To start things along, I have created a new branch for working on this feature named "alvinosh/add-chunking-to-dbdump": https://github.com/debate-map/app/tree/alvinosh/add-chunking-to-dbdump

I have temporarily modified the try_get_db_dump function here to generate enough fake data so that you can replicate the NodeJS string-length limit issue.

Then, to run the test backup command, run this (the NodeJS script has some instructions on how to get the JWT contents):

node ./Scripts/DBBackups/GQLBackupHelper.js backup --dev --jwt "PUT_YOUR_JWT_CONTENTS_HERE"