Altinn / altinn-pdf

Altinn platform microservice for generating PDFs

Ensure that PDF-generator pods have persistent storage #57

Open SandGrainOne opened 1 year ago

SandGrainOne commented 1 year ago

Description

The concern is that we lose PDF-generator requests when pods are auto-scaled. We need a way to limit the loss of requests and PDF files caused by pods being shut down, moved, or restarted.

ivarne commented 1 year ago

As the PDF generation process in a browser is very prone to running out of memory under high load, I'd really suggest that you use some sort of queue (Azure Storage queues, Service Bus, ...) to ensure that an instance doesn't try to run too many simultaneous requests, and that we have sensible error handling and retries.
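To make the suggestion concrete, here is a minimal sketch of a queue consumer that caps the number of simultaneous renders and retries failed jobs. It is only an illustration: `run_jobs`, `render`, `max_concurrent`, and `max_attempts` are hypothetical names, and an in-process `asyncio.Queue` stands in for a durable queue such as Azure Storage queues or Service Bus, so it does not by itself survive a pod restart.

```python
import asyncio


async def run_jobs(jobs, render, max_concurrent=2, max_attempts=3):
    """Run render(job) for every job with at most max_concurrent in flight.

    A job whose render raises is put back on the queue until it has been
    tried max_attempts times. Returns (done, failed) dictionaries.
    """
    queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait((job, 1))  # (payload, attempt number)
    done, failed = {}, {}

    async def worker():
        while True:
            job, attempt = await queue.get()
            try:
                done[job] = await render(job)
            except Exception as exc:
                if attempt < max_attempts:
                    # Re-queue before task_done() so join() keeps waiting.
                    queue.put_nowait((job, attempt + 1))
                else:
                    failed[job] = exc
            finally:
                queue.task_done()

    # The number of workers is the concurrency limit.
    workers = [asyncio.create_task(worker()) for _ in range(max_concurrent)]
    await queue.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return done, failed
```

With a durable queue the same shape applies, but a message would only be deleted from the queue after a successful render, so a job picked up by a pod that is then killed becomes visible again for another pod.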

annerisbakk commented 5 months ago

@SandGrainOne I'll create a new issue with a more general description, then this one can be closed.

SandGrainOne commented 5 months ago

@bengtfredh Could this issue have been created based on some discoveries you made when we were working on the PDF-generator?

bengtfredh commented 5 months ago

@SandGrainOne I am working on a PR to change the autoscaling to use more sensible values. As it is now, it scales too soon because the reservations are so low. Running with low reservations, or none at all, makes pods candidates for early rescheduling.

We should discuss whether to disable queuing. Currently each pod accepts 10 requests (configurable) plus 10 in the queue. My thinking is that the queue can be problematic if a pod gets deleted and its queue is lost. If we disable the queue, an app will get a 429 if 10 requests are already in progress, so we need to look into how an app handles a 429 in this case.
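As a rough illustration of the limits described above (not the PDF generator's actual implementation), the admission logic can be sketched like this. `AdmissionGate`, `admit`, and `finish` are hypothetical names; the defaults of 10 and 10 come from the numbers in this comment.

```python
import threading


class AdmissionGate:
    """Tracks running and queued requests for one pod (hypothetical helper)."""

    def __init__(self, max_active=10, max_queued=10):
        self.max_active = max_active
        self.max_queued = max_queued
        self.active = 0
        self.queued = 0
        self._lock = threading.Lock()

    def admit(self):
        """Return "run", "queue", or 429 for a newly arrived request."""
        with self._lock:
            if self.active < self.max_active:
                self.active += 1
                return "run"
            if self.queued < self.max_queued:
                self.queued += 1
                return "queue"
            return 429  # Too Many Requests

    def finish(self):
        """A running request finished; a queued one takes its slot if present."""
        with self._lock:
            if self.queued > 0:
                self.queued -= 1  # promoted request keeps the active count full
            else:
                self.active -= 1
```

Setting `max_queued=0` corresponds to disabling the queue: the 11th concurrent request gets a 429 immediately. The sketch also shows why the queue is fragile: `queued` only exists in the pod's memory, so those requests are gone if the pod is deleted.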

Persistent storage will not prevent pods from being killed. And as far as I understand, there is no way to share the queue between pods, so persistent storage will not help with handling the queue either.

ivarne commented 5 months ago

> If we disable the queue, an app will get a 429 if 10 requests are already in progress, so we need to look into how an app handles a 429 in this case.

The app crashes and locks the data, so the user has to create a new instance (or ask the service owner to do it for them) and redo the whole process from the start.
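For comparison, a caller that tolerates a 429 could retry with exponential backoff instead of failing on the first rejection. This is only a sketch of that general technique, not how the app behaves today: `post_with_retry` and `send` are hypothetical, with `send` assumed to be a callable that performs the PDF request and returns `(status_code, body)`.

```python
import random
import time


def post_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    The last response is returned either way, so the caller can still
    see a final 429 and decide what to do with it.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        # Exponential backoff with full jitter, capped at max_delay seconds.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
    return status, body
```

The jitter spreads retries out so that many apps rejected at the same moment do not all come back at once. A final 429 would still need to be handled without locking the instance data.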