alphagov / pay-toolbox

Internal administrative tools service for GOV.UK Pay products.
MIT License

Proposal to separate the live payments dashboard from the Toolbox repository and deployment (draft, optional Firebreak candidate) #1819

Open sfount opened 2 weeks ago

sfount commented 2 weeks ago
[Image: Screenshot 2024-06-20 at 12 57 56]

The live payments dashboard was added to Toolbox during a Firebreak in 2019. It has been used internally to show that, alongside any current-day challenges and work in progress, the platform runs smoothly and is relied on by services across government.

At the time, bundling the dashboard into internal tooling was justified (as explained further down) and allowed it to be deployed quickly.

Separating the dashboard code and hosting would address a number of pain points:

Additional operational implications:

Context

The dashboard was originally designed for use in the office, running on small TV screens in the GOV.UK Pay team area. This meant that:

Proposed method

Previous proposals have been to:

Current architecture

Dashboard fetches data in real time from Pay services

sequenceDiagram
    Dashboard frontend (Toolbox) -> Toolbox server: requests information on services
    Toolbox server -> Adminusers microservice: requests information on services
    Dashboard frontend (Toolbox) -> Toolbox server: requests aggregate stats for the day so far
    Toolbox server -> Ledger microservice: requests aggregate stats for the day so far
    loop Every 5 seconds
        Dashboard frontend (Toolbox) -> Toolbox server: requests events data for the last few seconds
        Toolbox server -> Ledger microservice: requests events data for the last few seconds
    end
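
One way to read the polling loop above is that each request covers a short time window of events. The sketch below only illustrates that idea: the thread does not say how Toolbox actually chooses its window, so the consecutive, non-overlapping windows and the `poll_windows` helper are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Matches "Every 5 seconds" in the diagram above.
POLL_INTERVAL = timedelta(seconds=5)


def poll_windows(start, polls):
    """Yield (from_time, to_time) pairs, one per poll, with no gaps or overlap.

    Hypothetical helper: the real Toolbox windowing logic is not shown here.
    """
    window_start = start
    for _ in range(polls):
        window_end = window_start + POLL_INTERVAL
        yield window_start, window_end
        window_start = window_end


start = datetime(2024, 6, 20, 12, 0, 0, tzinfo=timezone.utc)
windows = list(poll_windows(start, polls=3))
```

Each pair would be passed on as the time range of one events request to Ledger, which is what makes every open dashboard a steady source of internal traffic.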

Proposed architecture

Dashboard fetches data from S3

sequenceDiagram
    Dashboard frontend (GitHub Pages) -> S3: Fetch information on services and aggregate stats
    loop Every 10 minutes
        Dashboard frontend (GitHub Pages) -> S3: Fetch event information on payments
    end

Independently a scheduled task fetches data from Pay services and stores it in the appropriate format for S3

sequenceDiagram
    loop Once at the end of the day
        Scheduled lambda -> Ledger microservice: Get up to date information about payment events
        Scheduled lambda -> Adminusers microservice: Get up to date information about services
        Scheduled lambda ->> S3: Write information to bucket
    end
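
A minimal sketch of what the scheduled task could look like, assuming the data sources and the bucket writer are passed in as plain functions. The function names, the snapshot layout, the event field names and the `dashboard/<date>.json` key are all hypothetical, not an agreed design. In a real lambda, `write_object` might wrap an S3 client call.

```python
import json
from datetime import date


def build_snapshot(services, events, day):
    """Combine service info and payment events into one JSON-ready document."""
    events = list(events)
    return {
        "date": day.isoformat(),
        "services": list(services),
        "aggregate": {
            "payment_count": len(events),
            "total_amount": sum(event["amount"] for event in events),
        },
        "events": events,
    }


def run_scheduled_task(fetch_services, fetch_events, write_object, day):
    """Fetch from Pay services, then write a single object for the dashboard."""
    snapshot = build_snapshot(fetch_services(), fetch_events(), day)
    key = f"dashboard/{day.isoformat()}.json"  # hypothetical key layout
    write_object(key, json.dumps(snapshot, sort_keys=True))
    return key


# Usage with in-memory stand-ins for Adminusers, Ledger and the bucket.
bucket = {}
key = run_scheduled_task(
    fetch_services=lambda: [{"id": "svc-1", "name": "Example service"}],
    fetch_events=lambda: [
        {"service_id": "svc-1", "status": "success", "amount": 1200},
        {"service_id": "svc-1", "status": "success", "amount": 800},
    ],
    write_object=bucket.__setitem__,
    day=date(2024, 6, 20),
)
```

Keeping the fetch and write steps injectable means the transformation can be tested without touching Ledger, Adminusers or S3.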

katstevens commented 2 weeks ago

I'm assuming this proposal means Pay still owns the artefacts here (repo, lambda, S3 bucket, IAM policies to allow everything to happen)?

katstevens commented 2 weeks ago

We'd also have to make sure we only store essential payment metadata in the S3 bucket (amount, service, status, payment type - no personal data or reference numbers). Don't know if the current API endpoint Toolbox uses returns anything else. We need to remove any risk of incidental pollution that could end up in the bucket.
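
As a rough illustration of that, an allowlist (rather than a blocklist) means any new field added upstream is dropped by default. The key names below are hypothetical; the actual shape of the Ledger response isn't shown in this thread.

```python
# Hypothetical key names for the four fields listed above.
ALLOWED_FIELDS = ("amount", "service_id", "status", "payment_type")


def to_public_event(event):
    """Copy only allowlisted keys; anything else never reaches the bucket."""
    return {key: event[key] for key in ALLOWED_FIELDS if key in event}


raw_event = {
    "amount": 1200,
    "service_id": "a-service-uuid",
    "status": "success",
    "payment_type": "card",
    "reference": "REF-123",       # must not be stored
    "email": "user@example.com",  # must not be stored
}
public_event = to_public_event(raw_event)
```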

sfount commented 2 weeks ago

Yeah Pay would own those.

I suppose one way of looking at it is shifting the complexity from internal real-time requests over to ETL infrastructure to put just the right amount of data in the right place at the right time. (So less code to maintain in Toolbox, and fewer things hitting internal microservices, but more infrastructure.)

Agree that what ends up in the bucket would be a big part of whether it could work or not. At the moment the only thing Toolbox gets is service (UUID), status, amount and time, but as it has always been IP locked, some threat modelling on what's stored there would be needed.

Maybe one way of approaching that would be to think through what would be needed to no longer rely on internal API calls but continue with the IP restrictions until the threat modelling had been done.

Beth-Brown commented 2 weeks ago

I'd really like to see the accessibility issue with the speed of updates / flashing green completed payments resolved as part of this work.

I love this proposal - it's a great way for us to be able to use the payments dashboard in external events or in product demos more easily in future.

Beth-Brown commented 2 weeks ago

I'd like to see this work broken up into sizeable chunks that we can prioritise when required. This project would be much bigger than one firebreak week right now, so breaking it down into some valuable deliverables will really help us to achieve incremental progress over time and manage our expectations about how long this work is likely to take.