cloud-gov / pages-core

cloud.gov Pages is a publishing platform for modern 21st Century IDEA websites.
https://cloud.gov/pages

Consider what branches/builds or intervals to run the scans on -- and whether automated or upon request #4530

Closed sknep closed 2 months ago

sknep commented 2 months ago

thinking about running build scans:

implications for cost, capacity, and user experience

### Tasks
- [x] meeting to discuss implications for capacity, cost, ux

Acceptability Criteria

sknep commented 2 months ago

meeting scheduled for tomorrow

sknep commented 2 months ago

Notes here: https://docs.google.com/document/d/1SXAwuf2X6kpVdlLG5oOpheGX-k9ljm0cY3aYn8DVCGg/edit

TL;DR:

  1. move to automated monthly scans on all preview builds (all branches) regardless of time since last push (paid, but not sandbox sites/orgs)
  2. disperse these automated scans throughout the month through the overnight queues
  3. Also allow on-demand scans to be kicked off by all users (paid and sandbox), with no limit/timeout/debounce
  4. need a “scan history” page - and a way of associating one-or-more scans to individual builds
  5. set up a scan deletion policy of 6 mos to match logs retention policy
  6. follow up about ignore config settings tomorrow w/ Kudeha
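
Items 1, 2, and 5 above could be sketched roughly as follows. This is a minimal illustration, not the pages-core implementation: the `Site` type, the function names, the 28-slot month, and the 183-day reading of "6 mos" are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta

SCAN_SLOTS_PER_MONTH = 28  # assumption: days 1-28, so every month contains every slot
RETENTION_DAYS = 183       # assumption: "6 mos" approximated as 183 days


@dataclass
class Site:
    """Hypothetical stand-in for a site record."""
    id: int
    is_sandbox: bool


def scan_day_for_site(site_id: int) -> int:
    """Map a site to a stable day of the month (1-28) via a hash of its id."""
    digest = hashlib.sha256(str(site_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SCAN_SLOTS_PER_MONTH + 1


def sites_due_tonight(sites, today: date):
    """Paid (non-sandbox) sites whose monthly slot falls on today's date."""
    return [
        s for s in sites
        if not s.is_sandbox and scan_day_for_site(s.id) == today.day
    ]


def is_expired(scan_created: date, today: date) -> bool:
    """True once a scan is older than the retention window."""
    return today - scan_created > timedelta(days=RETENTION_DAYS)
```

Hashing the site id keeps each site on the same night every month without storing a schedule, and spreads the load roughly evenly across the overnight queues. On-demand scans (item 3) would bypass this entirely.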

drewbo commented 2 months ago

@sknep thank you for arranging, noting, and summarizing. One question or clarifying point: should we run a single monthly scan (per type) on the latest build for each site? So "any" branch rather than all branches?

sknep commented 2 months ago

if a common use case is someone going in and grabbing a monthly scan for their prod site once a month... then it's possible that the site's latest build could be on a content or dev branch if some work is still in progress when the interval comes around. I like not generating a bunch of unnecessary scans, but I also can't be sure that "latest" = "most useful". And not every site has a custom domain tied to a branch to indicate that it might be most relevant... This is more complex, but what if:

svenaas commented 2 months ago

I think it might be best to scan all branches with custom domains. Paid platform access includes three domains (partners can upgrade and get more for an additional subscription fee), and I'm not sure there's a consistent heuristic for which ones should be considered production. Even beta dot domain dot gov or demo dot domain dot gov is likely to be key work in progress, and is publicly visible.

drewbo commented 2 months ago

I think to start, @sknep's outline makes the most sense: monthly scheduled scans of the production domain (this is explicitly specified in our database, at least for now via context) or the latest preview if no domain. The thing we lose a bit with this is the "frequent scan history" and "make sure it's good prior to production" use cases; hopefully the manual scans meet those needs for now. As we expand this, maybe we add configurability to scan additional branches/domains on a scheduled or ongoing basis.
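
The selection rule described here (production domain if one exists, otherwise the latest preview build) might look something like the sketch below. The `Domain` and `Build` types, the `"production"` context value, and the function name are hypothetical placeholders, not the actual pages-core schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Domain:
    """Hypothetical domain record; `context` marks which one is production."""
    context: str
    branch: str


@dataclass
class Build:
    """Hypothetical build record."""
    branch: str
    completed_at: datetime


def pick_scan_branch(domains: List[Domain], builds: List[Build]) -> Optional[str]:
    """Choose the branch for a site's monthly scheduled scan."""
    # Prefer the branch behind the production domain, if there is one.
    for domain in domains:
        if domain.context == "production":  # assumed marker value
            return domain.branch
    # Otherwise fall back to the most recently completed preview build.
    if not builds:
        return None
    return max(builds, key=lambda b: b.completed_at).branch
```

Configurability for additional branches or domains could later be layered on by returning a list of branches instead of a single one.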