fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Sharp edges cleanup for batch software upload #22300

Open iansltx opened 2 weeks ago

iansltx commented 2 weeks ago

Goal

User story
As a GitOps user,
I want to have a more efficient batch software upload experience
so that I can more cleanly interact with GitOps for software uploads.

Context

What else should contributors keep in mind when working on this change? (Optional.)

This should only require implementation changes (backend and possibly fleetctl) plus docs updates, unless we want to get fancy and expose software update batch runs to users.

Items are:

  1. Switching to 202 from 200 on POST software/batch
  2. Allowing cancellation of software batches while they are in progress (e.g. while downloading packages from external sources)
  3. Potentially removing software installer downloads from the dry run, or at least using HEAD requests for file downloads, to avoid pulling an entire installer twice (once for the dry run, once for the real run) for every place it's defined.
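
The HEAD fallback in item 3 could look roughly like the Go sketch below: send a HEAD request, compare the returned ETag and Content-Length with values recorded from an earlier download, and only do a full GET when they differ. This is a hypothetical sketch, not Fleet code: the `remoteMeta` cache and `newDemoServer` test host are invented for illustration, and because not every download host answers HEAD or sends an ETag, a missing ETag is treated as "changed" and falls back to downloading.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
)

// remoteMeta is what a HEAD response can reveal without transferring the body.
type remoteMeta struct {
	ETag          string
	ContentLength int64
}

// headInstaller fetches only the response headers for url.
func headInstaller(client *http.Client, url string) (remoteMeta, error) {
	resp, err := client.Head(url)
	if err != nil {
		return remoteMeta{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return remoteMeta{}, fmt.Errorf("HEAD %s: unexpected status %s", url, resp.Status)
	}
	return remoteMeta{ETag: resp.Header.Get("ETag"), ContentLength: resp.ContentLength}, nil
}

// unchanged reports whether the remote file looks identical to the cached copy.
// Without an ETag there is no reliable signal, so the caller should download.
func unchanged(cached, remote remoteMeta) bool {
	return remote.ETag != "" && remote.ETag == cached.ETag && remote.ContentLength == cached.ContentLength
}

// newDemoServer stands in for an external download host (demo only).
func newDemoServer(etag, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		w.Header().Set("Content-Length", strconv.Itoa(len(body)))
		if r.Method != http.MethodHead {
			_, _ = w.Write([]byte(body))
		}
	}))
}

func main() {
	srv := newDemoServer(`"v1"`, "installer bytes")
	defer srv.Close()
	cached := remoteMeta{ETag: `"v1"`, ContentLength: 15} // recorded from an earlier download
	remote, err := headInstaller(srv.Client(), srv.URL+"/app.pkg")
	if err != nil {
		panic(err)
	}
	fmt.Println("skip download:", unchanged(cached, remote))
}
```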

Changes

Product

Engineering

  1. Return 202 instead of 200 from the software batch POST endpoint, since the actual processing happens in the background.
  2. Allow cancellation of in-progress software upload batch operations, at least by request UUID and potentially across all runs. Cancelling by UUID would be useful for getting out of a stuck GitOps run. Cancelling all runs would be useful to do proactively before a GitOps run starts, to ensure a consistent state.

For the single-run cancel endpoint, we could use DELETE software_batch/{uuid}, and for cancelling all in-progress runs, DELETE software_batch/processing.

The easiest way to implement bulk cancel would be the Redis KEYS command, but that would introduce load issues elsewhere: KEYS has to check every key in the database, and Redis can't serve other requests while it does. To allow efficient bulk cancel, we'll need to change how we store progress in Redis. That's a bigger undertaking, so it's probably a separate task from the initial "cancel by UUID" work. Bulk cancel would build on the single-cancel work, which I believe wouldn't require changing how we store things in Redis.
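
One possible storage change that avoids KEYS: keep a Redis set of in-progress request UUIDs next to the per-run progress keys, adding a UUID when its run starts (SADD) and removing it when the run ends (SREM). Bulk cancel then reads only that set (SMEMBERS) instead of scanning the keyspace. The Go sketch below is hypothetical: the key names are invented, an in-memory type stands in for a Redis client, and it assumes each worker notices cancellation by checking its own status key.

```go
package main

import (
	"fmt"
	"sort"
)

// kv is a hypothetical subset of Redis commands; a real version would issue
// SADD/SREM/SMEMBERS/SET through a Redis client.
type kv interface {
	SAdd(key, member string)
	SRem(key, member string)
	SMembers(key string) []string
	Set(key, value string)
}

// Hypothetical key names.
const inProgressKey = "software_batch:in_progress"

func statusKey(id string) string { return "software_batch:status:" + id }

func startRun(db kv, id string) {
	db.Set(statusKey(id), "processing")
	db.SAdd(inProgressKey, id) // index entry: bulk lookups cost O(runs), not O(keyspace)
}

func finishRun(db kv, id, status string) {
	db.Set(statusKey(id), status)
	db.SRem(inProgressKey, id)
}

// cancelAll marks every in-progress run as cancelled without scanning all keys.
func cancelAll(db kv) []string {
	ids := db.SMembers(inProgressKey)
	for _, id := range ids {
		finishRun(db, id, "cancelled")
	}
	return ids
}

// memKV is an in-memory stand-in for Redis, for this sketch only.
type memKV struct {
	sets   map[string]map[string]bool
	values map[string]string
}

func newMemKV() *memKV {
	return &memKV{sets: map[string]map[string]bool{}, values: map[string]string{}}
}

func (m *memKV) SAdd(key, member string) {
	if m.sets[key] == nil {
		m.sets[key] = map[string]bool{}
	}
	m.sets[key][member] = true
}

func (m *memKV) SRem(key, member string) { delete(m.sets[key], member) }

func (m *memKV) SMembers(key string) []string {
	out := make([]string, 0, len(m.sets[key]))
	for member := range m.sets[key] {
		out = append(out, member)
	}
	sort.Strings(out) // Redis returns set members unordered; sort for stable output
	return out
}

func (m *memKV) Set(key, value string) { m.values[key] = value }

func main() {
	db := newMemKV()
	startRun(db, "run-1")
	startRun(db, "run-2")
	finishRun(db, "run-1", "completed")
	fmt.Println(cancelAll(db), db.values[statusKey("run-2")], len(db.SMembers(inProgressKey)))
}
```

Against real Redis, the read-then-update in `cancelAll` isn't atomic; wrapping it in a MULTI/EXEC transaction or a Lua script would close that gap.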

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

Manual testing steps

TODO (will all be tested via GitOps)

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.
sharon-fdm commented 2 weeks ago

cc: @noahtalerman, @lukeheath for prioritization.

noahtalerman commented 2 weeks ago

Thanks Sharon! I pulled this engineering-initiated story off the drafting board (removed :product).

It only needs ~engineering-initiated to get into Luke's queue of engineering-initiated stories.

getvictor commented 2 weeks ago

@lukeheath This part of the issue appears to be a bug:

Potentially removing software installer downloads as part of the dry run, or at least using HEAD methods for file downloads to avoid pulling an entire installer 2x per place it's defined.

If a customer has a lot of software installers, not only will GitOps take a long time, but it will also put load on the servers. This is a performance issue for customers.

We should not be downloading software at all. We should use a hash to determine whether software has changed from what Fleet already knows about.

getvictor commented 2 weeks ago

Semi-related issue regarding use of ETags for caching: https://github.com/fleetdm/fleet/issues/17697

lukeheath commented 2 weeks ago

@getvictor @iansltx Should the bug portion be filed separately as a bug?

iansltx commented 2 weeks ago

Yeah, we should split that part. This was a "get thoughts into GitHub" issue.