fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Show asynchronous errors from Apple's VPP API #20449

Open mna opened 1 month ago

mna commented 1 month ago

Goal

User story
As an IT admin,
I want to be notified of errors assigning VPP software to hosts
so that I can take necessary actions to remediate

Context

The VPP API returns some errors asynchronously (think of a webhook). The Fleet server must subscribe to be notified of them and then surface the error to the user later on.

Notifications might happen a while after the server performs the API call to Apple's server.

The current implementation doesn't handle any of these errors.
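To make the flow above concrete, here is a minimal sketch of how a server could remember outstanding requests and attach a late-arriving failure to the right host. It is an illustration only, not Fleet's implementation: the class names and the payload keys (`event_id`, `result`, `error_message`) are hypothetical and do not reflect Apple's actual notification schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingAssociation:
    """One association request the server is still waiting to hear about."""
    host_serial: str
    adam_id: str
    error: Optional[str] = None
    resolved: bool = False


@dataclass
class AsyncErrorTracker:
    # Maps a hypothetical event ID (assumed to be returned by the original
    # API call) to the request it belongs to.
    pending: Dict[str, PendingAssociation] = field(default_factory=dict)

    def record_request(self, event_id: str, host_serial: str, adam_id: str) -> None:
        """Remember a request at the time the API call is made."""
        self.pending[event_id] = PendingAssociation(host_serial, adam_id)

    def handle_notification(self, payload: dict) -> bool:
        """Apply a notification that may arrive long after the original call.

        Returns False when the event ID is unknown, True otherwise.
        """
        entry = self.pending.get(payload.get("event_id"))
        if entry is None:
            return False
        entry.resolved = True
        if payload.get("result") == "FAILURE":  # hypothetical result value
            entry.error = payload.get("error_message", "unknown error")
        return True

    def errors_for_host(self, host_serial: str) -> List[Tuple[str, str]]:
        """Return (adam_id, error) pairs that could be shown for one host."""
        return [
            (p.adam_id, p.error)
            for p in self.pending.values()
            if p.host_serial == host_serial and p.error is not None
        ]
```

The point of the sketch is the correlation step: the error can only be surfaced later if the server stored something at request time that the notification can be matched against.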

Changes

Product

Engineering

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

Manual testing steps

  1. Step 1
  2. Step 2
  3. Step 3

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

mna commented 1 month ago

@georgekarrv @marko-lisica heads-up, this might require product review to determine how to surface those async errors.

jahzielv commented 1 month ago

Note: during some manual testing of Apple's APIs, I found that you can pass a nonsense serial number to POST https://vpp.itunes.apple.com/mdm/v2/assets/associate and it will return a 200. Moreover, getting the association event status via GET https://vpp.itunes.apple.com/mdm/v2/status shows that the association completed without issue. This means that we won't see an error in our installation flow until we get the result from the MDM installation command.
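One way to read the observation above is that the install outcome should be decided by the last signal in the chain, not the first two. The sketch below is hypothetical and not taken from Fleet's code: the function name and the association status strings are placeholders, and the MDM command status values (`Acknowledged`, `NotNow`, `Idle`) are assumed to follow the usual MDM protocol statuses.

```python
from enum import Enum
from typing import Optional


class InstallOutcome(Enum):
    PENDING = "pending"
    INSTALLED = "installed"
    FAILED = "failed"


def install_outcome(
    associate_http_status: int,
    association_event_status: str,
    mdm_command_status: Optional[str],
) -> InstallOutcome:
    """Combine the three signals into one outcome.

    A 200 from the associate call and a completed association event are
    treated as necessary but not sufficient: only the MDM install command
    result confirms that the app reached the device.
    """
    if associate_http_status != 200 or association_event_status == "FAILED":
        return InstallOutcome.FAILED
    if mdm_command_status in (None, "NotNow", "Idle"):
        # No final command result yet, so nothing can be concluded.
        return InstallOutcome.PENDING
    if mdm_command_status == "Acknowledged":
        return InstallOutcome.INSTALLED
    # Any other final status (for example "Error") counts as a failure.
    return InstallOutcome.FAILED
```

With a nonsense serial number, the first two arguments would look healthy (`200`, a completed event) and the outcome would stay pending until the command result arrives, if it ever does.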

marko-lisica commented 1 month ago

> Note: during some manual testing of Apple's APIs, I found that you can pass a nonsense serial number to POST https://vpp.itunes.apple.com/mdm/v2/assets/associate and it will return a 200. Moreover, getting the association event status via GET https://vpp.itunes.apple.com/mdm/v2/status shows that the association completed without issue. This means that we won't see an error in our installation flow until we get the result from the MDM installation command.

@jahzielv Fleet is responsible for associating the VPP app to a device, so there's no way to send the wrong serial number?

jahzielv commented 1 month ago

@marko-lisica yeah, I think this is a pretty rare edge-case failure mode, but I wanted to call out that we can't rely 100% on Apple's APIs to return the errors we care about.

I ran into this while trying to install VPP apps on a macOS VM (failed because the serial number isn't real).

marko-lisica commented 1 month ago

> I ran into this while trying to install VPP apps on a macOS VM (failed because the serial number isn't real).

Could you describe what exactly happened? Were you able to associate the asset through Apple's API, and then the MDM command to install the app failed?

roperzh commented 1 month ago

As discussed in stand-up, I'm converting this issue into a story so we can design how to surface the async errors to users appropriately.

noahtalerman commented 1 month ago

Hey @roperzh, have we run into one of these async errors during implementation/QA? If so which ones?

Because I'm not sure what these errors are and when they happen, I'm not super clear on where we'd surface them to the IT admin. @marko-lisica might have a better understanding than me.

roperzh commented 1 month ago

Hey @noahtalerman, the list of all possible errors is in the issue description (search for "Asynchronous Failures" on that page).

I think the only one we're worried about is:

There aren't enough assets available to complete this association.

Note that we perform checks to prevent this, but we can't guarantee the request won't error because requests can happen in parallel.

We discussed with Marko that as of today you'll get an error message in the command result, so we're relying on that.
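The race mentioned above (a pre-flight check that passes, followed by a request that still fails) can be reproduced with a toy model. Everything here is a stand-in: `LicensePool` and `run_interleaved` are hypothetical names, the pool is an in-memory counter rather than Apple's API, and the interleaving is forced deterministically instead of using real parallel requests.

```python
from typing import Dict, List, Optional

NOT_ENOUGH_ASSETS = "There aren't enough assets available to complete this association."


class LicensePool:
    """Toy stand-in for the remote license count."""

    def __init__(self, available: int) -> None:
        self.available = available

    def check(self, needed: int = 1) -> bool:
        # Pre-flight check: a snapshot that may be stale by the time we act.
        return self.available >= needed

    def associate(self, needed: int = 1) -> Optional[str]:
        # The authoritative decision, made when the request is processed.
        # Returns None on success or an error message on failure.
        if self.available < needed:
            return NOT_ENOUGH_ASSETS
        self.available -= needed
        return None


def run_interleaved(pool: LicensePool, hosts: List[str]) -> Dict[str, Optional[str]]:
    """Simulate parallel requests: every host checks first, then all act."""
    passed = [h for h in hosts if pool.check()]
    return {h: pool.associate() for h in passed}
```

With one license and two hosts, both checks pass, the first association succeeds, and the second fails with the error quoted above. That is why the check reduces these failures but cannot rule them out.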

noahtalerman commented 1 month ago

> I think the only one we're worried about is:
>
> There aren't enough assets available to complete this association.
>
> as of today you'll get an error message in the command result, so we're relying on that.

@marko-lisica do you know if the IT admin will see this error message when they try to install a specific app? Or does this error mean the IT admin will see an error message for every app they try to install?

I think if it's per-app, then it totally makes sense to rely on the command result.

If it's "VPP is broken and I'm always going to get an error for all apps before I fix VPP" then I think we want to prioritize this sooner rather than later.

marko-lisica commented 1 month ago

@noahtalerman I had a chat with Roberto, and here's the summary, but we can talk on Monday and decide how to tackle this.

I think we should handle these async requests before we get to the automatic installation of VPP apps (and scoping via labels).

Why?

If we use notifications it will look something like this:

We'll need this for the uninstall feature as well. Another thing that will be easier to implement later is the license count on the software title page, since we can get notifications when the count changes.

I was considering whether it would make sense to work on this together with the automatic install feature for VPP, but after speaking with @roperzh we agreed that it would be better to work on this separately (rough estimate: 5 to 8).

noahtalerman commented 1 month ago

> I think we should handle these async requests before we get to the automatic installation of VPP apps (and scoping via labels).

@marko-lisica makes sense. Do we have a feature request tracked for automatically installing App Store apps? If not, can you please track one?

I think we'll want to address that story in a quick follow-up to the following story:

marko-lisica commented 1 month ago

@noahtalerman I have created feature requests for App Store (VPP) apps: