Add option to wipe Windows hosts with doWipe

ddribeiro commented 1 month ago

Problem

customer-preston is reporting that after issuing a wipe command through Fleet, some of their Windows hosts end up in a non-bootable state and Windows needs to be re-installed.

They believe this happens because Fleet sends the doWipeProtected command of the Windows RemoteWipe CSP, and that sending doWipe instead would prevent this behavior.

What have you tried?

The customer could build their own Windows CSP command that uses doWipe instead of doWipeProtected and send it through the Fleet API.

However, using Fleet's native wipe behavior gives them several benefits, including:

Calling a single API endpoint that works for all platforms.
Device lock state reporting in Fleet
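For reference, the custom-command workaround might be assembled roughly as follows. This is a minimal sketch under stated assumptions: the SyncML `<Exec>` targets the RemoteWipe CSP's `doWipe` node, but the `<Meta>` block, the `CmdID` value, and the assumption that Fleet's custom MDM command endpoint (`POST /api/v1/fleet/commands/run`) takes a base64-encoded `command` plus a `host_uuids` list should all be checked against current Fleet and Microsoft documentation before use. The code only builds the request body; it does not send anything.

```python
import base64
import json

# RemoteWipe CSP node for a standard (interruptible) wipe.
DO_WIPE_URI = "./Device/Vendor/MSFT/RemoteWipe/doWipe"


def build_exec_command(loc_uri: str, cmd_id: int = 1) -> str:
    """Return a SyncML <Exec> element targeting the given CSP node.

    Assumption: the <Meta> format/type below mirror common RemoteWipe
    examples; verify them against Microsoft's CSP docs.
    """
    return (
        "<Exec>"
        f"<CmdID>{cmd_id}</CmdID>"
        "<Item>"
        f"<Target><LocURI>{loc_uri}</LocURI></Target>"
        '<Meta><Format xmlns="syncml:metinf">chr</Format>'
        "<Type>text/plain</Type></Meta>"
        "<Data></Data>"
        "</Item>"
        "</Exec>"
    )


def build_fleet_request_body(host_uuids: list) -> str:
    """JSON body for Fleet's custom MDM command endpoint.

    Assumption: POST /api/v1/fleet/commands/run accepts a base64-encoded
    `command` and a `host_uuids` array.
    """
    syncml = build_exec_command(DO_WIPE_URI)
    encoded = base64.b64encode(syncml.encode("utf-8")).decode("ascii")
    return json.dumps({"command": encoded, "host_uuids": host_uuids})


if __name__ == "__main__":
    # Placeholder UUID; replace with a real host UUID from Fleet.
    print(build_fleet_request_body(["HOST-UUID-PLACEHOLDER"]))
```

Going this route gives up the benefits listed above, since Fleet would treat the payload as an arbitrary MDM command rather than a tracked wipe.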

Potential solutions

When issuing a wipe command through Fleet's native functionality, Fleet could let an admin specify which kind of wipe command is issued. To solve this problem, Fleet could offer the option to send either a doWipe or a doWipeProtected command.
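One way the proposed option could work on the server side is to map an optional request parameter to the CSP node Fleet executes, keeping doWipeProtected as the default so existing behavior is unchanged. This is a hypothetical sketch: the `WindowsWipeType` enum, the `wipe_loc_uri` helper, and the parameter values `"standard"` and `"protected"` are illustrative names, not part of Fleet today. Only the two RemoteWipe node paths come from Microsoft's CSP.

```python
from enum import Enum
from typing import Optional

REMOTE_WIPE_BASE = "./Device/Vendor/MSFT/RemoteWipe/"


class WindowsWipeType(Enum):
    """Hypothetical admin-facing option (not an existing Fleet setting)."""
    PROTECTED = "doWipeProtected"  # current default: keeps retrying the reset
    STANDARD = "doWipe"            # proposed option: can be interrupted


def wipe_loc_uri(wipe_type: Optional[str] = None) -> str:
    """Map an optional request parameter to the RemoteWipe node to execute.

    Omitting the parameter keeps today's behavior (doWipeProtected).
    """
    if wipe_type is None:
        chosen = WindowsWipeType.PROTECTED
    else:
        try:
            chosen = WindowsWipeType[wipe_type.upper()]
        except KeyError:
            raise ValueError(f"unsupported wipe type: {wipe_type!r}") from None
    return REMOTE_WIPE_BASE + chosen.value
```

Defaulting to the protected variant means admins who rely on the lost/stolen-device behavior would not be affected by adding the option.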

What is the expected workflow as a result of your proposal?

When customer-preston refreshes a Windows device to issue to a new user, they use Fleet's wipe command to erase the previous user's data. In this workflow, they would:

Call the Fleet API to wipe the device
In the API call, they would specify that they want doWipe instead of the default doWipeProtected behavior.
The device would receive the command and wipe successfully. If the process were interrupted, the computer would not end up in a non-bootable state, which is what appears to happen with a doWipeProtected command.
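From the admin's side, the workflow above could amount to a single request with one extra field. This sketch only builds the request and does not send it. Assumptions: the path follows Fleet's host wipe endpoint (`POST /api/v1/fleet/hosts/:id/wipe`) with bearer-token auth, and the `windows_wipe_type` body field is purely hypothetical; it does not exist in Fleet's API.

```python
import json
import urllib.request


def build_wipe_request(fleet_url: str, host_id: int, api_token: str,
                       windows_wipe_type: str = "doWipe") -> urllib.request.Request:
    """Build (but do not send) a wipe request for one host.

    Assumption: `windows_wipe_type` is a hypothetical field illustrating
    the proposal; Fleet does not accept it today.
    """
    body = json.dumps({"windows_wipe_type": windows_wipe_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"{fleet_url.rstrip('/')}/api/v1/fleet/hosts/{host_id}/wipe",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, once such a field actually exists.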

Because this workflow is not a stolen/lost device situation, doWipeProtected is not required.

ddribeiro commented 1 month ago

Additional notes: Microsoft's RemoteWipe CSP Documentation

It is unclear from Microsoft's documentation whether a regular doWipe command will actually prevent the device from ending up non-bootable. From the doWipe CSP description:

If a doWipe reset is started and then interrupted, the PC will attempt to roll-back to the pre-reset state. If the PC can't be rolled-back, the recovery environment will take no additional actions and the PC could be in an unusable state and Windows will have to be reinstalled.

From the doWipeProtected description:

The doWipeProtected is functionally similar to doWipe. But unlike doWipe, which can be easily circumvented by simply power cycling the device, doWipeProtected will keep trying to reset the device until it's done.

I think "functionally similar" is an interesting choice of words. My interpretation is that both commands perform the same action under the hood, but doWipeProtected will continuously attempt to reset the device if it is interrupted. This is probably something we need to dive into a little deeper before we decide to pick this up.
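The two quoted descriptions can be condensed into a toy decision function to make the comparison concrete. This is a sketch of the documentation text only, not of how Windows implements either command; the function name and outcome strings are illustrative. It shows why the docs alone do not settle the question: an interrupted doWipe only avoids an unusable state if its rollback succeeds.

```python
def interrupted_reset_outcome(command: str, rollback_succeeds: bool) -> str:
    """Toy model of the interruption behavior described in Microsoft's docs.

    Models the documentation wording only, not Windows internals.
    `rollback_succeeds` matters only for doWipe.
    """
    if command == "doWipeProtected":
        # Docs: keeps trying to reset the device until it's done.
        return "reset retried until complete"
    if command == "doWipe":
        # Docs: attempts to roll back to the pre-reset state.
        if rollback_succeeds:
            return "rolled back to pre-reset state"
        # Docs: recovery environment takes no further action.
        return "possibly unusable; Windows may need reinstalling"
    raise ValueError(f"unknown command: {command!r}")
```

Neither branch describes doWipeProtected leaving a device non-bootable, which is part of why testing is needed before choosing a direction.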

cc: @nonpunctual

nonpunctual commented 1 month ago

My interpretation of the doWipeProtected command in the Microsoft docs is that it is intended to be used only in the case that a device is irretrievably lost (e.g., in a river, smashed in traffic, stolen).

Fleet should be using doWipeProtected for this. The intention of the Fleet wipe feature is to give admins a way to protect an organization's asset when it is irretrievably lost.

customer-preston is not using the feature this way. They are using the Fleet wipe feature to repurpose devices for MSP customers. This is a valid use case, but it is not aligned with the feature as designed.

My opinion here is that if we do anything, we should add a feature for device repurposing, or instruct the customer that the feature we've deployed is not intended for device repurposing and that they could create their own solution for this.

noahtalerman commented 1 month ago

Before we dedicate any design/eng resources, let's understand whether the doWipe CSP command will solve the customer's need: resetting the device without having to reinstall Windows.

How?

Send doWipe 10 times. How many times was the device in a non-bootable state?
- What is the end user experience? Can the end user cancel it? Can they cancel it with a reboot?
Send doWipeProtected 10 times. How many times was the device in a non-bootable state?
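To keep the comparison above consistent, each trial could be recorded and tallied per command. This is a minimal sketch; the `WipeTrial` record and `non_bootable_counts` helper are hypothetical names, and no test results are implied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WipeTrial:
    command: str    # "doWipe" or "doWipeProtected"
    bootable: bool  # did the device boot normally after the wipe?


def non_bootable_counts(trials: list) -> dict:
    """Return {command: (non_bootable, total)} for each command tested."""
    totals = Counter(t.command for t in trials)
    failures = Counter(t.command for t in trials if not t.bootable)
    # Counter returns 0 for commands with no non-bootable trials.
    return {cmd: (failures[cmd], totals[cmd]) for cmd in totals}
```

With 10 trials per command, the result reads directly as "N out of 10 non-bootable" for each command.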

If doWipe performs better, then consider building a "Reset" option.

noahtalerman commented 1 month ago

Dave: https://learn.microsoft.com/en-us/answers/questions/247954/wipe-action-resulting-in-recovery-failure-on-windo

nonpunctual commented 1 month ago

@noahtalerman @ddribeiro @dherder What I think is actually more troubling is that doWipeProtected seems like it should completely erase the computer every time, no matter what. The report suggests it only erases the computer sometimes. What's actually being reported is that sometimes it DOES NOT wipe the computer, which I think is what the customer actually wants (i.e., they do not want to have to reinstall Windows). That is an issue for Microsoft imo.

ddribeiro commented 1 month ago

@nonpunctual We should clarify: when we say "fail" (in Noah's comment), we mean "the device wipes but ends up in a non-bootable state." From what I understand, the device always wipes successfully when a wipe command is issued from Fleet.

noahtalerman commented 1 month ago

@ddribeiro and @nonpunctual thank you both!

I updated my comment here to clarify "failed":

How many times was the device in a non-bootable state?

noahtalerman commented 1 month ago

Hey @georgekarrv and @lukeheath, do you think we could get some QA/engineering help to do the testing outlined in this comment?

It will help us understand the problem so we can come up with the best solution.

I think @ddribeiro can help guide whoever ends up doing the testing.

lukeheath commented 1 month ago

@noahtalerman I'm not sure if there is any immediate capacity, but this could be estimated and brought into the sprint as a timebox item.

noahtalerman commented 1 month ago

@zayhanlon and I decided to pull this one out of the current design sprint and prioritize the following request instead:

22028

valentinpezon-primo commented 1 month ago

Found this here:

It seems that using doWipeProtected on an encrypted device makes the device unbootable (which is exactly the problem we have). It's easy to test if you have some Windows test hardware.

Here are the different test scenarios I see:

Use doWipeProtected on encrypted device:

encrypt device
use doWipeProtected
confirm the behavior

Use doWipe on encrypted device:

encrypt device
use doWipe
confirm the behavior

Use doWipeProtected on unencrypted device:

remove device encryption
use doWipeProtected
confirm the behavior

Use doWipe on unencrypted device:

remove device encryption
use doWipe
confirm the behavior
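The four scenarios above form a 2x2 matrix (encryption state x wipe command), which can be generated rather than written out by hand so that no combination is missed. A minimal sketch; `build_test_scenarios` is a hypothetical helper, and the step wording is copied from the list above.

```python
from itertools import product

COMMANDS = ("doWipeProtected", "doWipe")
ENCRYPTION_STATES = (True, False)  # True = device is encrypted


def build_test_scenarios() -> list:
    """Enumerate every encryption-state x command scenario with its steps."""
    scenarios = []
    for encrypted, command in product(ENCRYPTION_STATES, COMMANDS):
        prep = "encrypt device" if encrypted else "remove device encryption"
        state = "encrypted" if encrypted else "unencrypted"
        scenarios.append({
            "name": f"Use {command} on {state} device",
            "steps": [prep, f"use {command}", "confirm the behavior"],
        })
    return scenarios


if __name__ == "__main__":
    for scenario in build_test_scenarios():
        print(scenario["name"], "->", "; ".join(scenario["steps"]))
```

The iteration order of `product` matches the order of the scenarios as listed in the comment.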

JoStableford commented 1 month ago

Linked to Unthread ticket:

Feature Requests and Issues Recap #2838
