fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Automatically renew host SCEP certificates before expiration #15332

Closed roperzh closed 7 months ago

roperzh commented 10 months ago

Goal

User story
As an IT admin,
I want Fleet to automatically renew the SCEP certificates installed on my hosts
so that my SCEP certificates never expire and I don't have to turn on MDM again for macOS hosts.

Changes

Product

Engineering

Context: the Fleet server acts as a CA and issues certificates to devices during MDM enrollment using the SCEP protocol.

The certificate issued to the device has a validity period defined via the mdm.apple_scep_signer_validity_days server config.

After the certificate expires, the server is not able to authenticate the client anymore. See this if you're interested in the details.

[!WARNING] presenting two possible options, we need to choose one before implementing.

Option A: middleware

- [ ] Add a middleware here that is called _after_ `httpmdm.CertExtractMdmSignatureMiddleware` and `httpmdm.CertVerifyMiddleware`: https://github.com/fleetdm/fleet/blob/cf7b2e9903477a4522602a95594aabf76a67fada/server/service/handler.go#L992-L995
- [ ] In the middleware, you have access to the certificate from the context: https://github.com/fleetdm/fleet/blob/cf7b2e9903477a4522602a95594aabf76a67fada/server/mdm/nanomdm/http/mdm/mdm_cert.go#L101-L106
- [ ] If the certificate expires within 30 days, send an `InstallProfile` command with an enrollment profile generated by `apple_mdm.GenerateEnrollmentProfileMobileconfig`. Consider whether it's worth adding a special method to `apple_mdm.Commander` for the enrollment profile, or whether just using `Commander.InstallProfile` is good enough.
- [ ] Database schema migrations: Not required
- [ ] Load testing: Not required

Option B: cron job

- [ ] In a cron job, look at certificates that expire within 30 days using the `scep_certificates` table.
- [ ] For each cert, calculate its checksum like this: https://github.com/fleetdm/fleet/blob/cf7b2e9903477a4522602a95594aabf76a67fada/server/mdm/nanomdm/service/certauth/certauth.go#L100-L105
- [ ] Look for matching hosts using `nano_cert_auth_associations`
  - [ ] `nano_cert_auth_associations.id` is the UUID of the host (`hosts.uuid`)
  - [ ] `nano_cert_auth_associations.sha256` is the checksum of the cert
- [ ] Send an `InstallProfile` command with an enrollment profile generated by `apple_mdm.GenerateEnrollmentProfileMobileconfig`. Consider whether it's worth adding a special method to `apple_mdm.Commander` for the enrollment profile, or whether just using `Commander.InstallProfile` is good enough.
- [ ] Consider whether adding indexes and/or pre-computing the sha256 of each certificate might be desirable
- [ ] Database schema migrations: To be defined
- [ ] Load testing: Not required

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Context

QA

Risk assessment

Manual testing steps

  1. Configure a value < 30 days for mdm.apple_scep_signer_validity_days when you start your server.
  2. Turn on MDM features for a new macOS host.
  3. Trigger the cleanups_then_aggregation job, which should enqueue a cert renewal.
  4. Verify that the cert is renewed. You can do this by searching for the "Fleet Identity" certificate in Keychain.
  5. As long as mdm.apple_scep_signer_validity_days is < 30, we'll renew the cert on each cron run. To stop this process, restart the server without the setting set (defaults to 1 year), run the cron again, and verify that the cert issued is for 1 year.

Additional testing

  1. Do all the steps above, but this time:
    1. Enable MDM SSO
    2. Enroll a host via ADE
  2. After a renewal, check if the enrollment profile still has a query parameter named enrollment_reference
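
For the last check, a small helper like the one below can confirm that a URL still carries the parameter. This is only a sketch: the thread does not say which URL inside the enrollment profile holds `enrollment_reference`, so the sample URL and the helper name `hasEnrollmentReference` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasEnrollmentReference reports whether rawURL carries a non-empty
// enrollment_reference query parameter.
func hasEnrollmentReference(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return u.Query().Get("enrollment_reference") != "", nil
}

func main() {
	// Hypothetical URL, not copied from a real enrollment profile.
	ok, err := hasEnrollmentReference("https://fleet.example.com/hypothetical?enrollment_reference=abc123")
	fmt.Println(ok, err)
}
```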

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

noahtalerman commented 8 months ago

> Before the certificate expires, automatically issue an InstallProfile command with an enrollment profile

@roperzh just curious, what happens if the SCEP cert is already expired? Can we not renew the cert w/ an InstallProfile command?

roperzh commented 8 months ago

> @roperzh just curious, what happens if the SCEP cert is already expired? Can we not renew the cert w/ an InstallProfile command?

@noahtalerman we can't because we can't send any MDM commands at all! (as we're not able to authenticate the device)

noahtalerman commented 8 months ago

@marko-lisica heads up, I think we want to prioritize this story above #16335 and #11544.

Why? I just realized we have two MDM customers that participated in a beta for macOS MDM features (started in March 2023): customer-zabinski and customer-clara.

customer-zabinski turned on MDM features for some hosts on 2023-03-07

customer-clara turned on MDM features for some hosts on 2023-03-01

I think this means the SCEP certs for these hosts will expire on 2024-03-07 and 2024-03-01 respectively. @roperzh does that sound right to you?

So, I think we're going to want to ship this as part of the first patch release next sprint (Fleet v4.45.1), which falls on 2024-02-26.

roperzh commented 8 months ago

> I think this means the SCEP certs for these hosts will expire on 2024-03-07 and 2024-03-01 respectively. @roperzh does that sound right to you?

@noahtalerman the exact date will depend on when the first host turned on MDM features, but yeah, that sounds correct.

noahtalerman commented 8 months ago

@roperzh FYI I moved the original issue description here:

Expected behavior: Fleet automatically attempts to renew SCEP cert 30 days before expiration. If renewal fails, Fleet logs an error and tries again the next day.

Problem

As part of the SCEP protocol, each device owns a certificate that's used for authentication. These certificates have a default validity period of one year (configurable via the mdm.apple_scep_signer_validity_days setting).

Hosts with expired certificates can't communicate with the MDM server.

For more context, see https://github.com/fleetdm/confidential/issues/4518

Potential solutions

  1. Before the certificate expires, automatically issue an InstallProfile command with an enrollment profile. This causes the SCEP certificate to be renewed.

noahtalerman commented 8 months ago

Here's the separate story for adding an activity item for SCEP cert renewal: https://github.com/fleetdm/fleet/issues/16671

noahtalerman commented 8 months ago

@georgekarrv heads up, moving this story to "Settled."

We want to ship this as part of the 4.45.1 patch (targeted on 2024-02-26) so that our earliest adopters don't have certs that expire. More context here.

This means we'll give the story a lightspeed label and make an exception to release a feature during a patch.

cc @lukeheath

lukeheath commented 8 months ago

@noahtalerman

> This means we'll give the story a lightspeed label and make an exception to release a feature during a patch.

We can't release features as part of a patch release because it violates semantic versioning:

> - MAJOR version when you make incompatible API changes
> - MINOR version when you add functionality in a backward compatible manner
> - PATCH version when you make backward compatible bug fixes

The distinction comes down to whether this fixes a defect or adds functionality.

If it fixes a defect, we should re-label it as a bug, and we can release it as part of the patch.

If it adds functionality, we need to issue a new minor version. We don't have to wait until the next scheduled minor release, but we would need to introduce an unscheduled minor version bump.

The next scheduled minor version release is v4.45.0 on 02/19, which is well before the first expiration on 03/01, so sticking to our normal schedule seems like the best course.

lukeheath commented 8 months ago

Oh, I see, this isn't coming into the sprint until v4.45.0 so that doesn't work.

In that case, if needed we'll have to release a mid-sprint minor release. Alternatively, if we host the environments we could look into renewing the SCEPs manually.

noahtalerman commented 8 months ago

Re semver: makes sense. Thanks Luke.

> if needed we'll have to release a mid-sprint minor release. Alternatively, if we host the environments we could look into renewing the SCEPs manually.

I think we're hosting customer-clara. I think customer-zabinski is self-hosted.

I think releasing a mid-sprint minor release would be a better experience for both customers.

We only have to notify these customers because they're the only customers approaching cert expiration.

I scheduled a call for you, me, @pintomi1989, and @Patagonia121 to align/discuss.

Patagonia121 commented 8 months ago

Sounds good @noahtalerman and @lukeheath - I think I understand the ask and I already posted in the appropriate channels. I'll confirm with @pintomi1989 tomorrow. We're happy to hop on a call to go through any additional context, if any. Thanks!

georgekarrv commented 7 months ago

Hey team! Please add your planning poker estimate with Zenhub @ghernandez345 @gillespi314 @mna @roperzh

sabrinabuckets commented 7 months ago

Completed manual testing steps with both manual and automatic enrollment.

rachaelshaw commented 7 months ago

@roperzh did we go with option A (middleware) or option B (cron job)?

fleet-release commented 7 months ago

Renewal automatic, In cloud city, no panic, Fleet's magic, no static.