fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)

Renew SCEP certificate for hosts w/ old (non-Fleet) enrollment profile #19800

Open roperzh opened 4 weeks ago

roperzh commented 4 weeks ago


User story
As an organization that automatically migrated my workstations (#19387) from my old MDM solution to Fleet,
I want to renew the SCEP certificates on my hosts
so that I know MDM features (commands, configuration profiles, etc.) will work for these hosts.


To renew SCEP certificates, we send an InstallProfile command with Fleet's enrollment profile to the devices.

Hosts that migrated using "Process for self-hosted macOS MDM migration to Fleet" (#19387) will have a different enrollment profile (the one from the old MDM solution), so the InstallProfile command will fail and the SCEP certificate won't be renewed.
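
For reference, a minimal sketch of what an InstallProfile command carrying an enrollment profile looks like on the wire. This is not Fleet's implementation; `build_install_profile_command` is a hypothetical helper, and the only things assumed from Apple's MDM protocol are the `CommandUUID`, `RequestType`, and `Payload` keys.

```python
import plistlib
import uuid


def build_install_profile_command(profile_xml: bytes) -> bytes:
    """Wrap a .mobileconfig payload in an MDM InstallProfile command plist.

    Hypothetical helper for illustration only.
    """
    command = {
        "CommandUUID": str(uuid.uuid4()),
        "Command": {
            "RequestType": "InstallProfile",
            # plistlib serializes bytes as a base64-encoded <data> element.
            "Payload": profile_xml,
        },
    }
    return plistlib.dumps(command, fmt=plistlib.FMT_XML)
```

The device installs whatever profile is in `Payload`, which is why the content of the enrollment profile matters for hosts that were enrolled by another MDM.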




ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".


Risk assessment

Manual testing steps

  1. Step 1
  2. Step 2
  3. Step 3

Testing notes


  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

dherder commented 3 weeks ago

@roperzh do we know what the SCEP certificate lifespan is for the customer devices? I do know that some MDM systems will set this to a long lived value like 2099, so in those cases it would not be an issue. If the lifespan of the certificate is short lived, I would say that this would be a P2 blocker issue.

roperzh commented 3 weeks ago

@dherder good point! we should check with them, I know that micromdm/scep uses 1 year by default (-crtvalid flag) so unless they provided a custom value there, it's 1 year

zayhanlon commented 3 weeks ago

@lukeheath @noahtalerman per the process, letting you know that we have this as workflow/migration blocking and added the p2 label. let me know if anything else needs to be done to escalate

lukeheath commented 3 weeks ago

@zayhanlon P2 makes sense to me. Our response for P2 is:

Response: Issue is prioritized at the top of the next sprint. If opportunity cost of waiting for the next sprint is too high, it may be considered for current sprint.

We'll prioritize this for next sprint, which is scheduled to ship 7/15. Is that soon enough?

@noahtalerman @georgekarrv

zayhanlon commented 3 weeks ago

@lukeheath @georgekarrv @noahtalerman - there's a thread going in #g-customer-success https://fleetdm.slack.com/archives/C062D0THVV1/p1718733547340419?thread_ts=1718384351.332159&cid=C062D0THVV1

This new issue was surfaced by Roberto this week but is also migration blocking. I don't think 7/15 will work - any way to get it faster or patched sooner? 

@zwass @dherder FYI

roperzh commented 3 weeks ago

I made it a story so it gets product feedback. The reason is that I personally see only three ways to accomplish this:

  1. We change how certificate renewals work to account for hosts with custom enrollment profiles
  2. We build some product feature that allows them/us to build a flow to renew certificates (eg: a webhook, a config in the UI)
  3. We build a script that issues cert renewals that lives outside Fleet

noahtalerman commented 3 weeks ago

Thanks @roperzh!

I threw some time on your calendar to dig into the options.

roperzh commented 3 weeks ago

we met with @noahtalerman and decided to do option 3 as a first baby step:

We build a script that issues cert renewals that lives outside Fleet

I think this requires 3 action items:

  1. Investigate if we can use Fleet's built-in CA to issue the new SCEP certificates (@roperzh)
  2. Get required information from the customer:
    a. Expiration date for each device. This could be part of the export script (cc: @zwass)
    b. Current enrollment profile (cc: @zwass @dherder)
  3. Define where this service will be hosted, could probably live alongside the proxy? (cc: @zwass @dherder)

cc: @zayhanlon

noahtalerman commented 3 weeks ago

Thanks @roperzh!

Define where this service will be hosted, could probably live alongside the proxy?

I think we decided to go with fleetdm.com instead of standing up a separate service. Why? So we can reduce surface area and improve understandability for Fleet contributors.

If this doesn't work please let me know.

I think this means that the enrollment profile (XML) will live as an environment variable in Heroku. We'll probably need @eashaw's help to add that variable.

I updated the issue description to reflect this.

zwass commented 3 weeks ago

I have several questions:

  1. Who at Fleet will write the code? Who will maintain the code?
  2. Where will the server be hosted? Who will be responsible for maintaining it? Alongside the proxy does not sound like a good option as we are currently doing that in the solutions consulting AWS and the server is intended to only live for a couple weeks while the migration is completed (see https://github.com/fleetdm/fleet/issues/19387). This server seems like it needs to run indefinitely (until we make a more long-term feature for this?)
  3. How will the scripts be triggered? Is this something that the server becomes responsible for?

zwass commented 3 weeks ago

I think we decided to go with fleetdm.com instead of standing up a separate service. Why? So we can reduce surface area and improve understandability for Fleet contributors.

Does this mean we would be putting customer SCEP cert/keys into fleetdm.com? That sounds pretty risky to me as I'm not aware that fleetdm.com has been designed/audited for storage of customer data (let alone important customer secrets).

Or maybe we are just talking about using fleetdm.com to trigger script execution for the hosts that are expiring? That seems potentially less risky but still something that would need to be well-understood. Would it require API keys for customer Fleet servers?

noahtalerman commented 3 weeks ago

Does this mean we would be putting customer SCEP cert/keys into fleetdm.com?

@zwass I don't think so. The enrollment profile would be an environment variable in Heroku. Once the enrollment profile is delivered the host will get the new SCEP cert from the Fleet server

Would it require API keys for customer Fleet servers?

I think so, yes. We need the API key to deliver the enrollment profile via the Fleet API. This can be stored as an environment variable in Heroku.

@roperzh please correct me if I'm wrong.

roperzh commented 3 weeks ago

Who at Fleet will write the code? Who will maintain the code? Where will the server be hosted? Who will be responsible for maintaining it?

who's the right person to answer this? don't want it to get lost in the convo

How will the scripts be triggered? Is this something that the server becomes responsible for?

some process needs to run at an interval and send commands, we were thinking this separate server (let's say fleetdm.com) would do it

the challenge of building the functionality directly into Fleet is related to crafting the right enrollment profile, we thought that having a separate service gives us freedom to hardcode the profile to the customer's needs.

@noahtalerman maybe the profile could be provided to Fleet itself as a hidden config?

@zwass another option I just thought of: what if the proxy enqueues the command (using Fleet's API) to renew the SCEP certificate the first time it redirects a host to Fleet? this gives us 1 year to properly solve this problem.

noahtalerman commented 3 weeks ago

Who at Fleet will write the code? Who will maintain the code?

It's on the drafting board w/ the #g-mdm label. I think let's treat this as all other user stories at Fleet: bring it through estimation and into the next sprint.

Since it sounds like the next release (2024-07-15) isn't fast enough, I started a thread in #g-mdm in Slack here (internal) to chat about priority.

maybe the profile could be provided to Fleet itself as a hidden config?

@roperzh good idea. But is this because of a limitation of Heroku? If not, in order to move quickly, I think let's move forward with the current plan in the issue description.

If folks disagree, please jump in to tomorrow's MDM design review to discuss.

Once we know what the enrollment profile will look like, we can get @eashaw's help to test. If we learn that using fleetdm.com won't work due to a Heroku limitation then I think we come back to other options.

roperzh commented 3 weeks ago

@roperzh good idea. But is this because of a limitation of Heroku? If not, in order to move quickly, I think let's move forward with the current plan in the issue description.

If folks disagree, please jump in to tomorrow's MDM design review to discuss.

@noahtalerman sounds good! yeah, not a limitation with Heroku, but it might be simpler to run the cron in Fleet because:

  1. It's a single server we have to worry about
  2. It's very easy to add the expiration of the certs to the Fleet DB during the initial migration (vs having a separate db in fleetdm.com)
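
A rough sketch of the selection step such a cron would need, assuming the expiration dates recorded during the migration are available as a mapping from host UUID to the certificate's not-after time. The function name, the mapping shape, and the 30-day window are all assumptions for illustration, not Fleet's schema or defaults.

```python
from datetime import datetime, timedelta, timezone


def hosts_due_for_renewal(cert_expirations, now, renew_within_days=30):
    """Return the host UUIDs whose SCEP cert expires within the window.

    cert_expirations: dict of host UUID -> timezone-aware not-after datetime
    (hypothetical shape). Already-expired certs are included as well.
    """
    window = timedelta(days=renew_within_days)
    return sorted(
        host_uuid
        for host_uuid, not_after in cert_expirations.items()
        if not_after - now <= window
    )


# Example run with made-up hosts and dates.
example_now = datetime(2024, 7, 1, tzinfo=timezone.utc)
example_certs = {
    "HOST-A": datetime(2024, 7, 15, tzinfo=timezone.utc),
    "HOST-B": datetime(2025, 6, 1, tzinfo=timezone.utc),
}
print(hosts_due_for_renewal(example_certs, example_now))
```

Each run would then enqueue one InstallProfile command per returned host.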

zwass commented 3 weeks ago

what if the proxy enqueues the command (using Fleet's API) to renew the SCEP certificate the first time it redirects a host to Fleet?

This seems possible. Currently there is no state maintained within the migration proxy, but state could be added.
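
A minimal sketch of the state that would need to be added, assuming the proxy can identify each host by some UUID when it redirects a request. `RenewalTracker` and the `enqueue_renewal` callback are hypothetical; the callback stands in for whatever Fleet API call ends up enqueuing the command, and a real proxy would need persistent storage rather than an in-memory set.

```python
class RenewalTracker:
    """Enqueue a SCEP renewal the first time a host is redirected (sketch)."""

    def __init__(self, enqueue_renewal):
        # enqueue_renewal: hypothetical callable taking a host UUID.
        self._enqueue_renewal = enqueue_renewal
        self._seen = set()

    def on_redirect(self, host_uuid):
        """Return True if a renewal was enqueued for this redirect."""
        if host_uuid in self._seen:
            return False
        self._enqueue_renewal(host_uuid)
        # Record the host only after the enqueue succeeded, so a failed
        # call is retried on the host's next request.
        self._seen.add(host_uuid)
        return True
```

Since the set lives in memory, a proxy restart would enqueue the renewal again for every host, which is probably harmless but worth knowing.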

georgekarrv commented 3 weeks ago

Hey team! Please add your planning poker estimate with Zenhub @dantecatalfamo @ghernandez345 @gillespi314 @roperzh

georgekarrv commented 3 weeks ago

Please add your planning poker estimate with Zenhub @jahzielv

JoStableford commented 3 weeks ago

Related to a Slack conversation

roperzh commented 2 weeks ago

As part of the research for this ticket I:

  1. set up a MicroMDM server behind ngrok using https://roperzh-micromdm.ngrok.io
  2. executed the database statements from the SQL generated by https://github.com/fleetdm/fleet/pull/20035
  3. changed my ngrok config to point https://roperzh-micromdm.ngrok.io to a small proxy that redirects requests from /mdm/checkin and /mdm/connect to my Fleet server /mdm/apple/mdm
  4. downloaded an enrollment profile from my Fleet server, and did the following changes:
    1. change ServerURL to be https://roperzh-micromdm.ngrok.io/mdm/connect (keep the old MicroMDM server URL)
    2. add a CheckInURL next to ServerURL with the value https://roperzh-micromdm.ngrok.io/mdm/checkin
    3. change the root PayloadIdentifier of the profile to be com.github.micromdm.micromdm.enroll
  5. sent an InstallProfile command using the enrollment profile payload
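
The profile changes in step 4 can be sketched roughly as follows, assuming the downloaded profile is an unsigned plist whose MDM payload has `PayloadType` set to `com.apple.mdm`. The function name and its parameters are hypothetical; I made the edits by hand during the research.

```python
import plistlib


def adapt_enrollment_profile(profile_xml, server_url, check_in_url, root_identifier):
    """Rewrite an enrollment profile so it matches the old MDM's enrollment.

    Hypothetical helper: keeps everything from the downloaded profile except
    the three values changed in step 4 above.
    """
    profile = plistlib.loads(profile_xml)
    # Root PayloadIdentifier must match the profile already on the host.
    profile["PayloadIdentifier"] = root_identifier
    for payload in profile.get("PayloadContent", []):
        if payload.get("PayloadType") == "com.apple.mdm":
            # Keep pointing at the old server URLs (served by the proxy).
            payload["ServerURL"] = server_url
            payload["CheckInURL"] = check_in_url
    return plistlib.dumps(profile, fmt=plistlib.FMT_XML)
```

For my setup the arguments would be `https://roperzh-micromdm.ngrok.io/mdm/connect`, `https://roperzh-micromdm.ngrok.io/mdm/checkin`, and `com.github.micromdm.micromdm.enroll`.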

I verified that:

  1. The SCEP cert was renewed (🎉)
  2. The new certificate was issued by Fleet's CA (so the customer doesn't need to keep their old CA around)
  3. The enrollment profile in System Settings > Profiles now shows the host as enrolled by Fleet

Action items and stuff to coordinate on:

zwass commented 2 weeks ago

@roperzh are you saying you got the enrollment profile replaced without user intervention? I'm not sure I understand how this experiment is connected with the touchless migration experience we are working on with customers.

roperzh commented 2 weeks ago

@zwass sorry for not being clear. This is to renew SCEP certificates for migrated devices (which is done by re-delivering the enrollment profile)

The enrollment profile was almost entirely replaced, but three things had to be kept from the old one in our particular case:

  1. change ServerURL to be https://roperzh-micromdm.ngrok.io/mdm/connect (keep the old MicroMDM server URL)
  2. add a CheckInURL next to ServerURL with the value https://roperzh-micromdm.ngrok.io/mdm/checkin
  3. change the root PayloadIdentifier of the profile to be com.github.micromdm.micromdm.enroll

zwass commented 2 weeks ago

Ah, so enrollment profiles can be redelivered without user intervention as long as the ServerURL and CheckInURL don't change?

roperzh commented 2 weeks ago

@zwass exactly! in my notes I have this as the full list of things that can't change:

I think the really important findings for us are:

  1. We can switch to a different CA for SCEP certs (in this case Fleet's built-in CA)
  2. We can renew SCEP certificates for migrated devices seamlessly

zayhanlon commented 2 weeks ago

@roperzh how are we doing on target ETA to get this in a patch next week? thanks :D

roperzh commented 2 weeks ago

@zayhanlon thanks for checking, still on track! but please note that the issue w/ profiles is probably a bigger blocker. This one is mainly a blocker for the prod deploy, while the profiles issue is limiting their testing in staging prior to any production changes.

zayhanlon commented 2 weeks ago

@roperzh yup! i'm on it - discussing with Noah today

PezHub commented 6 hours ago

Paired w/ Roberto to test on his local MicroMDM server to ensure the workflow succeeded.