fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Log errors during migration workflow #13189

Closed noahtalerman closed 1 year ago

noahtalerman commented 1 year ago

Goal

User story
As an IT admin,
I want to see all errors that occur during the end user migration workflow
so that I can reach out to IT support if an end user encounters an error.

Requirements

Changes

Engineering

Product quality

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Context

QA

Risk assessment

Manual testing steps

  1. Use orbit and Fleet Desktop built from main with a local TUF server.
  2. Intentionally make the migration flow fail (by disconnecting from the network, making the webhook fail, etc.).
  3. Check that a 500 error was generated in the server logs.

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

sabrinabuckets commented 1 year ago

@roperzh is this testable, or is the presence of messages being sent to that Slack channel sufficient for verification?

roperzh commented 1 year ago

@sabrinabuckets sorry for not including those. It's testable; I have included steps in the issue description.

sabrinabuckets commented 1 year ago

Validated I see a 502 error in my server logs on a failed migration. Also noted that the shared Slack channel contains the error messages being sent to that customer's instance.

noahtalerman commented 1 year ago

From confirm and celebrate:

@zayhanlon heads up, this customer request was shipped in 4.37.

@roperzh do we have to do any extra configuration on the infra side to get the new errors flowing into the customer's shared Slack?

Also, Roberto, where should we document this feature?

roperzh commented 1 year ago

@noahtalerman what a great catch, thanks for the heads-up! We need to add a new environment variable. I created https://github.com/fleetdm/confidential/issues/3633 for that.

Also, Roberto, where should we document this feature?

I'm thinking probably in the contributors guide, right? It needs a server-side config to work, but we haven't discussed if we want this as an "official feature".

noahtalerman commented 1 year ago

Nice! Thanks for opening that PR.

we haven't discussed if we want this as an "official feature"

@roperzh any reason to not make it official?

zhumo commented 1 year ago

Confirm and Celebrate: @roperzh @noahtalerman

Was this a one-off for the customer or is this generalizable/repeatable? If generalizable, then we should document it and make it re-usable for future customers. Questions that I have are:

  1. I want it to go to Slack, but then I need to set a key and channel name, right? How do I do that? My guess is that it's really about setting up some separate server that regularly polls for new changes, right?
  2. What does that FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS env variable do exactly?

noahtalerman commented 1 year ago

This is repeatable, although it's not as easy as just setting a key and a channel.

My understanding is that setting FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS adds fleetd errors to the server logs. A self-managed customer has to configure their Fleet server to send their server logs to a Slack channel or their location of choice. This configuration happens in their hosting solution (AWS, Azure, GCP, etc.).
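
To make the behavior described above concrete, here is a minimal sketch of how an environment variable can gate whether client-reported errors are written to the server logs. It is not Fleet's actual implementation: the handler name `handle_client_error`, the logger name, and the convention that the value `1` enables the flag are all assumptions for illustration.

```python
import logging
import os

# Name of the flag discussed in this thread.
FLAG = "FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS"

logger = logging.getLogger("server")


def client_errors_enabled(env=None):
    """Return True when the flag is set to "1" (an assumed convention)."""
    env = os.environ if env is None else env
    return env.get(FLAG) == "1"


def handle_client_error(message, env=None):
    """Hypothetical handler: log a client-reported error only if the flag is on.

    Returns True when the error was written to the server log.
    """
    if not client_errors_enabled(env):
        return False
    logger.error("client error: %s", message)
    return True
```

With the flag unset, `handle_client_error` drops the message, so nothing reaches the logs (or anything downstream of them) until the server is started with the variable set.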

Looks like we need to document FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS.

@roperzh is that right? Also, can you please help document the new env var?

Looks like we might need docs for configuring the Fleet server to send server logs to Slack or some other destination. Other than FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS, what else do I need to configure in Fleet to do this?

ireedy commented 1 year ago

C&C: @noahtalerman to follow up with Roberto.

noahtalerman commented 1 year ago

Looks like we might need docs for configuring the Fleet server to send server logs to Slack or some other destination. Other than FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS, what else do I need to configure in Fleet to do this?

C&C: @roperzh ping!

roperzh commented 1 year ago

Looks like we might need docs for configuring the Fleet server to send server logs to Slack or some other destination. Other than FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS, what else do I need to configure in Fleet to do this?

@noahtalerman nothing else from the Fleet side. Next steps will depend on your particular setup. We could document the combination we use (AWS + Slack), but we will need help from infra.
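
For a Slack destination like the one mentioned above, the forwarding piece could look roughly like the sketch below. It is not the setup Fleet's infra team uses. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field, and it assumes that matching the substring "error" is enough to pick out the relevant log lines. The function names and the webhook URL are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url, log_line):
    """Build a POST request carrying one log line as a Slack message."""
    payload = json.dumps({"text": log_line}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _post(request):
    """Default sender: perform the HTTP call and discard the response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def forward_errors(lines, webhook_url, send=_post):
    """Send every line containing "error" (case-insensitive) to the webhook.

    `send` is injectable so the function can be exercised without network access.
    Returns the number of lines forwarded.
    """
    count = 0
    for line in lines:
        if "error" in line.lower():
            send(build_slack_request(webhook_url, line))
            count += 1
    return count
```

How the log lines reach a function like `forward_errors` depends on the hosting solution, which is the part that would need input from infra.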

Looks like we need to document FLEET_ENABLE_POST_CLIENT_DEBUG_ERRORS.

I tried to document this right now, but the problem is that it was introduced as an ad-hoc flag. All other flags I could find have a matching server configuration (in the config yaml and as a server flag)

noahtalerman commented 1 year ago

I tried to document this right now, but the problem is that it was introduced as an ad-hoc flag. All other flags I could find have a matching server configuration (in the config yaml and as a server flag)

@roperzh ah, ok. Maybe we just put this in the contributor docs then? So it's somewhere. Do we have contributor docs for server config?

noahtalerman commented 1 year ago

@roperzh we do have a contributor docs section for configuration: https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Configuration-for-contributors.md

Right now it just has configuration options for YAML documents but we should add a section for server config.

roperzh commented 1 year ago

@noahtalerman good idea! took a stab at this in https://github.com/fleetdm/fleet/pull/14656

noahtalerman commented 1 year ago

Thanks @roperzh!

fleet-release commented 1 year ago

Errors in workflow,
Like leaves in a clear stream's path,
Now seen, swift support.