fleetdm / fleet

Open-source platform for IT, security, and infrastructure teams. (Linux, macOS, Chrome, Windows, cloud, data center)
https://fleetdm.com

Deploy security agents to macOS, Windows, and Linux hosts #14921

Closed noahtalerman closed 2 months ago

noahtalerman commented 9 months ago

Goal

User story
As an IT admin using the Software page, the Fleet API, or GitOps,
I want to add my security agents
so that I can deploy them to my macOS, Windows, and Linux hosts.

✅ Cross-platform app deployment

See a video walkthrough of the user journey in this Loom video.

Context

Changes

Product

Engineering

DB

Backend (general)

CLI

API

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

Context

QA

Risk assessment

Manual testing steps

  1. Step 1
  2. Step 2
  3. Step 3

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

noahtalerman commented 9 months ago

Hey @marko-lisica I think we need a confirmation modal to describe to the user what happens when they delete a software item:

Screenshot 2023-11-25 at 11 49 48 AM

What happens? The package isn't removed/uninstalled from any hosts. It just stops being installed on hosts that newly enroll to Fleet.

mikermcneil commented 8 months ago

At a quick glance, this looks like a great start!

noahtalerman commented 8 months ago

Hey @marko-lisica! Leaving feedback here:

marko-lisica commented 8 months ago

At 1:15 in your 1st Loom recording, you talk about the API sending small chunks to the Fleet server. What does this mean for the user? Packages will actually take 5 seconds to upload? 10 seconds?

So basically, this is how I understood it: if you have a large 500 MB file and a slow internet connection, the upload may take 20 minutes. Even if the timeout is set to 3 minutes, it won't interrupt the upload. Multipart form-data sends the file in small chunks, and the upload is only interrupted if a single chunk takes more than 3 minutes. I think we spent a lot of time thinking about this, but it's not likely to happen. @roperzh, could you confirm that's right?

Looks like there's an improvement we can make to the output rendering (looks broken). Can you please file a separate feature request for this improvement and add it to the feature fest board?

Done ✅ - #15515

Around 4:40 in your 1st Loom, you mentioned that we can't get meta data from the .dmg. It looks like we might be able to? There's a # Install the .app step in your install-dmg.sh script. If we have the .app in this step maybe we can get the metadata we need?

The user will upload a .dmg, so we can't get metadata after upload and do matching by name. If there's a way to extract the .dmg and get the .app from it on the server, maybe that's how it could work.

At the beginning of your 2nd Loom, you mention that it's easy for the IT admin to get the .app from a .dmg manually. It looks like it might be easy for us to do this for the IT admin? If I'm understanding correctly, your install-dmg.sh script does this for the IT admin already (I could be missing something). If that's right, then maybe we just support .dmg and not .app so it's easier for IT admin?

As I mentioned in the previous answer, all of this happens on macOS, and we need metadata after upload to do matching. The IT admin could take the .app from the .dmg and upload it to Fleet, which is very easy. The problem is the GitOps workflow: they won't be able to use URLs from vendors if those point to .dmg files. It would be better if we could support .dmg (for an easier GitOps workflow), but it depends on whether we can do this.

@roperzh Wdyt?

noahtalerman commented 8 months ago

If there's a way to extract .dmg and get .app from it on the server, maybe that's how it could work.

@marko-lisica let's remember to bring this up w/ engineering folks on our next call.

noahtalerman commented 8 months ago

Hey @marko-lisica left some UI feedback for you in Loom here: https://www.loom.com/share/f52089d259d9403a9c7055f3cc55c1a4?sid=1536e398-e616-44ff-934c-b4d77591ce7d

Dropped API feedback in the PR: https://github.com/fleetdm/fleet/pull/15242/files

noahtalerman commented 8 months ago

Feedback from Mike:

noahtalerman commented 8 months ago

Hey @marko-lisica let's kick this one to next sprint so we can focus on "upcoming activities" and the smaller stories this sprint.

We'll get to this one next design sprint.

noahtalerman commented 7 months ago

From design review doc:

DISCUSS: https://github.com/fleetdm/fleet/pull/15242#discussion_r1424133848
  • What about the download endpoint, it would be GET /software?alt=media (we decided to hide /software, would it conflict if we decide to get metadata of managed software?)

DISCUSS Marko: Add teams filter for software version details view.
  • API adjustments
  • The only difference based on the selected team would be hosts_count, vulnerabilities are related only to the version
  • Parameter description?

DISCUSS Marko: Add new software title view to My device page.
  • Needs a new API endpoint for My device page.
  • Marko: Public or contributor API?

Meeting w/ engineering:
  • Bri: Vulnerable software filter on host details software table
  • George: Filter for managed software on host details
  • Moving icons to the left (status icons for managed software)

noahtalerman commented 7 months ago

Heads up @marko-lisica and @mikermcneil this request was discussed during feature fest last week and didn't make it into the current design sprint.

nonpunctual commented 6 months ago

https://fleetdm.slack.com/archives/C019WG4GH0A/p1707437681949749

noahtalerman commented 5 months ago

Hey @marko-lisica in a Loom video here, I chat about new learnings and updates since we were last working on this story.

Looking forward to chatting more tomorrow.

nonpunctual commented 5 months ago

Ubuntu = deb & snap packages https://ubuntu.com/about/packages

nonpunctual commented 5 months ago

@noahtalerman @marko-lisica Totally correct assessment that these "agent" type installs are harder.

But, some good news about UPDATING security agent packages is that MANY of them update themselves from the tenant once installed & this is usually preferred by client platform & security teams.

The reason is that client platform teams only have to be involved with the initial package roll-out. If auto-updates are then enabled from the tenant, the security team reclaims control of the version (which is usually how that ball bounces...)

Not universally true but definitely the trend I saw with:

  • Digital Guardian
  • Tanium
  • FireEye
  • Symantec
  • Systrack
  • Delinea Privilege Manager
  • Zscaler
  • & others (blanking on names)

noahtalerman commented 5 months ago

client platform teams only have to be involved with the initial package roll-out.

@nonpunctual got it!

Sounds like our initial pass just needs to solve for initial deployment: script/profile gets delivered followed by the package.

nonpunctual commented 5 months ago

1 last shower thought comment on this: the one place usually where CPE teams need to worry about updates is in their provisioning workflows, ie, there can be drift between the version of the package that gets installed when a computer is provisioned & the version that the tenant is deploying (usually the most up-to-date.) If those drift too far apart, the package needs to be updated in the provisioning. Hope that makes sense.

noahtalerman commented 4 months ago

Hey @marko-lisica we're planning on addressing 2 more user stories within the software management "realm": updating/patching and self-service.

I think the solution for both of these use cases will include an interface (UI/API/CLI) to trigger software installation on a specific host at the requested time.

So, why not build that interface now? And, instead of automatically installing the software on every host on that team, leave it up to the IT admin to trigger the install. They can use failing policies webhook + Tines to automate this.

I think it will also let us move faster addressing this story.

Probably the biggest challenge: installing software can require a profile to be deployed first as a precondition, and a script to run next as a postcondition, with a retry strategy

I think we might be able to sidestep this challenge and take this on in a later iteration. This will help us move faster now and give us dedicated time to solve the inevitable edge cases later.

Instead, we can block (return an error) if the IT admin hits the API to install a security agent on a host that doesn't already have the profile installed. This is a problem with an easy to understand solution: make sure the profile is installed before you try again.

We need to track the state not just of individual steps (the "before" profile, the installer itself, the "after" script) but of the sequence as a whole, in order to show the status of the software on the host (and filter on failed orchestrations in "List hosts", etc.).

We might not need to do more work to track the profile. We already have its status (pending, failed, verifying, verified).

If installing security agents is indeed idempotent (you can install over a successful/failed install) then we might not need to track whether the install or the script was unsuccessful. Instead, we can track whether the installation "as a whole" (install + script) was successful.

An unsuccessful install is another problem with an easy to understand solution: try again.

And, if we learn that security agents aren't idempotent, I assume there are scripts to remove/uninstall failed or successful installs. So, there's one extra step to resolve an unsuccessful install: run a script.
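
To make the proposal above concrete, here is a minimal sketch of the flow: block when the profile isn't on the host, then report one status for the install plus the "after" script. The function name `install_as_a_whole`, the return codes, and the choice to treat only `verifying` and `verified` as "profile installed" are assumptions for illustration, not decisions from this thread.

```shell
#!/bin/sh

# Hypothetical orchestration: block when the profile isn't installed, then
# treat the installer plus the "after" script as a single pass/fail unit.
# $1 = profile status (pending, failed, verifying, verified)
# $2 = install command, $3 = post-install command
install_as_a_whole() {
    profile_status="$1"
    install_cmd="$2"
    post_cmd="$3"

    case "$profile_status" in
        verifying|verified) ;;  # assumption: these mean the profile is on the host
        *)
            echo "blocked: install the profile first, then try again"
            return 2
            ;;
    esac

    # One status for the whole sequence; on failure the fix is to run it again.
    if sh -c "$install_cmd" && sh -c "$post_cmd"; then
        echo "installed"
        return 0
    fi
    echo "failed: try again"
    return 1
}
```

Because only the combined result is tracked, a retry simply calls the function again with the same arguments, which is only safe if the install really is idempotent.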

noahtalerman commented 4 months ago

Hey @pintomi1989 heads up, we didn't get this one estimated in the last design sprint.

Plan is to bring it into the next design sprint (4.49).

Bringing this to feature fest.

noahtalerman commented 4 months ago

Hey @marko-lisica, I left some feedback on the UI/CLI changes in a Loom here: https://www.loom.com/share/867428f755b64bd9ac620fe197a3a19c?sid=7b2eb119-dfaa-4e04-8876-5934b7545748

noahtalerman commented 4 months ago

From product office hours:

Brock:

Basically that Jamf now has a new & improved cloud distribution point & we should look to its features to make something similar. the Der Flounder article is about customizing the new JDCS with solutions like munki & another article I think I posted way back from a Jamf PS guy discussed BYPASSING a Jamf DP for packages.

In other words, I think the Fleet MVP for packages could just be: give the Host a secure, encrypted link to a URL where packages are available. That's it. Not rebuilding munki or any other system, just get a Fleet Distribution point up & running & as long as it has a cert & URL, move on. :)

noahtalerman commented 4 months ago
  • Filesystem is default storage. No new config to opt in to filesystem storage
  • S3 storage optional. Same config as filecarves (docs here)

Hey @mna, this is the plan for package storage location. I updated the issue description.

More context including the "why" here in this Google doc.

What do you think?

cc @lukeheath @georgekarrv @rfairburn

lukeheath commented 4 months ago

@noahtalerman My primary concern is that we are very clear that the filesystem is the default storage and encourage users to configure S3 in production. Otherwise, they may fill the server's disk and crash Fleet.

noahtalerman commented 4 months ago

@lukeheath, I think we document S3 as required.

Filesystem is there for trials (fleetctl preview) and dev environments.

mna commented 4 months ago

@noahtalerman

Hey @mna, this is the plan for package storage location. I updated the issue description.

More context including the "why" here in this Google doc.

What do you think?

Sounds good to me, looks like we already piggy-back on the S3 config to store fleet-osquery installers (in addition to carves, but the use-case here is more similar to the fleet-osquery installer). We'll want to implement a similar abstraction (a common interface) and have it implemented for both local filesystem and S3 config, so that we don't have to worry about which is used internally.
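
As a rough illustration of the "common interface" idea, the sketch below hides the backend behind one function so callers never branch on it. This is shell rather than Fleet's server code, and `store_installer`, `STORAGE_BACKEND`, `STORAGE_DIR`, and `STORAGE_BUCKET` are hypothetical names, not Fleet configuration keys.

```shell
#!/bin/sh

# Hypothetical storage wrapper: callers use store_installer without knowing
# which backend is configured. STORAGE_BACKEND, STORAGE_DIR, and
# STORAGE_BUCKET are illustrative names, not Fleet configuration keys.
# $1 = local path of the installer, $2 = name to store it under
store_installer() {
    src="$1"
    name="$2"
    case "${STORAGE_BACKEND:-filesystem}" in
        filesystem)
            # Default backend: copy into a local directory.
            mkdir -p "$STORAGE_DIR" && cp "$src" "$STORAGE_DIR/$name"
            ;;
        s3)
            # Requires the AWS CLI and credentials to be configured.
            aws s3 cp "$src" "s3://$STORAGE_BUCKET/$name"
            ;;
        *)
            echo "unknown storage backend: $STORAGE_BACKEND" >&2
            return 1
            ;;
    esac
}
```

The filesystem branch is the default when `STORAGE_BACKEND` is unset, mirroring the plan that filesystem storage needs no opt-in.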

rachaelshaw commented 3 months ago

Something that came up during a design review today: the designs don't make it clear that the install script command will be different depending on OS, and whether this should be pre-filled. @lukeheath is going to discuss with @georgekarrv and make a final decision about what to do for this UI.

Screenshot 2024-05-03 at 11 48 21 AM

georgekarrv commented 3 months ago

Since pkg is macOS only, exe and msi are Windows only, and deb is Debian Linux only, there isn't an issue here currently.
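
The one-to-one mapping described here can be sketched as a small shell function. The name `platform_for_installer` and the platform labels it prints are hypothetical and only illustrate why the extension alone is enough to pick a default script today.

```shell
#!/bin/sh

# Hypothetical helper: print the platform an installer targets, based only on
# its file extension (pkg -> macOS, exe/msi -> Windows, deb -> Linux).
# Matching is case-sensitive; anything else is reported as unsupported.
platform_for_installer() {
    case "$1" in
        *.pkg) echo "macos" ;;
        *.exe|*.msi) echo "windows" ;;
        *.deb) echo "linux" ;;
        *)
            echo "unsupported"
            return 1
            ;;
    esac
}
```

For example, `platform_for_installer FalconSensor-6.44.pkg` prints `macos`. If a format that spans platforms is ever supported, this lookup stops being unambiguous.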

georgekarrv commented 3 months ago

Discussed with Luke: one way to make it more obvious that the field is editable is to change the text input to one that more closely matches the query input below (line numbers w/ wrap, etc.), change the word "command" in the underlying text to "script", and autofill it with:

#!/bin/sh
installer -pkg ./FalconSensor-6.44.pkg -target /

The newline and shebang will make it more obvious, w/o any other words, that this is a script. You can add more or less, and the line numbers will make it clearer that you can edit it.

noahtalerman commented 3 months ago

We decided to cut the ability to specify install_script, pre_install_query, and post_install_script inline:

Screenshot 2024-05-06 at 10 43 45 AM

Why?

We also decided to add a path sub-key to install_script, like we’re adding for pre_install_query and post_install_script.

This way, the interface is consistent and we’re setting ourselves up for specifying these inline later.

The CLI wireframes in Figma are updated to reflect this.

cc @roperzh

noahtalerman commented 3 months ago

Summary of the discussion and decisions during today's MDM standup

Blockers

The following are no longer blockers. I removed them from the issue description.

  • [ ] TODO: Determine the three commands. Maybe 4 because of different flavors of Unix. Understand their interface and how they are called. Document this in the wireframes.
  • [ ] TODO: Luke will work with team to sort out what to do about the read only vs editable aspects, erring on the side of a 2-way door if possible (not a 1-way door that's hard to migrate out of). In other words, make it so that you can edit less-- you can always make it more customizable, but hard to go the other direction. If necessary/appropriate/soundest, cut the ability to configure even CLI opts.

Sort on Host details (My device) > Software table

@ghernandez345 and @mna we'd like to update the default sort on the Software table on the Host details and My device pages to name ascending.

Name column to be sortable. In addition, we can cut the special sort for the Install status column. Figma is updated:

Screenshot 2024-05-06 at 12 48 52 PM

Why? The Fleet UI convention is to have a default sort that is always visible on the page. This is feedback from Mike that product design forgot to relay to y'all. Sorry about the last minute heads up.

Add software modal

@ghernandez345 we decided to use the ace editor and update the help text for the Install script input field. Figma is updated:

Screenshot 2024-05-06 at 12 52 28 PM

On the Advanced options modal (for uploaded software) let's remove for the Install script input field:

Screenshot 2024-05-06 at 12 53 04 PM

Heads up @roperzh, we want to add the #!/bin/sh shebang to the default pkg script so that it's obvious that this is a shell script.

FYI @lukeheath, @ghernandez345 we decided that Gabe will be taking on these UI changes.

Disabling deploy security agents if scripts are disabled

We decided to allow deploying software. We can follow up in a later pass to add an off switch for deploying software.

@roperzh just checking, if a host has the fleetd agent w/ scripts disabled, will that prevent the ability to deploy software (and run an arbitrary script) on a host?

Providing a custom install script

We decided to go w/ the environment variable approach. When customizing the install script, the IT admin will specify the software's location as an environment variable in their script. Fleet will populate this variable for them.

installer -pkg "$INSTALLER_PATH" -target /
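
As a rough sketch of how the variable could reach the admin's script, a runner can set it just for the script's process. The helper name `run_install_script` is hypothetical, and this is not fleetd's actual implementation.

```shell
#!/bin/sh

# Hypothetical runner: execute an install script with INSTALLER_PATH set to
# the location of the downloaded installer. The variable is set only for the
# child sh process, so it doesn't leak into the caller's environment.
# $1 = path to the install script, $2 = path to the installer file
run_install_script() {
    script="$1"
    installer="$2"
    INSTALLER_PATH="$installer" sh "$script"
}
```

With this shape, the admin's script stays free of hard-coded file names: the same `installer -pkg "$INSTALLER_PATH" -target /` line works wherever the package ends up on disk.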

cc @georgekarrv

lukeheath commented 3 months ago

@noahtalerman Thanks; I agree UI changes should go to @ghernandez345.

roperzh commented 3 months ago

@roperzh just checking, if a host has the fleetd agent w/ scripts disabled, will that prevent the ability to deploy software (and run an arbitrary script) on a host?

@noahtalerman sorry, just catching up on that mention. Currently it won't prevent the ability to deploy software per what we discussed.

noahtalerman commented 3 months ago
  • Software can only be installed on a host that has a fleetd agent with scripts enabled

Hey @dantecatalfamo and @georgekarrv, I added the above requirement to the issue description to document what we decided during standup today.

Also, in Figma here, I wireframed my understanding of what the IT admin sees when they try to install software on a host that has fleetd w/ scripts disabled:

Screenshot 2024-05-07 at 12 28 06 PM

Please let me know if this wireframe doesn't align w/ y'alls' understanding.

noahtalerman commented 3 months ago

Roberto:

We're missing an activity item for software installers added via the CLI. Generally those are worded differently and use the word "edit" because they're set in batches. E.g., for scripts: https://fleetdm.com/docs/using-fleet/audit-logs#edited-script

Noah:

Hey! Thanks for calling this out.

I think it’s ok and even good that we don’t have an activity feed item for this. Less to maintain.

Why? The best practice is to use Fleet’s best practice GitOps.

Using this action will generate a “changes were made via GitOps. See your latest commit” message in the activity feed.

So as long as adding software via GitOps triggers this message we’re golden.

fleetctl apply, and the already existing activities, are supported for backwards compatibility GitOps.

PezHub commented 3 months ago

I was able to complete the installation workflows with Falcon Sensor for macOS and Windows successfully.

macOS Success

Windows Success

We should consider populating the default script with the syntax I ended up using for the .exe

$exeFilePath = "${env:INSTALLER_PATH}"

$installProcess = Start-Process $exeFilePath `
  -ArgumentList "/install /quiet /norestart CID=<-license->" `
  -PassThru -Verb RunAs -Wait

exit $installProcess.ExitCode

mna commented 3 months ago

We should consider populating the default script with the syntax I ended up using for the .exe

The problem is that this is specific to this application - most (at least some/many) .exes are not installers and are the application itself, in which case the "install script" is just copying the .exe to a good destination folder on the host. Maybe we could special case some well-known applications to have different default scripts regardless of extension, but in any case the user would need to edit it for the license (unless passing up a license or some sort of secret is so common that we provide a separate field to enter it and use an env var in the script to pass it?). Anyway, very nice that the whole flow worked for this, and good food for thought for future improvements!

PezHub commented 2 months ago
valentinpezon-primo commented 2 months ago

Hi - We get a "failed to parse multipart form" error when uploading the Notion package. It seems we're hitting a timeout because the file is too big :/

You can download notion.exe here : https://www.notion.so/desktop

Screenshot 2024-05-30 at 08 50 01

Screenshot 2024-05-30 at 08 49 35

noahtalerman commented 2 months ago

@pintomi1989 heads up, this customer request was shipped in Fleet 4.50.

TODOs from C&C:

marko-lisica commented 2 months ago

@noahtalerman Here's the PR to add permissions to manage access page. https://github.com/fleetdm/fleet/pull/19405

fleet-release commented 2 months ago

Deploy agents swift,
On macOS, Linux, Windows,
Security uplift.

nonpunctual commented 1 month ago

Install_SentinelOne_Agent_Linux.sh

#!/bin/sh

# Define log file path
logFile="${TMPDIR:-/tmp}/fleet-install-software.log"

# Function to log messages
log_message() {
    message="$1"
    timestamp=$(date +"%Y-%m-%d %H:%M:%S")
    echo "$timestamp - $message" | tee -a "$logFile"
}

# Ensure that INSTALLER_PATH is set
if [ -z "$INSTALLER_PATH" ]
then
    log_message "Error: INSTALLER_PATH is not set."
    tail -n 500 "$logFile"
    exit 1
fi

# Ensure that the installer file exists
if [ ! -f "$INSTALLER_PATH" ]
then
    log_message "Error: Installer file not found at $INSTALLER_PATH."
    tail -n 500 "$logFile"
    exit 1
fi

# Define the Site Token
SITE_TOKEN="INSERT_SITE_TOKEN"

# Install the SentinelOne agent
log_message "Installing SentinelOne agent from $INSTALLER_PATH..."
apt-get install --assume-yes -f "$INSTALLER_PATH"
if [ "$?" -ne 0 ]
then
    log_message "Error: Failed to install SentinelOne agent."
    tail -n 500 "$logFile"
    exit 1
fi

# Link the agent with the provided Site Token
log_message "Linking SentinelOne agent with the Site Token..."
/opt/sentinelone/bin/sentinelctl management token set "$SITE_TOKEN"
if [ "$?" -ne 0 ]
then
    log_message "Error: Failed to link SentinelOne agent with the Site Token."
    tail -n 500 "$logFile"
    exit 1
fi

# Start the SentinelOne agent
log_message "Starting SentinelOne agent..."
/opt/sentinelone/bin/sentinelctl control start
if [ "$?" -ne 0 ]
then
    log_message "Error: Failed to start SentinelOne agent."
    tail -n 500 "$logFile"
    exit 1
fi

log_message "SentinelOne agent installed, linked, and started successfully."
tail -n 500 "$logFile"
exit 0

Install_SentinelOne_Agent_Windows.ps1

# Define log file path
$logFile = "${env:TEMP}/sentinelone-install.log"

# Function to log messages
function Log-Message {
    param (
        [string]$message
    )
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $logEntry = "$timestamp - $message"
    Write-Output $logEntry | Out-File -Append -FilePath $logFile
}

try {
    # Ensure that INSTALLER_PATH is set
    if (-not $env:INSTALLER_PATH) {
        throw "Error: INSTALLER_PATH environment variable is not set."
    }

    # Ensure that the installer file exists
    if (-not (Test-Path -Path $env:INSTALLER_PATH)) {
        throw "Error: Installer file not found at $env:INSTALLER_PATH."
    }

    # Install SentinelOne agent with the site token
    Log-Message "Installing SentinelOne agent from $env:INSTALLER_PATH with Site Token..."
    $installProcess = Start-Process msiexec.exe `
        -ArgumentList "/i `"$env:INSTALLER_PATH`" /qn site_token=`"INSERT_TOKEN_HERE`" /norestart /lv `"$logFile`"" `
        -Wait -PassThru -Verb RunAs

    # Check if the installation was successful
    if ($installProcess.ExitCode -ne 0 -and $installProcess.ExitCode -ne 3010) {
        throw "Error: Failed to install SentinelOne agent. Exit code: $($installProcess.ExitCode)"
    }

    # Check if a reboot is required
    if ($installProcess.ExitCode -eq 3010) {
        Log-Message "Installation completed successfully, but a reboot is required. Rebooting now..."
        Restart-Computer -Force
    }

    Log-Message "SentinelOne agent installed successfully."
    Get-Content $logFile -Tail 10  # Display last 10 lines of the log file

    exit 0  # Exit with success
}
catch {
    # Catch any errors and log them
    $errorMessage = $_.Exception.Message
    Log-Message "Error: $errorMessage"
    Get-Content $logFile -Tail 10  # Display last 10 lines of the log file

    exit 1  # Exit with error
}

Install_Tenable_Nessus_Agent_Linux.sh

#!/bin/sh

# Install the package specified by $INSTALLER_PATH using apt-get
apt-get install --assume-yes -f "$INSTALLER_PATH"

# Check the status of the Nessus Agent
/opt/nessus_agent/sbin/nessuscli agent status

# Start the Nessus Agent service
/sbin/service nessusagent start

# Enable the Nessus Agent service to start on boot
systemctl enable nessusagent

# Link the Nessus Agent to the Flock Safety Tenable instance
/opt/nessus_agent/sbin/nessuscli agent link --key="INSERT_KEY" --groups="INSERT_GROUP_NAME" --host="INSERT_HOST"

# Check the status of the Nessus Agent after linking
/opt/nessus_agent/sbin/nessuscli agent status

Install_Tenable_Nessus_Agent_Windows.ps1

# Define the path for the log file, using the TEMP environment variable
$logFile = "${env:TEMP}/fleet-install-software.log"

# Define arguments for Nessus groups, server, and key
$nessusGroups = "INSERT_GROUP"
$nessusServer = "INSERT_SERVER"
$nessusKey = "INSERT_KEY"

# Start the installation process using msiexec with specified arguments
# -ArgumentList specifies the command-line arguments for msiexec
# /quiet and /norestart make the installation silent and prevent automatic restart
# /lv specifies the log file path
# /i specifies the installer path
# NESSUS_GROUPS, NESSUS_SERVER, and NESSUS_KEY are custom properties passed to the installer
# /qn ensures the installer runs without any user interface
# -PassThru allows the cmdlet to return a process object
# -Wait waits for the process to complete before continuing
$installProcess = Start-Process msiexec.exe `
  -ArgumentList "/quiet /norestart /lv `"$logFile`" /i `"${env:INSTALLER_PATH}`" NESSUS_GROUPS=`"$nessusGroups`" NESSUS_SERVER=`"$nessusServer`" NESSUS_KEY=`"$nessusKey`" /qn" `
  -PassThru -Wait

# Output the last 500 lines of the log file to the console
# This helps in reviewing the installation process and any errors that might have occurred
Get-Content $logFile -Tail 500

# Exit the script with the exit code from the msiexec process
# This ensures that the calling process or script can determine if the installation was successful
exit $installProcess.ExitCode