Closed blakerouse closed 4 years ago
Pinging @elastic/ingest-management (Team:Ingest Management)
@urso @ruflin @roncohen @mostlyjason I want to have your opinion on this. We did discuss this strategy a long time ago and I must confess I was not OK with it. But now, I believe it's the right thing to simplify the use case by limiting our options.
Adding @exekias @andresrc too.
@EricDavisX @mdelapenya FYI, this will have an impact in how we are doing the e2e testing in the context of upgrade / upgrades.
Thanks PH. @blakerouse can you coordinate with me and @mdelapenya and @matthagenbuch on the specifics of what we'll want to know so we can update both the siem-team and the e2e-testing repos wrt Agent deploy? I'd like to confirm what I understand, please correct what's wrong and help fill in:
If indeed Agents can still be started manually (and not as a service), then we can retain the Kibana Endpoint Demo environment Agent deploy code for Linux, but it sounds like we'll need to remove and modify some code for Windows since it makes use of the powershell ps1 installer. these 3 lines can be... combined into 1 non-interactive shell command I guess?
Enroll the Agent , Install Agent service , Start the Agent service
For the e2e tests... we already have coverage for .deb and .rpm installs so we'd like your help to stage the changes to update them before this feature work is merged. We'll help guide the team thru it. ;) I won't post here more details except to say we have not implemented tests there for the upgrade yet.
I think it will be very important to provide DEB/RPM packages for the adoption of Elastic Agent. I don't know much about MSI, but I imagine it could be similarly important on Windows.
I understand we want to make it possible to auto-update as many installations as possible, but I don't think it justifies removing the DEB/RPM options. Could we instead treat DEB/RPM same as the tar ball and mark the Agents installed through DEB/RPM as non-auto-upgradable?
Agreed. From a sysadmin perspective, it's really important having a package (RPM, DEB, MSI...), as you are able to use simply the OS to install and upgrade the packages using OS-native tools (yum, apt, snap, choco...), being the case they are published to the upstream repositories.
> I understand we want to make it possible to auto-update as many installations as possible, but I don't think it justifies removing the DEB/RPM options. Could we instead treat DEB/RPM same as the tar ball and mark the Agents installed through DEB/RPM as non-auto-upgradable?
+1
> I don't know much about MSI, but I imagine it could be similarly important on Windows.
+1, but I guess self-upgrade is more accepted in Windows-land (at least chrome/firefox users are used to it ;) )
Have we considered a 'shim' DEB/RPM package that fetches the agent on first run (or if not present), such that the agent can update itself, but the package can be provided via repositories? The shim would connect to Kibana and ask for the version in order to figure the agent version to download (this can either be a manual or automatic process).
> I understand we want to make it possible to auto-update as many installations as possible, but I don't think it justifies removing the DEB/RPM options. Could we instead treat DEB/RPM same as the tar ball and mark the Agents installed through DEB/RPM as non-auto-upgradable?
@roncohen We could do that, we would need the same story for the docker release. I will organize something with a few people from this issue.
I've changed my mind, meetings are hard, let's try to do it async.
@roncohen @ruflin @blakerouse You make pretty good points about the adoption of the Elastic Agent. Would you be OK with the following phased approach?
@blakerouse @michalpristas Do you foresee any problems if we do it that way?
Do more research on the RPM/Debian with the following:
Option A: Either keep RPM, DEBIAN non-upgradable. Option B: Use the shims-like strategy. Option C: Allow Elastic-Agent to talk to the package manager.
The above strategy would allow us to keep adding value and give us a bit more time for the debian/rpm strategy.
Concerning MSI, PKG.
I believe they can come later, @EricDavisX can confirm, I think endpoint never had an MSI installer. Usually on Windows and macOS the installation of these tools will be scripted using jamf or other provisioning tools. So providing an MSI or PKG will not add a lot of value and I am not sure it is super needed, at least in the short term. We could always wrap the install subcommand in an MSI or the macOS equivalent.
Note that we would always need some parameters for the installation: enrollment keys, possibly certs, host. So we would still need to add parameters to the installer or provide a file next to the installer.
'I believe they can come later, @EricDavisX can confirm, I think endpoint never had an MSI installer. Usually on Windows and macOS theses tools installation will be scripted using jamf or other provisioning tools.'
This is correct. To elaborate, the Endgame Sensor has an 'in-band' installer and supports out-of-band as well (via JAMF or SCCM on Windows, etc). The in-band installer was difficult to create and maintain, and little used in the field when thousands of hosts would be deployed at a time.
As PH notes we'll need to point to files and the Kibana URL and enrollment token. Our Kibana Fleet UI that shows the 'install string' is still the key to getting those values / data to the install execution. Do we need / want to enhance what we expose and support in that UI to keep the command-line usage simpler (as a calculated trade off)?
I do not believe that we should offer 2 installation methods. If we want to keep the DEB/RPM/MSI then we should not add the install/uninstall sub-commands. Having 2 installation methods means we have to support both at the same time, that is an un-needed burden for little benefit.
I would like to provide some context on DEB/RPM front. I think the idea that without a DEB/RPM Elastic Agent adoption will be affected is not technically true. The reason is that Elastic Agent will not be in the upstream repositories, and you do not want it to be.
I will provide my experience for Ubuntu archive as an example as why you don't want to be in the archive and why a PPA is not a substitute for not being in the archive. When a package is added to the archive (in Elastic Agent case, it would be in universe) it is fixed to the version when Feature Freeze is hit for the archive. So imagine that we released 7.8 with Ubuntu 20.04 LTS, it would always be 7.8.x from the archive. It would not get approval to be upgraded to 7.9 in the archive, so users on Ubuntu 20.04 that did `apt install elastic-agent` would always get 7.8 for 5 years (which Elastic would then need to provide 7.8.x for 5 years). Note: This is why Canonical started making snapd, and why all its core products now ship as snaps.
The substitute for being in the archive is to provide a PPA. This is not as straightforward to get started with from a user standpoint. They still need to add the PPA, run `apt update`, and then run `apt install`. So it's a 3 step process to get `elastic-agent`. Yes, it is true that adding PPAs is something that users in the Ubuntu world are used to, but it's actually very dangerous to do from a security standpoint. A PPA can provide any package to a user no matter the name of the package, so that means that a PPA could easily add a modified version of `libc` that overrides the `libc` version from the archive. That overridden `libc` could do anything at that point and have complete control of the user's system. I have been an Ubuntu user for 8+ years and the last thing I do is install a PPA on my system.
This is not my argument against providing DEB/RPM, I just think it's not something that is so fundamental that we have to have it. If we feel that it's a must to have DEB/RPM then we should ensure that Elastic Agent knows how to interact with the package manager and upgrade itself using DEB/RPMs. Given that Elastic Agent runs a security product, Elastic Endpoint Security, we want to always be at the latest version, so we should not prevent self-upgrading just because we were installed by a DEB/RPM. If Elastic Agent is being run by a service manager then it should be self-upgradable, no matter how it got installed.
> A PPA can provide any package to a user no matter the name of the package, so that means that a PPA could easily add a modified version of libc that overrides the libc version from the archive. That overridden libc could do anything at that point and have complete control of the users system. I have been an Ubuntu user for 8+ years and the last thing I do is install a PPA on my system.
I haven't considered that security case at all.
Here is some background on the Elastic Endgame product's approach to this issue.
The main reason there is no traditional installer (RPM/DEB/PKG/MSI) for the Phase 0/Elastic Endgame product is that Elastic Endgame supports "signature diversity", which means customers can modify some details like the paths and process names Elastic Endgame Sensor installs as. That customization requires a custom installer.
With that said, customers did ask for traditional installers. Customer Success would work with them to wrap the Elastic Endgame Sensor's installer in an "after market" RPM/DEB/PKG/MSI that just contained `sensor-installer.exe` and ran `sensor-installer.exe install` (effectively) when it was installed. I'm not sure Endgame ever lost any sales because there isn't a traditional installer, but I know it was a minor annoyance for some customers.
@ferullo Concerning the wrapped part, does this mean that the package would be stuck to a version but the endpoint itself could diverge?
Yes, I believe so. That is/was totally unsupported so I never had any direct insight into how it would work for customers.
@tyoungs-estc might have insight into how doing that works for Elastic Endgame customers.
+1 to keep packages, marked as non-upgradeable. We need to support this in any case for docker images, and I'd say that also for tar.gz/zip packages.
One point that hasn't appeared yet, and I think is important, is that for most system software, many people want to keep the installed versions under control and decide when to upgrade, because an upgrade may mean dedicating time to unexpected issues that appear during it. For example I would like to control when my monitoring software upgrades, so I don't start to have anomalies in my logs or metrics collection during a different operation, or during an incident. For these cases, people usually rely on package managers.
Maybe the greatest exception to this is the security software. You probably want a security sensor to be as updated as possible no matter what so it detects the most recent known issues.
Here I think that we are covering both use cases, monitoring software whose versions you may want to control (Metricbeat, Filebeat) and security software where you probably want the latest version (Endgame Sensor). So it seems to me that we are going to need to cover both cases.
If we elect to keep packages they should be upgradable. Not supporting upgrading just because the installation came from a package is the wrong approach. Self-upgrading on Mac OSX using a PKG is fine, self-upgrading on Windows using a MSI is also fine. The issue is only due to the nature of packaging on Linux based systems.
The self-upgrading of Elastic Agent is only possible when the agent is enrolled in Fleet, otherwise it will stick to the installed version. This means that the upgrade of Elastic Agent is controlled. Self-upgrading is not something that just happens when a new release comes out. First the stack must be upgraded to the latest version, then inside of Fleet users will be presented with the option to start the upgrade of the agents. This is the power of Fleet, and not doing self-upgrade because Elastic Agent was installed from a package (when most on this issue suggest this will be the main installation path) leaves a large gap.
I think the one place where self-upgrading should be disabled is the docker image. I still feel like the docker image case could also be solved. Exposing the docker socket to Agent or Agent being an operator on Kubernetes could allow it to be upgraded as well.
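The two conditions described in this comment (the agent is enrolled in Fleet, and the stack has already been upgraded) can be sketched as a small predicate. This is only an illustration of the reasoning; `AgentState` and `upgrade_can_be_offered` are hypothetical names, not part of Elastic Agent or Fleet.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical snapshot of one agent as Fleet might see it."""
    enrolled_in_fleet: bool
    agent_version: tuple  # e.g. (7, 9, 0)
    stack_version: tuple  # version of the stack the agent reports to


def upgrade_can_be_offered(state: AgentState) -> bool:
    """Return True when an upgrade could be presented for this agent.

    Assumed rule, taken from the comment above: the agent must be
    enrolled in Fleet and the stack must already be on a newer version.
    """
    if not state.enrolled_in_fleet:
        return False  # standalone agents stay on the installed version
    return state.stack_version > state.agent_version


standalone = AgentState(False, (7, 9, 0), (7, 10, 0))
behind = AgentState(True, (7, 9, 0), (7, 10, 0))
current = AgentState(True, (7, 10, 0), (7, 10, 0))

for name, state in [("standalone", standalone), ("behind", behind), ("current", current)]:
    print(name, upgrade_can_be_offered(state))
```

Even when the predicate is true, the upgrade is only offered; per the comment, a user still has to start it from Fleet.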
@blakerouse thanks for your comment, this clarifies some things, and tilts the balance towards the removal of packaging for me. The upgrading process for Agent with Fleet was not clear to me.
> Self-upgrading on Mac OSX using a PKG is fine, self-upgrading on Windows using a MSI is also fine. The issue is only due to the nature of packaging on Linux based systems.
How are linux packages different to PKG/MSI regarding self-upgrading? Are PKG/MSIs installed without tracking the version in the system, as if they were tars/zips?
How do you think installation should work for standalone mode? Would it make sense to use packages there? Or the `elastic-agent install` command can be used to upgrade to a newer version?
> @blakerouse thanks for your comment, this clarifies some things, and tilts the balance towards the removal of packaging for me. The upgrading process for Agent with Fleet was not clear to me.
> > Self-upgrading on Mac OSX using a PKG is fine, self-upgrading on Windows using a MSI is also fine. The issue is only due to the nature of packaging on Linux based systems.
>
> How are linux packages different to PKG/MSI regarding self-upgrading? Are PKG/MSIs installed without tracking the version in the system, as if they were tars/zips?
I am no expert on Mac or Windows, but normally the version is not something stored anywhere on the system for those installers. Windows might store some registry keys, but most of that is not shown to the user.
> How do you think installation should work for standalone mode? Would it make sense to use packages there? Or the `elastic-agent install` command can be used to upgrade to a newer version?
Yes, `elastic-agent install` would notice that Elastic Agent is already installed and instruct it to upgrade. See the "Installation Upgrade" in the issue description for how that would flow.
Jumping in here a bit late. Concerning:
> I believe they can come later, @EricDavisX can confirm, I think endpoint never had an MSI installer. Usually on Windows and macOS theses tools installation will be scripted using jamf or other provisioning tools. So providing an MSI or PKG will not add a lot of values and I am not sure it super needed at least in the short term. We could always wrap the install subcommand in an MSI or the macOS equivalent.
There are a number of reasons for this. @ferullo covered the signature diversity piece. In addition, our users tended to be a bit different. Traditional endgame users were relatively mature on the security/operations scale. Our users typically had sophisticated deployment systems and had the white glove treatment from our support team during implementation. Even with this model - PKGs/DMGs/MSIs were frequent requests. Having an MSI/PKG is valuable for our users and use case.
In my experience not having DEB/RPM (and mostly RPM in RHEL shops) can make conversations difficult with certain kinds of customers, though the shim could be a good option there (we would need to look for a good approach for versioning, to avoid confusion, possibly with a shim per major).
In any case users should be able to disable upgrades, +1 to @jsoriano comments on users wanting to avoid unknown changes
One concern I have with DEB/RPM specifically is the ability for a customer to inadvertently upgrade their Agents to a newer version than the stack. This may be a problem already with beats, so maybe it doesn't matter. But, I think the Agent does strict version checking and will not run if the stack is newer? Do Beats do this? A user could end up in a situation where a `yum upgrade` or an `apt-get upgrade` could leave them with a non-functioning install?
@crowens That is already the issue today where a DEB/RPM upgrade could break the current beats. Agent will have the same issue when installing from DEB/RPM. When using the `install/uninstall` Agent will ensure that an upgrade does not break it.
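The concern in this exchange, a package-manager upgrade leaving an Agent newer than the stack, can be illustrated with a simple version comparison. The rule used here (the Agent must not be newer than the stack) is an assumption drawn from the discussion, not a confirmed description of Elastic Agent's actual check, and both function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '7.9.1' into (7, 9, 1); a suffix such as '-SNAPSHOT' is ignored."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def package_upgrade_is_safe(candidate_agent: str, stack: str) -> bool:
    """Assumed rule: the upgraded Agent must not be newer than the stack."""
    return parse_version(candidate_agent) <= parse_version(stack)


# A package upgrade that would move the Agent ahead of the stack.
print(package_upgrade_is_safe("7.10.0", "7.9.2"))
# An upgrade that keeps the Agent at the stack version.
print(package_upgrade_is_safe("7.9.2", "7.9.2"))
```

A package manager has no knowledge of the stack version, which is why a DEB/RPM upgrade cannot apply a guard like this on its own.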
After discussions it was agreed that:
- `install/uninstall` will result in an Agent that can be upgraded from Fleet
- `install/uninstall`, this will not replace the non-upgrading DEB/RPM

I have updated the description and title of this issue to remove the mention of deleting of packaging and to cover that DEB/RPM installs will not be upgradable from Fleet.
Hi @EricDavisX
We have validated this ticket on 8.0.0-SNAPSHOT kibana cloud environment with 8.0.0-SNAPSHOT elastic-agent artifacts from google cloud link.
We have failed 09 and blocked 03 testcases(due to below query) under [Elastic Agent] Install/uninstall subcommand TestRun and reported 02 bugs: https://github.com/elastic/beats/issues/21244 https://github.com/elastic/beats/issues/21247
Moreover, we are facing #78024 bug on 8.0.0-SNAPSHOT Kibana cloud environment.
Query: Currently on installing agent in non interactive mode(as per 'Fresh Installation Non-Interactive' header in description), agent is only installed as service on the OS(say windows, macOS) and not enrolled into fleet. Could you please confirm if it is the expected behavior.
Windows:
Mac:
Please let us know if anything is missing from our end.
Here is my update, a few notes, questions, issues we're seeing in initial testing. We're getting up the hill on the learning curve for the new usage, thanks for the help everyone.
About the above command Rahul uses, it is trying to use the uuid of the token, not the actual token secret itself, so it's actually a negative test that is passing (we don't expect this to work). :)
elastic-agent.exe install -f --kibana-url https://8040186313ec4a228ecc22ebee9ebcd3.us-central1.gcp.foundit.no:9243 --enrollment-token YW9mUm9YUUJWMUZuVTV1TlJNeTQ6WjBTQ3R6VFZUb0toeTBEQ05CcGVcba==
1) the install command seems not to be working on Windows 7 x 64 as tested and Blake thinks he sees a problem and is fixing. We can re-package and continue tests. We focused on the -f non-interactive as it was seen working in some environments.
2) The install command on windows (at least) has some redundant 'Agent is in Beta' statements given out, and posts some status and contradicts itself - it says Agent is running and then says 'Agent might not be running' - I think we will want to tighten that up as possible, and it may only be showing strangely when it's an error condition, tho that's when it's needed most!
3) darwin / macOS 'install' with interactive mode, it did not fully install the Agent correctly. This may be the same problem as noted, maybe not?
4) darwin / macOS with non-interactive seems to have installed correctly (as seen in the Agent Activity details), but when re-booting, the Agent was running but the Metricbeat / Filebeat were not.
5) while it was documented in this ticket above, both Rahul and I made the mistake of trying to use the un-install command incorrectly. We validated there is some error message for it. The un-install usage, for safety's sake, requires that you can't use the same binary you just used to install the Agent... you have to use the binary that was placed into a specific file location, at 'C:\Program Files\Elastic\Agent', so 'cd' to that location, then you can run 'elastic-agent.exe uninstall', or on linux / mac: elastic-agent uninstall
we'll need to call this out clearly in our on-line docs but anything we can add to the Agent cli output to inform users (after a successful install) about how they can un-install or otherwise interact with Agent it may be helpful.
7) the uninstall command (as tested on Windows 7 x 64) didn't seem to remove all of the directories we expected. After uninstall:
C:\Program Files\Elastic
should remain, but the 'Agent' subdirectory and contents should be removed.
We can re-test, perhaps system was in a bad state. note the linux / mac file locations above, thanks for doc'ing that Blake!
8) the uninstall command as tested on Win 7, seems to have kept the service installed and running as seen in 'services' program. We can re-test, perhaps system was in a bad state, unless it seems to be a separate problem?
Lastly, we have some test content now to start off testing, but we need to expand it to include more. @rahulgupta-qasource can you continue the edits I started please? we need to add:
Also, we're potentially running into this which has been open for about 9 days which is making tests harder to confirm and execute: https://github.com/elastic/beats/issues/21120
Hi Eric,
Thank you for sharing the feedback.
We have created 11 testcases under Fleet ->install and uninstall command section in TestRail and executed 11 under [Elastic Agent] Install/uninstall subcommand TestRun.
Observations: For Windows7 32bit we tested again to install and uninstall with interactive and non-interactive method:
It does not install successfully. We get an error if we start the service for agent.
Interactive uninstall : Observation : On uninstalling the agent we get a success. Also, the elastic-agent stops in the services.
Non-Interactive uninstall: Observation : Uninstalling the agent is successful
Query: On successful interactive and non-interactive uninstall, agent still remains in the enrolling state on UI under fleet tab.
As per our understanding, it should be uninstalled from fleet tab. Could you please confirm if it is the expected behavior.
As there are some issues in install/uninstall, we will run more tests once blake pr is merged.
@rahulgupta-qasource I think you need to start with a fresh system or the package you are using is the wrong build. The name of the service using `./elastic-agent install` is always `Elastic Agent` on Windows, in your screenshots it is `elastic-agent`.
Hi @blakerouse / @EricDavisX
Thank you for sharing the feedback.
We have executed the failed and pending testcases(Total: 21 testcases) on 8.0.0-SNAPSHOT kibana cloud environment under [Elastic Agent] Install/uninstall subcommand TestRun with latest 8.0.0-SNAPSHOT elastic-agent artifacts from google cloud link.
Query: For standalone agent mode, we followed the steps in testcase https://elastic.testrail.io/index.php?/cases/view/33968 and found that flow after triggering the command "./elastic-agent install" is same as for Fleet interactive mode.
Moreover, standalone agent is enrolled into Fleet on providing 'Y' value to 'Do you want to enroll this Agent into Fleet? [Y/n]' and filebeat error is displayed in activity logs.
Screenshot:
PowerShell run as admin with elastic-agent install command:-
Activity Logs:
elastic-agent.yml file for standalone mode:- elastic-agent.zip
As per our understanding, for standalone mode we have to provide 'n' value to 'Do you want to enroll this Agent into Fleet? [Y/n]' . Could you please confirm the steps for standalone mode and expected behavior.
Please let us know if anything is missing from our end.
Hi Eric
We have passed 10 elastic agent install tests on win x86, win x64, mac .tar.gz, linux .tar.gz, linux .rpm under Elastic agent install subcommand TestRun with latest 8.0.0-SNAPSHOT elastic agent artifacts from observability gcp link .
Screenshot for Linux agent enrolled with .tar.gz :-
Please let us know if anything is missing from our end.
@rahulgupta-qasource lets continue testing this, I have a few more areas we can validate and some to document for repeat tests. You'll likely need to continue to use the GCP bucket Agent with an older working Kibana (I have one I can hopefully post to you), but do what you can to find a working combination.
For the tests you run, please include the following:
1) add in Endpoint to the policy and ensure Endpoint is loaded correctly on Win x64, darwin, and linux .deb
1.1) and validate that you can see an Alert on mac + windows.
1.2) And please do the re-boot test of the host to ensure it comes back up (Endpoint too). There was a bug for Endpoint not coming up correctly, it would be interesting to see if we can see the failure in our usage: https://github.com/elastic/beats/issues/21424
2) test stand-alone Agent with x64, and .deb and darwin - using the 'install' command and use the dialog to cite 'not' to enroll in fleet when asked. :) then carry on and do the re-boot tests to make sure it comes back up, no need for endpoint in the config for this test (its not supported). and we need test cases for this too, i think. thank you!
Hi Eric
Thank you for sharing the feedback.
We have created 03, updated 06 testcases and reported following 02 bugs for above tests: #21445 #21449
Moreover, we have failed 06 and blocked 04 testcases under Elastic agent install with Endpoint TestRun(with latest 8.0.0-SNAPSHOT elastic agent artifacts from observability gcp link) due to above 02 bugs.
We will retest the tests once the above 02 bugs are fixed.
Please let us know if anything is missing from our end.
Hi Eric
We have created 03 and updated 03 testcases and reported following 01 bug #21512 for above tests:
Moreover, we have executed 12 tests(failed 04 and blocked 02 testcases due to bug #21512) under Elastic agent install with Endpoint TestRun(with latest 7.10.0-SNAPSHOT elastic agent artifacts from artifacts link).
Please let us know if anything is missing from our end.
This issue doesn't have a `Team:<team>` label.
Hi Eric,
We have created and executed 08 install persistent reboot tests under Install persistent reboot tests(https://elastic.testrail.io/index.php?/runs/view/734) Testrun on latest 8.0.0-SNAPSHOT Kibana cloud environment with commit 5d71db0427f173641a52a254ffab2e8d7b3a86ea and 8.0.0 agent hash b6e27da5 (say https://snapshots.elastic.co/8.0.0-b6e27da5/downloads/beats/elastic-agent/elastic-agent-8.0.0-SNAPSHOT-amd64.deb) .
We have failed 01 testcase due to defect https://github.com/elastic/beats/issues/21424 and blocked 03 due to below queries :-
Observations/ Queries:
We have reported bug #21744 for the same.
agent comes online under Agents tab but error is displayed on running 'systemctl enable elastic-agent' and 'systemctl start elastic-agent' command on Linux deb :-
(ii) Then after executing 'sudo reboot' command on Linux deb , Linux deb agent does not come online
Please confirm the testcase steps and expected behavior.
agent comes online under Agents tab but no activity logs are displayed. Moreover after some time, agent goes to Offline status.
(ii)Moreover after reboot, agent is still in Offline status with no activity logs.
Could you please look into above queries and share your feedback.
Hi - this is a response to a week+ old post, but @rahulgupta-qasource I think the concerns are taken care of or are logged separately. The .rpm / .deb usage is not yet sorted, but otherwise the coming BC3 should be fairly solid! So we can do a smoke test and re-validate this and more.
Overview
With the new self-upgrading Elastic Agent work, installation/uninstallation of Elastic Agent needs to be adjusted. This issue documents the work to be done to add 2 new sub-commands, `install` and `uninstall`.
DEB/RPM Packaging & Docker
DEB/RPM packaging and Docker images will not support auto-upgrading. The fact that they cannot be upgraded will be reported to Fleet. Standard DEB/RPM and Docker image upgrade process must be used when that method of consumption is used.
Using the `install` sub-command on Linux will allow Elastic Agent to be auto-upgraded. Linux will be the only OS that supports both auto-upgrade and standard version installation.
Install Command
This command will perform the proper installation of Elastic Agent on the OS. Depending on the OS the installation path of Elastic Agent will change, but the internal structure of the Elastic Agent's directories remain the same.
The `install` command will be interactive, asking the correct questions so Elastic Agent operates the way the user expects once installed. The interactive part can be skipped by using `-f`, in which case the default options will always be taken and no questions will be asked.
Fresh Installation Interactive
Below documents the flow of installing Elastic Agent and the interactive questions that will be asked.
Fresh Installation Non-Interactive
Below documents the non-interactive installation of Elastic Agent.
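As an illustration of a non-interactive install, the sketch below assembles the argument list for the command form that appears in the testing comments on this issue (`install -f --kibana-url ... --enrollment-token ...`). The helper name `build_install_command`, the URL, and the token are placeholders for illustration only; the flags are the ones quoted in the thread and may differ in other builds.

```python
import shlex


def build_install_command(binary: str, kibana_url: str, enrollment_token: str) -> list:
    """Assemble the argv for a non-interactive install.

    Hypothetical helper: only the sub-command and flags come from the
    thread; the values passed in are placeholders.
    """
    return [
        binary,
        "install",
        "-f",  # skip the interactive questions and take the defaults
        "--kibana-url", kibana_url,
        "--enrollment-token", enrollment_token,
    ]


command = build_install_command(
    "./elastic-agent",
    "https://kibana.example.invalid:9243",  # placeholder URL
    "<enrollment-token>",                   # placeholder, use the token secret
)
# Print the command instead of running it; installation needs admin rights.
print(shlex.join(command))
```

As noted in the comments, the enrollment token must be the token secret, not the token's uuid.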
Installation Upgrade
Below documents the flow of running the installation of Elastic Agent, but there is already an older version of Elastic Agent installed.
Installation Locations
Below documents the installation locations based on the type of OS. These paths will be hard-coded and cannot be adjusted for installation. The running Elastic Agent will check that it's running from the defined directories to enable self-upgrading.
Linux:
Mac:
Windows:
Self-upgrading
Self-upgrading will only be enabled when Elastic Agent is installed. Starting Elastic Agent from an extracted .tar.gz(.zip) without it being installed, will mark the Elastic Agent as not being able to be self-upgraded. This information will be reported to Fleet to inform Fleet that it cannot be upgraded.
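The check described here, whether the running binary lives under the hard-coded installation directory, might look like the following sketch. The function name is hypothetical. The Windows directory is the one mentioned in the testing comments on this issue; the POSIX directory is a made-up placeholder, because the Linux and Mac locations are not listed in this text.

```python
from pathlib import PurePosixPath, PureWindowsPath


def runs_from_install_dir(executable: str, install_dir: str, windows: bool = False) -> bool:
    """Return True when `executable` is located somewhere under `install_dir`.

    Pure paths are used so the check works on any host; Windows paths are
    compared case-insensitively by PureWindowsPath.
    """
    path_type = PureWindowsPath if windows else PurePosixPath
    exe = path_type(executable)
    root = path_type(install_dir)
    return root in exe.parents


WINDOWS_DIR = r"C:\Program Files\Elastic\Agent"  # from the comments on this issue
POSIX_DIR = "/hypothetical/install/dir"           # placeholder, not the real path

installed = runs_from_install_dir(WINDOWS_DIR + r"\elastic-agent.exe", WINDOWS_DIR, windows=True)
extracted = runs_from_install_dir(r"C:\Users\me\Downloads\elastic-agent\elastic-agent.exe", WINDOWS_DIR, windows=True)
posix_installed = runs_from_install_dir(POSIX_DIR + "/elastic-agent", POSIX_DIR)

# Only an Agent started from the install directory would report itself as self-upgradable.
print(installed, extracted, posix_installed)
```

An Agent started from an extracted archive fails this check and would be reported to Fleet as not upgradable.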
Uninstall Command
The command can only be run by an installed Elastic Agent. Running it from an Elastic Agent that is not installed will report a message informing the user to run it against the installed Elastic Agent, if one is installed.
This command will inform the currently running Elastic Agent that it's going to be uninstalled, so the Elastic Agent can uninstall Elastic Endpoint Security if it's installed/running and stop the running Beats. Once the running Elastic Agent has stopped and exited, the installation files will be removed.
This command is also interactive, unless `-f` is used.
Uninstall Interactive
Uninstall Non-interactive
Uninstall from extracted archive