Closed jsirianni closed 3 years ago
One of the breaking changes in 2.0.0 is that we replaced google-cloud-ops-agent.target with google-cloud-ops-agent.service: https://github.com/GoogleCloudPlatform/ops-agent/pull/119. See https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/transition#commands for the new commands to use.
Added a compatibility matrix: https://github.com/GoogleCloudPlatform/google-cloud-ops-agents-ansible/pull/65. We'll need to prioritize the work to support Ops Agent 2.0.0 with this role next.
Is conditionally setting the service name based on version the only change that's needed to be compatible with 2.x.x?
I have a fork I am working with right now. The only change I have made is in vars/main.yml:
-ops-agent_service_name: google-cloud-ops-agent.target
+ops-agent_service_name: google-cloud-ops-agent
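The question of setting the service name conditionally on the agent version can be sketched in a few lines. This is only an illustration of the selection logic, not code from the role: the function names `major_version` and `ops_agent_service_name` are hypothetical, and the sketch assumes that every 1.x release uses the .target unit and every release from 2.0.0 onward uses the .service unit.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a dotted version string such as '2.0.0'."""
    return int(version.strip().split(".")[0])


def ops_agent_service_name(version: str) -> str:
    """Pick the systemd unit name to manage for a given Ops Agent version.

    Assumption: 1.x uses google-cloud-ops-agent.target and 2.x or later
    uses google-cloud-ops-agent (the .service unit).
    """
    if major_version(version) >= 2:
        # 2.0.0 replaced the .target unit with a .service unit.
        return "google-cloud-ops-agent"
    return "google-cloud-ops-agent.target"


if __name__ == "__main__":
    for v in ("1.0.5", "2.0.0"):
        print(v, ops_agent_service_name(v))
```

In the role itself the same decision would presumably live in a variable or a conditional task rather than in Python.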
It should also be noted that the Ops Agent service no longer stays running. Its purpose is to start the Fluent Bit and OpenTelemetry agents and then exit.
root@agent2:~# systemctl status google-cloud-ops-agent
● google-cloud-ops-agent.service - Google Cloud Ops Agent
Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; ve
Active: inactive (dead) since Thu 2021-07-01 17:02:49 UTC; 36min ago
Process: 8890 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
Main PID: 8890 (code=exited, status=0/SUCCESS)
Jul 01 17:02:49 agent2 systemd[1]: Starting Google Cloud Ops Agent...
Jul 01 17:02:49 agent2 systemd[1]: google-cloud-ops-agent.service: Succeeded.
Jul 01 17:02:49 agent2 systemd[1]: Started Google Cloud Ops Agent.
The service file looks like this:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=Google Cloud Ops Agent
Requires=google-cloud-ops-agent-fluent-bit.service google-cloud-ops-agent-opentelemetry-collector.service
[Service]
Type=oneshot
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
I suspect the test cases will need to consider this.
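Because the root unit is a oneshot that exits immediately, a test could look at the units it pulls in rather than at the root unit itself. The following is a minimal sketch that reads the Requires= line from unit file text like the one above. The helper name `required_units` is hypothetical, and the parser handles only the simple key=value layout shown here, not the full systemd unit syntax.

```python
def required_units(unit_text: str) -> list:
    """Return the unit names listed on Requires= lines in the [Unit] section."""
    section = None
    units = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        # Track which section we are in, e.g. [Unit] or [Service].
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep and section == "Unit" and key.strip() == "Requires":
            # Requires= takes a space-separated list of unit names.
            units.extend(value.split())
    return units


UNIT_TEXT = """\
[Unit]
Description=Google Cloud Ops Agent
Requires=google-cloud-ops-agent-fluent-bit.service google-cloud-ops-agent-opentelemetry-collector.service
[Service]
Type=oneshot
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
"""

if __name__ == "__main__":
    for unit in required_units(UNIT_TEXT):
        print(unit)
```

A test for 2.x could then assert that each returned unit is running, instead of asserting on google-cloud-ops-agent.service.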
It should also be noted that the Ops Agent service no longer stays running. Its purpose is to start the Fluent Bit and OpenTelemetry agents and then exit.
Yes, that's the current behavior. We are still discussing internally how to improve that UX. The fact that the root service shows as inactive (dead) could be confusing to users.
BTW, do Ansible / Puppet have to know the internals of the agents? We added install / upgrade / uninstall / versioning features to the installation scripts, hoping that in cases like this we would only need to update the installation script and could leave the Ansible / Puppet / Chef / SaltStack implementations untouched. The agent restart case is not supported yet, but we could definitely add it to the installation script. That is, the script would choose which command to use to restart the agent based on the agent version.
So far, the only change required for Ansible and Puppet has been changing the service name (removing .target). The test cases fail because the service is expected to be running.
I am working on modifying the Ansible test cases to consider the ops-agent major version when deciding which service(s) should be running.
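A rough sketch of that version check, to show the idea rather than the actual test code: the function name `expected_running_units` is hypothetical. The 2.x unit names come from the Requires= line of the service file quoted earlier in this thread. That the .target unit reports as running on 1.x is an assumption, based on the existing tests passing with 1.0.5.

```python
# Sub-agent units that the 2.x root service requires.
SUBAGENT_UNITS = [
    "google-cloud-ops-agent-fluent-bit.service",
    "google-cloud-ops-agent-opentelemetry-collector.service",
]


def expected_running_units(version: str) -> list:
    """Return the systemd units a test should expect to be running."""
    major = int(version.strip().split(".")[0])
    if major >= 2:
        # The 2.x root service is a oneshot that exits after starting the
        # sub-agents, so only the sub-agents are expected to stay running.
        return list(SUBAGENT_UNITS)
    # Assumption: on 1.x the .target unit itself reports as running.
    return ["google-cloud-ops-agent.target"]


if __name__ == "__main__":
    for v in ("1.0.5", "2.0.0"):
        print(v, expected_running_units(v))
```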
The problem occurs when testing the role against Ops Agent 2.0.0.
When using an older version, 1.0.5, the role completes without error.