DataDog / ansible-datadog

Ansible role for Datadog Agent
Apache License 2.0
297 stars 222 forks source link

Add version pinning and telemetry for APM tracer libraries #541

Closed schneideradam closed 5 months ago

schneideradam commented 7 months ago

Telemetry to send analytics to datadog for tracer library injection. Version pinning for specific tracer libraries.

Reference APMON-356

schneideradam commented 6 months ago

@jonbodner Feel free to merge this when you have a final look. Thanks!

schneideradam commented 6 months ago

@bkabrda - I've addressed all of your comments. Please take a look and let me know your thoughts. Thanks.

schneideradam commented 6 months ago

@bkabrda Ready for review again. Resolved all of your comments and re-tested in both debian and rhel environments.

bkabrda commented 6 months ago

@schneideradam nice work, I think the code looks really good now. You will need to make sure that the CI is fixed, however. In the meanwhile (some of the failures are very likely related to certain CI fixes that we've done recently, so feel free to rebase on the latest main branch, that should help a lot).

I'm going to do some local testing and play around with the feature a little bit to see if everything is 100 % and let you know if I find more places to improve.

schneideradam commented 6 months ago

@bkabrda Thanks. I think the main failure is coming from the guidance you gave to modify the deprecation alert message to include fail, when, and ignore_errors. I think the linter doesn't like those. https://github.com/DataDog/ansible-datadog/pull/541#discussion_r1495395668. Not sure how to fix this as its what you guided to doing. It seems I could write a custom module but that seems like overkill. Thoughts?

bkabrda commented 5 months ago

@schneideradam please use failed_when: false instead of ignore_errors: true. That should fix it.

bkabrda commented 5 months ago

@schneideradam some tests in the CI are still failing. For example, I'm seeing this error:

fatal: [127.0.0.1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'agent_dd_apm_install_pkgs' is undefined\n\nThe error appears to be in '/root/project/tasks/pkg-redhat.yml': line 96, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Set plain includepkgs variable\n      ^ here\n"}

Can you please make sure CI is green?

schneideradam commented 5 months ago

@bkabrda CI Passing. Let me know if there's anything else here.

bkabrda commented 5 months ago

Hi @schneideradam, I did some more testing and found some issues with the pinning logic. My reproducer is:

On Debian/Ubuntu, this errors out:

failed: [localhost] (item=datadog-apm-library-python) => {"ansible_loop_var": "item", "cache_update_time": 1678760944, "cache_updated": false, "changed": false, "item": "datadog-apm-library-python", "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'datadog-apm-library-python=2.6.5-1'' failed: E: Held packages were changed and -y was used without --allow-change-held-packages.\n", "rc": 100, "stderr": "E: Held packages were changed and -y was used without --allow-change-held-packages.\n", "stderr_lines": ["E: Held packages were changed and -y was used without --allow-change-held-packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nThe following held packages will be changed:\n  datadog-apm-library-python\nThe following packages will be upgraded:\n  datadog-apm-library-python\n1 upgraded, 0 newly installed, 0 to remove and 163 not upgraded.\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "The following held packages will be changed:", "  datadog-apm-library-python", "The following packages will be upgraded:", "  datadog-apm-library-python", "1 upgraded, 0 newly installed, 0 to remove and 163 not upgraded."]}

On CentOS/RHEL, this doesn't fail, but only upgrades the unpinned tracer after Ansible is executed twice in a row (the first time the repofile gets updated after the installation, on the second pass this allows yum/dnf to actually upgrade).

For Debian, you should follow the install_script logic, which explicitly unpins first, then installs and then pins. For CentOS/RHEL, you should write the repofile first before doing any install/upgrade, because that will allow the upgrade to happen correctly.

I think this is the last bit that we're missing. Thanks!

bkabrda commented 5 months ago

@schneideradam unfortunately there are still issues with the new logic on both Ubuntu and CentOS, let me explain:

Does that make sense? Sorry that this is dragging for so long, but we need to get this right to ensure good customer experience.

schneideradam commented 5 months ago

Hey, sorry I wasn't quite ready for a review. I was testing on a remote machine and I had to submit commits to send it over. I'll ping you when its ready for review. I'm looking into the issues, and yes I think there's a regression where RHEL is no longer grabbing the package version. I'll get this fixed up and let you know when ready. Sorry for all of the back and forth.

On Wed, Mar 6, 2024 at 3:30 AM Slavek Kabrda @.***> wrote:

@schneideradam https://github.com/schneideradam unfortunately there are still issues with the new logic on both Ubuntu and CentOS, let me explain:

  • On Ubuntu, you can't use dpkg_selections before the package is installed locally, so this fails when first installing packages. I think probably the easiest thing we can do here is use failed_when: false on the task (the other way would be to do this only for packages that already have an entry in the dpkg --get-selections output, but that's probably too overcomplicated).
  • On CentOS/RHEL, the packages don't get pinned - running a dnf update still updates them. The reason is that the includepkgs line looks something like this: includepkgs = datadog-agent datadog-apm datadog-apm-library-js-4.11.1-1 datadog-apm-library-ruby-1.20.0-1 - the datadog-apm is extraneous in this case and will override the pin that comes later. In case any tracer packages are explicitly enumerated, we should not use this pin, but rather list them instead of using the datadog-apm* glob (and not forget to add datadog-apm-inject).

Does that make sense? Sorry that this is dragging for so long, but we need to get this right to ensure good customer experience.

— Reply to this email directly, view it on GitHub https://github.com/DataDog/ansible-datadog/pull/541#issuecomment-1980329301, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFIEAPBGXEBGMSXIPYSYRDTYW3H2HAVCNFSM6AAAAABCCSABC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBQGMZDSMZQGE . You are receiving this because you were mentioned.Message ID: @.***>

schneideradam commented 5 months ago

@bkabrda This is now ready for review. I have tested on both RHEL and Debian.