DataDog / kong-plugin-ddtrace

Datadog APM Plugin for Kong Gateway
Apache License 2.0

[Bug]: Breaking change between 0.1 and 0.2 because of deprecated config #52

Open lays147 opened 5 months ago

lays147 commented 5 months ago

Kong Version

3.6.1

Plugin Version

0.2.0-1

On which environment your Kong instance is running?

Docker

Plugin Configuration

_format_version: "3.0"

_info:
  defaults: {}
  select_tags:
    - ddtrace

plugins:
  - name: ddtrace
    tags:
      - ddtrace
    config:
      environment: prd

What happened?

Deprecating the agent_endpoint field is a breaking change: Kong fails when the latest version of this plugin is added to the Docker image.

Because the plugin is configured for version 0.1.2, deploying the latest version makes Kong fail to read the plugin configuration from the Control Plane, so the Data Plane doesn't load the configuration for its services and gateways.

The Control Plane isn't accessible from outside my infrastructure and I have no access to it, so I can't fix the configuration or disable this plugin to troubleshoot further.

 [error] 1361#0: *7 [lua] data_plane.lua:263: [clustering] unable to update running config: bad config received from control plane in 'plugins':
  - in entry 3 of 'plugins':
  in 'config':
  in 'agent_endpoint': agent_endpoint is deprecated. Please use trace_agent_url or agent_host instead, context: ngx.timer

This plugin should handle the breaking change gracefully, without preventing Kong from starting.

dmehala commented 5 months ago

Hi @lays147 ,

Thank you for reaching out. Upon review, it appears that the deprecation of the agent_endpoint field was not adequately categorized as a breaking change in the release notes (updated since then, thank you!). We will do our best to ensure better communication in future updates.

Regarding your current situation, while reverting the deprecation of the agent_endpoint field may seem like a straightforward solution, it's important to consider the long-term implications in terms of technical debt, support, etc. Simply accepting the deprecated value with a warning would temporarily hide the "good way" to configure the plugin. Relying on deprecated configuration isn't a sustainable approach: in our experience, it inevitably leads to the same issue once the switch to the new configuration takes effect.

In this scenario, I would recommend temporarily downgrading to v0.1.2 to restore functionality while preparing for the eventual upgrade to v0.2.0.

lays147 commented 5 months ago

@dmehala thanks for the feedback. But I don't see how I can upgrade to v0.2.0 and fix the configuration without access to the Control Plane. My only access is through the Data Plane, which doesn't load its configuration correctly because of the ddtrace plugin misconfiguration. So I'm in a deadlock. (For compliance reasons, I don't have access to a bastion host that can reach the Control Plane directly.)

What would be the best way to handle this upgrade in this scenario? Disabling the plugin, making the update, and then enabling it later with the right config of the agent endpoint?

dmehala commented 5 months ago

What would be the best way to handle this upgrade in this scenario? Disabling the plugin, making the update, and then enabling it later with the right config of the agent endpoint?

Yes. That's one option. Alternatively, as suggested, you could temporarily downgrade to v0.1.2 until you can update the configuration properly, which avoids losing observability into your Kong instance.
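For the first option, a minimal sketch of a disabled ddtrace entry in the decK file, assuming the plugin is managed with decK as in the configuration reported above. `enabled` is the standard Kong plugin entity field; whether Kong still validates a disabled plugin's config against the new schema is not confirmed in this thread.

```yaml
_format_version: "3.0"

_info:
  defaults: {}
  select_tags:
    - ddtrace

plugins:
  - name: ddtrace
    tags:
      - ddtrace
    # Standard Kong plugin entity field: the entity is kept, but the plugin does not run.
    enabled: false
    config:
      environment: prd
```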

lays147 commented 5 months ago

@dmehala I erased all the plugin configuration to avoid any issues with the upcoming upgrade.

I don't think deleting the plugin configuration can be avoided: even with the plugin disabled, Kong would still read the legacy config and try to validate it, so the problem would persist. (I'm not 100% sure of this, but I believe the plugin schema validation runs even when the plugin is disabled.)

I think proper documentation on how to upgrade between these versions is needed, so that others don't run into the same issue I had.

Steps I took:

  1. Ran decK diff and sync to delete all the ddtrace configuration
  2. Deployed the latest Kong container with the ddtrace plugin
  3. Used decK to sync the new ddtrace configuration

After that, there was no further downtime on the DP.
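For step 3, a migrated entry might look like the sketch below. It uses `trace_agent_url`, one of the two replacement fields named in the error message, and assumes the Datadog Agent is reachable at a hypothetical host `datadog-agent` on the Agent's default trace port 8126. The exact value format each field accepts should be checked against the plugin's README.

```yaml
_format_version: "3.0"

_info:
  defaults: {}
  select_tags:
    - ddtrace

plugins:
  - name: ddtrace
    tags:
      - ddtrace
    config:
      environment: prd
      # Replaces the deprecated agent_endpoint.
      # "datadog-agent" is a hypothetical hostname; 8126 is the Agent's default trace port.
      trace_agent_url: http://datadog-agent:8126
      # Alternative named in the error message: set only the Agent host instead.
      # agent_host: datadog-agent
```

The error message presents `trace_agent_url` and `agent_host` as alternatives, so the sketch sets only one of them.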