delphix / linux-pkg

Framework to build custom packages for the Delphix Appliance
Apache License 2.0

DLPX-86142 [insight] Unknown command fluentd error while creating insight config #288

Closed rasantel closed 1 year ago

rasantel commented 1 year ago

Problem

After upgrading from 10.0 to 11.0 with an enabled fluentd configuration for a plugin such as [elasticsearch-7.far](https://gitlab.delphix.com/brad.lewis/far-dev/blob/master/elasticsearch-7.far), fluentd and the management stack fail to start.

Diagnosis

The fluentd logs show a failure to find a dependency of `elasticsearch-7`: the gem `faraday` at major version 1. Only version 3.x is available. This older gem version was included in the `td-agent` package in 10.0 (version [4.4.2-1](https://s3.amazonaws.com/packages.treasuredata.com/4/ubuntu/focal/pool/contrib/t/td-agent/td-agent_4.4.2-1_amd64.deb)) but is no longer included in 11.0 (version [4.5.0-1](https://s3.amazonaws.com/packages.treasuredata.com/4/ubuntu/focal/pool/contrib/t/td-agent/td-agent_4.5.0-1_amd64.deb)), so the upgrade of `td-agent` removes that gem, among others. The fluentd container start script copies the gems from `td-agent` to a new directory, but the stop script deletes that directory, so the old gems are no longer available after the upgrade.
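To make the failure mode concrete, here is a minimal sketch of the kind of check that fails after the upgrade: a plugin needs a gem at a given major version, and the set of bundled gems no longer contains one. The helper names and the exact version strings (`1.0.0`, `3.0.0`) are illustrative assumptions, not taken from the fluentd logs or from RubyGems' actual resolver.

```python
def major(version: str) -> int:
    """Return the leading integer component of a dotted version string."""
    return int(version.split(".")[0])


def has_major(available: list[str], required_major: int) -> bool:
    """True if any available version has the required major version."""
    return any(major(v) == required_major for v in available)


# Hypothetical gem inventories; the real lists come from the td-agent package.
faraday_in_10_0 = ["1.0.0"]  # assumed placeholder for "major version 1"
faraday_in_11_0 = ["3.0.0"]  # assumed placeholder for "3.x"

REQUIRED_FARADAY_MAJOR = 1  # what elasticsearch-7 needs, per the diagnosis

ok_before = has_major(faraday_in_10_0, REQUIRED_FARADAY_MAJOR)
ok_after = has_major(faraday_in_11_0, REQUIRED_FARADAY_MAJOR)
```

With these placeholder inventories, `ok_before` is `True` and `ok_after` is `False`, which mirrors the missing-dependency error seen after upgrading.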

Solution

Until we decide how to handle plugins that are missing dependencies after upgrading ([DLPX-86157](https://delphix.atlassian.net/browse/DLPX-86157)) and until the stack can start normally even if a plugin fails ([DLPX-86156](https://delphix.atlassian.net/browse/DLPX-86156)), we'll pin `td-agent` to its version in 10.0 (version [4.4.2-1](https://s3.amazonaws.com/packages.treasuredata.com/4/ubuntu/focal/pool/contrib/t/td-agent/td-agent_4.4.2-1_amd64.deb)). Specifically, here in `linux-pkg`, we make sure that this older version of `td-agent` is available when building the appliance. The companion app-gate review, which forces virtualization to depend on this older version, is https://github.com/delphix/dlpx-app-gate/pull/728.
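As a rough illustration of what "pinned" means for the artifact itself, the sketch below parses a Debian package filename (the standard `name_version_arch.deb` layout) and checks that it is the 4.4.2-1 build of `td-agent`. This is not how `linux-pkg` implements the pin; `parse_deb_filename` and `is_pinned_td_agent` are hypothetical helpers for illustration only.

```python
from typing import NamedTuple


class DebFile(NamedTuple):
    package: str
    version: str
    arch: str


def parse_deb_filename(name: str) -> DebFile:
    """Split 'name_version_arch.deb' into its three components."""
    if not name.endswith(".deb"):
        raise ValueError(f"not a .deb filename: {name!r}")
    parts = name[: -len(".deb")].split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected .deb filename layout: {name!r}")
    return DebFile(*parts)


PINNED_VERSION = "4.4.2-1"  # the td-agent version shipped in 10.0


def is_pinned_td_agent(name: str) -> bool:
    """True if the filename is td-agent at the pinned version."""
    deb = parse_deb_filename(name)
    return deb.package == "td-agent" and deb.version == PINNED_VERSION
```

For example, `is_pinned_td_agent("td-agent_4.4.2-1_amd64.deb")` returns `True`, while the 11.0 package name `td-agent_4.5.0-1_amd64.deb` returns `False`.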

Testing Done

Verified manually. In an 11.0 engine, reproduced the issue by uploading the [elasticsearch-7.far](https://gitlab.delphix.com/brad.lewis/far-dev/blob/master/elasticsearch-7.far) plugin and enabling it. Then I downgraded `td-agent` to the older version and verified that fluentd and the stack start successfully.

In progress: appliance build with both changes (will upload the plugin and enable it when the build completes): http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/5495/

`git ab-pre-push -b misc-debs`: http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/5494/

prakashsurya commented 1 year ago

Can you run `git-ab-pre-push -b misc-debs` and post a link to the run? We don't run this automatically for this repository, so it'd be good to verify things.

rasantel commented 1 year ago

> Does this need to go into release as well?

@prakashsurya I meant to target release. Fixed.

> Can you run `git-ab-pre-push -b misc-debs` and post a link to the run? We don't run this automatically for this repository, so it'd be good to verify things.

Done. Added link to the testing section.

rasantel commented 1 year ago

> Could you also put some information w.r.t. where the debian package that we put into artifactory came from? Just in case we need that information later. E.g. did we copy it from our Ubuntu mirror, and place it in artifactory (no further modifications)? Did we build it from a source package? Something else?

Done