ConduitIO / conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
https://conduit.io
Apache License 2.0

Failing to build a pipeline from a config file results in "processor already running" in logs #1852

Open hariso opened 2 months ago

hariso commented 2 months ago

Bug description

failed to start pipeline: could not build nodes for pipeline test-pipeline: could not build source nodes: could not build processor nodes for connector test-pipeline:test-connector: processor already running

Steps to reproduce

  1. Specify an older version of a built-in connector in a pipeline config file, e.g. builtin:kafka@v0.7.0
  2. Run the pipeline.

Version

v0.11.1

hariso commented 2 months ago

Before the error above, the following error can be seen:

"pipeline \"test-pipeline-builtin-processor\", error while provisioning: could not start the pipeline \"test-pipeline-builtin-processor\": could not build nodes for pipeline test-pipeline-builtin-processor: could not build destination nodes: failed to get plugin dispenser: could not find builtin plugin \"builtin:file@v0.6.0\", only found versions [v0.7.0 latest]: plugin not found"

After debugging the issue, I found how it actually leads to "processor already running":

  1. Conduit loads a functioning pipeline from a configuration file. The pipeline is also saved to the database.
  2. Conduit is stopped.
  3. Invalid changes are made to the pipeline configuration file (e.g. an invalid destination plugin version).
  4. Conduit is started again.
  5. The provisioning service tries running the pipeline that's in the configuration file.
  6. Firstly, the source nodes are built, then the processor nodes, and then the destination nodes.
  7. Building the destination nodes fails (because the plugin doesn't exist).
  8. The processor nodes have already been built (this will become important later).
  9. The pipeline cannot be run, and the provisioning service is done with its work.
  10. After that, the pipeline service starts and tries to run the pipelines stored in the database.
  11. The pipeline service tries to run the same pipeline from above.
  12. The processor nodes have already been built, so it fails because the processor is already running.

Here I believe we have two issues:

  1. Nodes are not properly cleaned up when a pipeline cannot be built.
  2. The pipeline service tries to re-run the pipelines that the provisioning service already tried running.