apache / hop

Hop Orchestration Platform
https://hop.apache.org/
Apache License 2.0

[Bug]: multiple pipeline executors inside pipeline causes error #3503

Closed rcgardne closed 8 months ago

rcgardne commented 9 months ago

Apache Hop version?

2.7.0 (2023-11-17 12.19.09)

Java version?

openjdk 11.0.20 2023-07-18 LTS

Operating system

Windows

What happened?

Priority = 1 because "important component is nonfunctional".

I set up a pipeline (parent pipeline) that runs two other pipelines (sub-pipelines) via the "Pipeline Executor" component. The first sub-pipeline executes fine if its results are directed to a "Dummy (Do Nothing)" component. However, an error occurs if the results are directed to the second sub-pipeline. I believe this happens for any data passed from one pipeline executor to another within a parent pipeline. Here is the error:

2023/12/19 02:16:19 - Hop - Projects enabled
2023/12/19 02:16:19 - Hop - Enabling project : 'example_project'
2023/12/19 02:16:44 - parent_pipeline - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - parent_pipeline - Execution started for pipeline [parent_pipeline]
2023/12/19 02:16:44 - start.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2023/12/19 02:16:44 - sub_pipeline_1 - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_1 - Execution started for pipeline [sub_pipeline_1]
2023/12/19 02:16:44 - Generate rows.0 - Finished processing (I=0, O=0, R=0, W=10, U=0, E=0)
2023/12/19 02:16:44 - Copy rows to result.0 - Finished processing (I=0, O=0, R=10, W=10, U=0, E=0)
2023/12/19 02:16:44 - sub_pipeline_1 - Pipeline duration : 0.107 seconds [  0.107" ]
2023/12/19 02:16:44 - sub_pipeline_1 - Execution finished on a local pipeline engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_2 - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_2 - Execution started for pipeline [sub_pipeline_2]
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - ERROR: Unexpected error
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - ERROR: org.apache.hop.core.exception.HopException: 
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - There was an unexpected error:
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - class java.lang.Class cannot be cast to class java.lang.reflect.ParameterizedType (java.lang.Class and java.lang.reflect.ParameterizedType are in module java.base of loader 'bootstrap')
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - 
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.processRow(PipelineExecutor.java:158)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at java.base/java.lang.Thread.run(Thread.java:829)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - Caused by: java.lang.ClassCastException: class java.lang.Class cannot be cast to class java.lang.reflect.ParameterizedType (java.lang.Class and java.lang.reflect.ParameterizedType are in module java.base of loader 'bootstrap')
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transform.BaseTransformMeta.createTransformData(BaseTransformMeta.java:131)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.Pipeline.prepareExecution(Pipeline.java:823)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.engines.local.LocalPipelineEngine.prepareExecution(LocalPipelineEngine.java:236)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.executePipeline(PipelineExecutor.java:254)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.processRow(PipelineExecutor.java:152)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    ... 2 more
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2023/12/19 02:16:44 - parent_pipeline - Pipeline detected one or more transforms with errors.
2023/12/19 02:16:44 - parent_pipeline - Pipeline is killing the other transforms!
2023/12/19 02:16:44 - sub_pipeline_1.hpl.0 - Finished processing (I=0, O=0, R=1, W=10, U=0, E=0)
2023/12/19 02:16:44 - parent_pipeline - Pipeline duration : 0.465 seconds [  0.465" ]
2023/12/19 02:16:44 - parent_pipeline - Execution finished on a local pipeline engine with run configuration 'local'

An exactly analogous setup in Spoon (General Availability Release - 9.4.0.0-343) throws no errors. I've attached a compressed .zip file of the example: example.zip. Note that to observe the behavior you should:

  1. unzip the example.zip.
  2. run hop-gui.bat.
  3. open the ../example/parent_pipeline.hpl pipeline.
  4. ensure that the paths to sub_pipeline_1.hpl and sub_pipeline_2.hpl resolve correctly within the respective Pipeline Executor components.
  5. preview rows for the Dummy (Do Nothing) component - observe that there are no errors.
  6. disable the hop between sub_pipeline_1.hpl and Dummy (Do Nothing).
  7. DELETE and then reestablish a hop between sub_pipeline_1.hpl and sub_pipeline_2.hpl, choosing option = This output will contain the result rows after execution.
  8. enable the hop between sub_pipeline_2.hpl and Dummy (Do Nothing).
  9. again preview rows for the Dummy (Do Nothing) component - observe the error from above.

Issue Priority

Priority: 1

Issue Component

Component: Pipelines

bamaer commented 9 months ago

confirmed, thanks for the detailed example and reproduction path!

nadment commented 9 months ago

I've found the cause of the bug: the BaseTransformMeta.createTransformData() method fails for RecordsFromStreamMeta, so the code needs to be made more robust for this reflection operation.
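For readers unfamiliar with this kind of failure, here is a minimal sketch of how a "Class cannot be cast to ParameterizedType" error can arise in plain Java. It assumes the failing code casts the result of getGenericSuperclass() directly, which this thread does not confirm. All class and method names below (BaseMeta, DirectMeta, IndirectMeta, naiveTypeArguments, robustTypeArguments) are hypothetical stand-ins, not the actual Hop classes or the actual fix.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class GenericSuperclassDemo {
    // Hypothetical stand-ins for illustration only; these are NOT the real Hop classes.
    static class BaseMeta<M, D> {}
    static class MainStub {}
    static class DataStub {}

    // Extends the generic base directly: its generic superclass is a ParameterizedType.
    static class DirectMeta extends BaseMeta<MainStub, DataStub> {}

    // Extends a class that already fixed the type arguments:
    // its generic superclass is a plain Class (DirectMeta), not a ParameterizedType.
    static class IndirectMeta extends DirectMeta {}

    // Naive lookup: assumes the immediate superclass is always parameterized.
    static Type[] naiveTypeArguments(Class<?> clazz) {
        ParameterizedType parameterized = (ParameterizedType) clazz.getGenericSuperclass();
        return parameterized.getActualTypeArguments();
    }

    // More defensive lookup: walk up the hierarchy until a parameterized superclass is found.
    static Type[] robustTypeArguments(Class<?> clazz) {
        Class<?> current = clazz;
        while (current != null && current != Object.class) {
            Type superType = current.getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                return ((ParameterizedType) superType).getActualTypeArguments();
            }
            current = current.getSuperclass();
        }
        throw new IllegalStateException(
            "No parameterized superclass found for " + clazz.getName());
    }

    // Returns true if the naive lookup throws a ClassCastException for the given class.
    static boolean naiveLookupFails(Class<?> clazz) {
        try {
            naiveTypeArguments(clazz);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Direct subclass fails:   " + naiveLookupFails(DirectMeta.class));
        System.out.println("Indirect subclass fails: " + naiveLookupFails(IndirectMeta.class));
        System.out.println("Robust lookup finds:     "
            + robustTypeArguments(IndirectMeta.class)[1]);
    }
}
```

In this sketch the naive cast works for a class that extends the generic base directly, but throws for a class one level further down, because getGenericSuperclass() then returns an ordinary Class. Walking up the hierarchy is one possible way to harden such a lookup; the change actually made in Hop may differ.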

nadment commented 9 months ago

As a workaround, you can replace the "Get records from stream" transform with "Get Rows from Result" in sub_pipeline_2.