apache / hop

Hop Orchestration Platform
https://hop.apache.org/
Apache License 2.0

[Bug]: multiple pipeline executors inside pipeline causes error #3503

Closed: rcgardne closed this issue 9 months ago

rcgardne commented 11 months ago

Apache Hop version?

2.7.0 (2023-11-17 12.19.09)

Java version?

openjdk 11.0.20 2023-07-18 LTS

Operating system

Windows

What happened?

Priority = 1 because "important component is nonfunctional".

I set up a pipeline (parent pipeline) that runs two other pipelines (sub-pipelines) via the "Pipeline Executor" component. The first sub-pipeline will execute fine if its results are directed to a "Dummy (Do Nothing)" component. However, an error message is observed if the results are directed to the second sub-pipeline. I believe this behavior persists for any data passed from one pipeline executor to another within a parent pipeline. Here is the error:

2023/12/19 02:16:19 - Hop - Projects enabled
2023/12/19 02:16:19 - Hop - Enabling project : 'example_project'
2023/12/19 02:16:44 - parent_pipeline - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - parent_pipeline - Execution started for pipeline [parent_pipeline]
2023/12/19 02:16:44 - start.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2023/12/19 02:16:44 - sub_pipeline_1 - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_1 - Execution started for pipeline [sub_pipeline_1]
2023/12/19 02:16:44 - Generate rows.0 - Finished processing (I=0, O=0, R=0, W=10, U=0, E=0)
2023/12/19 02:16:44 - Copy rows to result.0 - Finished processing (I=0, O=0, R=10, W=10, U=0, E=0)
2023/12/19 02:16:44 - sub_pipeline_1 - Pipeline duration : 0.107 seconds [  0.107" ]
2023/12/19 02:16:44 - sub_pipeline_1 - Execution finished on a local pipeline engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_2 - Executing this pipeline using the Local Pipeline Engine with run configuration 'local'
2023/12/19 02:16:44 - sub_pipeline_2 - Execution started for pipeline [sub_pipeline_2]
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - ERROR: Unexpected error
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - ERROR: org.apache.hop.core.exception.HopException: 
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - There was an unexpected error:
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - class java.lang.Class cannot be cast to class java.lang.reflect.ParameterizedType (java.lang.Class and java.lang.reflect.ParameterizedType are in module java.base of loader 'bootstrap')
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - 
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.processRow(PipelineExecutor.java:158)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:55)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at java.base/java.lang.Thread.run(Thread.java:829)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - Caused by: java.lang.ClassCastException: class java.lang.Class cannot be cast to class java.lang.reflect.ParameterizedType (java.lang.Class and java.lang.reflect.ParameterizedType are in module java.base of loader 'bootstrap')
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transform.BaseTransformMeta.createTransformData(BaseTransformMeta.java:131)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.Pipeline.prepareExecution(Pipeline.java:823)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.engines.local.LocalPipelineEngine.prepareExecution(LocalPipelineEngine.java:236)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.executePipeline(PipelineExecutor.java:254)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    at org.apache.hop.pipeline.transforms.pipelineexecutor.PipelineExecutor.processRow(PipelineExecutor.java:152)
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 -    ... 2 more
2023/12/19 02:16:44 - sub_pipeline_2.hpl.0 - Finished processing (I=0, O=0, R=1, W=0, U=0, E=1)
2023/12/19 02:16:44 - parent_pipeline - Pipeline detected one or more transforms with errors.
2023/12/19 02:16:44 - parent_pipeline - Pipeline is killing the other transforms!
2023/12/19 02:16:44 - sub_pipeline_1.hpl.0 - Finished processing (I=0, O=0, R=1, W=10, U=0, E=0)
2023/12/19 02:16:44 - parent_pipeline - Pipeline duration : 0.465 seconds [  0.465" ]
2023/12/19 02:16:44 - parent_pipeline - Execution finished on a local pipeline engine with run configuration 'local'

An exactly analogous setup in Spoon (General Availability Release - 9.4.0.0-343) throws no errors. I've attached a compressed .zip file of the example: example.zip. Note that to observe the behavior you should:

  1. unzip the example.zip.
  2. run hop-gui.bat.
  3. open the ../example/parent_pipeline.hpl pipeline.
  4. ensure that the paths to sub_pipeline_1.hpl and sub_pipeline_2.hpl resolve correctly within the respective Pipeline Executor components.
  5. preview rows for the Dummy (Do Nothing) component - observe that there are no errors.
  6. disable the hop between sub_pipeline_1.hpl and Dummy (Do Nothing).
  7. DELETE and then reestablish a hop between sub_pipeline_1.hpl and sub_pipeline_2.hpl, choosing option = This output will contain the result rows after execution.
  8. enable the hop between sub_pipeline_2.hpl and Dummy (Do Nothing).
  9. again preview rows for the Dummy (Do Nothing) component - observe the error from above.

Issue Priority

Priority: 1

Issue Component

Component: Pipelines

bamaer commented 11 months ago

confirmed, thanks for the detailed example and reproduction path!

nadment commented 10 months ago

I've found the cause of the bug in the BaseTransformMeta.createTransformData() method when it handles RecordsFromStreamMeta, so the code needs to be hardened for this reflective operation.
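
For readers unfamiliar with this kind of failure, here is a minimal sketch of one common way a "Class cannot be cast to ParameterizedType" exception arises when reading a generic type argument through reflection. It is not the Hop source: BaseMeta, DirectMeta, IndirectMeta, resolveUnsafe and resolveSafely are hypothetical names, and the assumption that the failing meta class sits more than one level below the generic base class is a guess based only on the stack trace above.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-ins for illustration only; these are NOT the Hop classes.
class GenericSuperclassDemo {
    static abstract class BaseMeta<D> {}
    // Direct subclass: its generic superclass is the parameterized BaseMeta<String>.
    static class DirectMeta extends BaseMeta<String> {}
    // Indirect subclass: its generic superclass is the plain class DirectMeta.
    static class IndirectMeta extends DirectMeta {}

    // Fragile lookup: assumes the immediate superclass is always parameterized.
    static Class<?> resolveUnsafe(Object meta) {
        Type superType = meta.getClass().getGenericSuperclass();
        // Throws ClassCastException when superType is a plain Class.
        ParameterizedType pt = (ParameterizedType) superType;
        return (Class<?>) pt.getActualTypeArguments()[0];
    }

    // Defensive lookup: walks up the hierarchy until a parameterized superclass is found.
    static Class<?> resolveSafely(Object meta) {
        for (Class<?> c = meta.getClass(); c != null; c = c.getSuperclass()) {
            Type superType = c.getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                Type arg = ((ParameterizedType) superType).getActualTypeArguments()[0];
                if (arg instanceof Class) {
                    return (Class<?>) arg;
                }
            }
        }
        return null; // no concrete type argument found anywhere in the hierarchy
    }

    public static void main(String[] args) {
        System.out.println(resolveUnsafe(new DirectMeta()));
        System.out.println(resolveSafely(new IndirectMeta()));
        try {
            resolveUnsafe(new IndirectMeta());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The defensive variant trades a hard cast for an instanceof check plus a walk up the class hierarchy, which is one possible way to harden such a lookup; the actual fix in Hop may differ.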

nadment commented 10 months ago

As a workaround, you can replace the "Get records from stream" transform with "Get Rows from Result" in sub_pipeline_2.