alex-stoica opened 1 month ago
It seems that Haystack does not support parallel execution. I spent time reading the documentation, but currently there is no solution.
btw, @alex-stoica, could you tell me how to visualize the pipeline after it has executed?
@Quang-elec44, regarding the visualization, the connections between the components should stay the same. However, independent components at the same level should have start times that are much closer together.
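To make "independent components at the same level" concrete, here is a minimal sketch using Python's standard-library `graphlib`, not Haystack's API. The pipeline shape is a hypothetical assumption (C consumes the outputs of A and B, while D and E each consume only C's output). Every component in a level has all of its inputs produced by earlier levels, so the whole level could start at roughly the same time.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the components whose outputs it consumes.
dependencies = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"C"},
}


def execution_levels(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group components into levels whose members only depend on earlier levels."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        levels.append(ready)
        sorter.done(*ready)  # mark the whole level finished before the next batch
    return levels


print(execution_levels(dependencies))
# [['A', 'B'], ['C'], ['D', 'E']]
```

Batching level by level is a simplification. A fully dynamic scheduler would call `done()` as each individual component finishes, so a downstream component never waits on unrelated siblings.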
For now, Haystack offers a workaround as a cookbook recipe rather than native support (https://haystack.deepset.ai/cookbook/concurrent_tasks). This tutorial shows how to group together the components that we think should execute concurrently. Several issues can arise from this approach:
@alex-stoica Yeah, I read the tutorial but didn't find it useful. I think Haystack lacks dynamic/parallel graph execution, so the team needs to work more on this. For now, I've switched to LangGraph, since it supports concurrent tasks very well.
I see your point. While it's not a major issue for me, I was surprised to see this happen. It highlights why graph-based execution is appealing in the first place: if the graph (or pipeline) runs its components one at a time, the benefits over traditional single-threaded, top-down execution are minimal. I understand that pipelines built with Haystack aid in visualization and in tracking the I/O of each component, but execution-wise, there's no real advantage.
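A small timing sketch shows what is lost when independent components run one after another. It uses plain Python threads, not Haystack, and `slow_component` is a hypothetical stand-in for an I/O-bound component such as an API call, simulated with `time.sleep`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_component(name: str, delay: float = 0.2) -> str:
    """Hypothetical I/O-bound component: waits `delay` seconds, then returns."""
    time.sleep(delay)
    return f"{name} done"


names = ["A", "B", "C"]  # three mutually independent components

# Top-down execution: each component waits for the previous one to finish.
start = time.perf_counter()
sequential = [slow_component(n) for n in names]
sequential_time = time.perf_counter() - start

# Concurrent execution: nothing forces independent components to wait.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(names)) as pool:
    concurrent = list(pool.map(slow_component, names))
concurrent_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both runs produce the same outputs. Because the work is just waiting, the sequential run should take roughly three times as long as the concurrent one.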
If a component is not blocked by explicit inputs from another node, it should run concurrently with other components to optimize pipeline execution. Instead, components currently wait for one another even when no such dependency exists, and this unnecessary waiting reduces pipeline performance.
For example, in a pipeline like
A and B should run concurrently, as they have no dependencies on each other. D and E should also run concurrently, since neither depends on the other.
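The behavior described above, where each component starts as soon as its own inputs exist, can be sketched with `asyncio`. This is a generic illustration, not Haystack's API or the LangGraph implementation. The pipeline shape is an assumption (C consumes A and B; D and E each consume C), and `component` is a hypothetical async component that simulates I/O and combines its inputs into a string.

```python
import asyncio
import time


async def component(name: str, inputs: list[str], delay: float = 0.1) -> str:
    """Hypothetical async component: simulates I/O, then combines its inputs."""
    await asyncio.sleep(delay)
    return f"{name}({', '.join(inputs)})" if inputs else name


async def run_pipeline(
    graph: dict[str, list[str]],
) -> tuple[dict[str, str], dict[str, float]]:
    """Start every component as soon as all of its upstream outputs are ready.

    `graph` maps each component to the components whose outputs it consumes.
    Assumes the graph is acyclic and that every component appears as a key.
    """
    tasks: dict[str, asyncio.Task] = {}
    started: dict[str, float] = {}
    t0 = time.perf_counter()

    async def run(name: str) -> str:
        # Wait only for this component's own inputs, not for unrelated nodes.
        upstream = await asyncio.gather(*(tasks[dep] for dep in graph[name]))
        started[name] = time.perf_counter() - t0
        return await component(name, list(upstream))

    # All tasks are created before any of them runs, so tasks[dep] always exists.
    for name in graph:
        tasks[name] = asyncio.create_task(run(name))
    results = await asyncio.gather(*tasks.values())
    return dict(zip(tasks, results)), started


pipeline = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
outputs, started = asyncio.run(run_pipeline(pipeline))
print(outputs)
# {'A': 'A', 'B': 'B', 'C': 'C(A, B)', 'D': 'D(C(A, B))', 'E': 'E(C(A, B))'}
print({name: round(t, 2) for name, t in started.items()})
```

With a 0.1 s delay per component, A and B should start together, C about 0.1 s later, and D and E together about 0.1 s after that. The whole run should take roughly 0.3 s instead of the 0.5 s a strictly sequential run would need.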
However, in practice, the following behavior occurs:
You can replicate this behavior using any components. For my tests, I used the following: