deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Allow a single node pipeline (e.g. summarizer) to be created as a YAML file - which would allow for Document input. #2266

Closed TuanaCelik closed 1 year ago

TuanaCelik commented 2 years ago

I can imagine a use case where, for example, you want to create a single-node pipeline with just a Summarizer.

The problem I see is that the Summarizer expects a list of Documents, which makes creating a pipeline less intuitive because a pipeline expects 'File' or 'Query' as input.

The code below currently works:

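from haystack import Document, Pipeline
from haystack.nodes import TransformersSummarizer
pipeline = Pipeline()
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
pipeline.add_node(name="Summarizer", component=summarizer, inputs=["Query"])  # root input is "Query"
summary = pipeline.run(documents=[Document(request_data['text'])])  # but documents are what gets passed in

But given that the Summarizer actually expects a list of documents, the inputs=["Query"] argument in the add_node call is a bit unintuitive.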

I saved the above pipeline to a YAML file which resulted in the following:

components:
- name: Summarizer
  params: {}
  type: TransformersSummarizer
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Summarizer
  type: Pipeline
version: 1.2.0

A simple work-around might be to have a pipeline that converts Query to Document?

Basically, what I'm trying to build is a demo that lets people paste a piece of text, press play, and get a summary. This works by just calling '.predict()', but it doesn't work well if you want to go down the 'pipelines' route.
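
The workaround floated above (turning the query text into a Document before it reaches the Summarizer) could be sketched roughly as follows. This is only an illustration under stated assumptions: QueryToDocument is a hypothetical converter, not an existing Haystack node, and the Document class here is a plain stand-in dataclass so the sketch runs without Haystack installed. The run() return shape (an output dict plus an edge name) imitates the convention used by Haystack 1.x nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Document:
    """Stand-in for Haystack's Document (assumption: only `content` matters here)."""
    content: str


class QueryToDocument:
    """Hypothetical converter: wraps the incoming query text in a Document."""

    outgoing_edges = 1  # imitates the single-output convention of Haystack 1.x nodes

    def run(self, query: str) -> Tuple[Dict[str, List[Document]], str]:
        # Pass the pasted text on as a one-element document list,
        # which is the input shape the Summarizer expects.
        return {"documents": [Document(content=query)]}, "output_1"


output, edge = QueryToDocument().run(query="Text pasted into the demo.")
print(output["documents"][0].content)
```

Placed in front of the Summarizer, a step like this would let the pipeline keep 'Query' as its root input while the Summarizer still receives a list of documents.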

TuanaCelik commented 2 years ago

Note: When trying to load this pipeline from YAML I get the following error: The node None is not in the graph.

I checked the pipeline by looking at pipeline.graph.nodes, which turns out to be empty:

from pathlib import Path
pipeline.load_from_yaml(Path("./pipeline/summarizer_pipeline.yaml"), pipeline_name='query')
print(f"Loaded pipeline nodes: {pipeline.graph.nodes}")

Result: Loaded pipeline nodes: []

We looked at this with @ZanSara, and it seems this might be resolved by her upcoming PR.
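
One possible contributor to the empty node list, offered as an assumption and not confirmed anywhere in this thread: if load_from_yaml is a classmethod that returns a new pipeline (as the Haystack 1.x tutorials use it), then calling it on an existing instance and discarding the return value leaves that instance unchanged. The toy class below is not the Haystack Pipeline; ToyPipeline and its load method are made up purely to illustrate that general Python behaviour.

```python
class ToyPipeline:
    """Toy stand-in for a pipeline; not the Haystack class."""

    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])

    @classmethod
    def load(cls, node_names):
        # Builds and returns a NEW object; the instance it was called on is untouched.
        return cls(nodes=node_names)


pipeline = ToyPipeline()
pipeline.load(["Summarizer"])  # return value discarded
print(pipeline.nodes)  # the original instance is still empty

loaded = ToyPipeline.load(["Summarizer"])  # keep the returned object instead
print(loaded.nodes)
```

If that assumption holds for the Haystack version in use, assigning the result of the load call to a variable would be worth trying before inspecting graph.nodes.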

bfgray3 commented 1 year ago

@TuanaCelik I am interested in using a single-node pipeline defined in YAML through the REST API. I dumped my pipeline to YAML and got a result similar to the one you show above, except that mine had no type: Pipeline. Is sending Documents in through the API possible?

ZanSara commented 1 year ago

Addressed by the Pipeline refactoring (https://github.com/deepset-ai/haystack/pull/4284), so closing this thread.