deepset-ai / canals

A component orchestration engine
https://deepset-ai.github.io/canals/
Apache License 2.0

Component is waiting for optional inputs (if there are more than 3) #113

Closed · tholor closed this 11 months ago

tholor commented 1 year ago

The pipeline got stuck when I declared three optional inputs in the prompt template. The weird thing: the same script worked when I supplied only two inputs in the prompt (i.e. left out error_message).

Error message:

DEBUG:canals.pipeline.pipeline:> Queue at step 1: {'prompt_builder': ['query']}
DEBUG:canals.pipeline.pipeline:Component 'prompt_builder' is waiting. Missing inputs: {'error_message'}
Traceback (most recent call last):
  File "/home/mp/deepset/dev/haystack/debug/loops_v2.py", line 81, in <module>
    result = pipeline.run({
             ^^^^^^^^^^^^^^
  File "/home/mp/deepset/dev/haystack/venv/lib/python3.11/site-packages/canals/pipeline/pipeline.py", line 459, in run
    raise PipelineRuntimeError(
canals.errors.PipelineRuntimeError: 'prompt_builder' is stuck waiting for input, but there are no other components to run. This is likely a Canals bug. Open an issue at https://github.com/deepset-ai/canals.

Script to reproduce:

import json
import os

from haystack.preview import Pipeline, Document
from haystack.preview.document_stores import MemoryDocumentStore
from haystack.preview.components.retrievers import MemoryBM25Retriever
from haystack.preview.components.generators.openai.gpt35 import GPT35Generator
from haystack.preview.components.builders.answer_builder import AnswerBuilder
from haystack.preview.components.builders.prompt_builder import PromptBuilder
import random
from haystack.preview import component
from typing import Optional, List

import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

@component
class OutputParser:
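    """Route valid JSON replies to `valid`; otherwise emit the replies on `invalid` together with an `error_message`."""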

    @component.output_types(valid=List[str], invalid=Optional[List[str]], error_message=Optional[str])
    def run(
            self,
            replies: List[str]):
        # create a corrupt json with 50% probability (for demo purposes)
        if random.randint(0, 100) < 50:
            replies[0] = "Corrupt Key" + replies[0]
        try:
            json.loads(replies[0])
            print(f"Valid LLM output: {replies[0]}")
            return {"valid": replies, "invalid": None, "error_message": None}
        except ValueError as e:
            print(f"Invalid LLM output: {replies[0]}, error: {e}")
            return {"valid": None, "invalid": replies, "error_message": str(e)}

#TODO let's eventually get rid of this component
@component
class FinalResult:
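    """Terminal consumer that passes the valid replies through as the pipeline's final output."""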

    @component.output_types(replies=List[str])
    def run(
            self,
            replies: List[str]):
        return {"replies": replies}

query  = ("Create a json file of the 3 biggest cities in the wolrld with the following fields: name, country, and population. "
          "None of the fields must be empty.")

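# The template declares three inputs: query, plus the optional replies and error_message used by the correction loop.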
prompt_template = """
 {{query}}
  {% if replies %}
    We already got the following output: {{replies}}
    However, this doesn't comply with the format requirements from above. 
    Correct the output and try again. Just return the corrected output without any extra explanations.
  {% endif %}
  {% if error_message %}
     Error message: {{error_message}}
  {% endif %}
"""

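# max_loops_allowed bounds how many times the correction loop may iterate.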
pipeline = Pipeline(max_loops_allowed=5)
pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
pipeline.add_component(instance=GPT35Generator(api_key=os.environ.get("OPENAI_API_KEY")), name="llm")

pipeline.add_component(instance=OutputParser(), name="output_parser")
pipeline.add_component(instance=FinalResult(), name="final_result")

pipeline.connect("prompt_builder", "llm")
pipeline.connect("llm", "output_parser")
pipeline.connect("output_parser.invalid", "prompt_builder.replies")
pipeline.connect("output_parser.error_message", "prompt_builder.error_message")

pipeline.connect("output_parser.valid", "final_result.replies")

## Run the Pipeline
query = (
    "Create a json file of the 3 biggest cities in the world with the following fields: name, country, and population. None of the fields must be empty.")
result = pipeline.run({
    "prompt_builder": {"query": query}
})

print(result)
tholor commented 1 year ago

Note: this does work with the canals FSM (#55).

ZanSara commented 1 year ago

This pipeline runs as expected on https://github.com/deepset-ai/canals/pull/148

Graph: [pipeline graph image: test_2]

I believe this can be considered covered by the same test case that covers https://github.com/deepset-ai/canals/issues/112
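
For reference, a deterministic and self-contained variant of the reproduction above could serve as such a test case. This is only a sketch: it needs no OpenAI key, uses just the Pipeline/connect/run and @component APIs already shown in the script, and assumes PromptBuilder's output socket is named prompt. FakeGenerator and DeterministicParser are hypothetical stand-ins for GPT35Generator and OutputParser.

import json
from typing import List, Optional

from haystack.preview import Pipeline, component
from haystack.preview.components.builders.prompt_builder import PromptBuilder

@component
class FakeGenerator:
    """Hypothetical stand-in for GPT35Generator: returns broken JSON on the first
    call and valid JSON on the second, so the correction loop runs exactly once."""

    def __init__(self):
        self.calls = 0

    @component.output_types(replies=List[str])
    def run(self, prompt: str):
        self.calls += 1
        if self.calls == 1:
            return {"replies": ["not valid json"]}
        return {"replies": ['[{"name": "Tokyo", "country": "Japan", "population": 37400000}]']}

@component
class DeterministicParser:
    """Same contract as OutputParser above, minus the randomness."""

    @component.output_types(valid=List[str], invalid=Optional[List[str]], error_message=Optional[str])
    def run(self, replies: List[str]):
        try:
            json.loads(replies[0])
            return {"valid": replies, "invalid": None, "error_message": None}
        except ValueError as e:
            return {"valid": None, "invalid": replies, "error_message": str(e)}

@component
class FinalResult:
    """Terminal consumer for the valid output, as in the original script."""

    @component.output_types(replies=List[str])
    def run(self, replies: List[str]):
        return {"replies": replies}

prompt_template = """
{{query}}
{% if replies %}We already got the following output: {{replies}}. Correct it and try again.{% endif %}
{% if error_message %}Error message: {{error_message}}{% endif %}
"""

pipeline = Pipeline(max_loops_allowed=5)
pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
pipeline.add_component(instance=FakeGenerator(), name="llm")
pipeline.add_component(instance=DeterministicParser(), name="output_parser")
pipeline.add_component(instance=FinalResult(), name="final_result")

pipeline.connect("prompt_builder", "llm")
pipeline.connect("llm", "output_parser")
pipeline.connect("output_parser.invalid", "prompt_builder.replies")
pipeline.connect("output_parser.error_message", "prompt_builder.error_message")
pipeline.connect("output_parser.valid", "final_result.replies")

# Before the fix in PR #148 this run raised the PipelineRuntimeError shown above;
# with the fix it should complete after a single loop iteration.
result = pipeline.run({"prompt_builder": {"query": "Return a json list of the 3 biggest cities."}})
print(result)

Once the expected shape of result is confirmed on the fixed branch, the final print can be turned into an assertion on the final_result output.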