databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse

deploy dolly langchain demo to mlflow issue #61

Open nigelhussain opened 10 months ago

nigelhussain commented 10 months ago

Hello all,

I've been trying to follow the Databricks dolly-v2-3b model demo and deploy a langchain pipeline to MLflow. That part works successfully, but when I reload the model using the mlflow pyfunc class (with the code provided in the demo) and try to get a prediction, I always get the following error:

2023/08/11 11:22:37 WARNING mlflow.langchain.api_request_parallel_processor: Request #0 failed with AttributeError("'tuple' object has no attribute 'page_content'")

I am a bit confused by the error, because I am passing langchain documents, which should not be tuples. I am posting the code and the environment details below.
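
For context, the documents I pass in are standard langchain Document objects, which do have a page_content attribute. A minimal sketch with made-up content (the real documents come from the Chroma similarity search shown in the code):

from langchain.docstore.document import Document

# Each document returned by Chroma.similarity_search is a Document,
# i.e. it exposes .page_content (the text) and .metadata (a dict).
doc = Document(page_content="Tomatoes grow best in full sun.", metadata={"source": "gardening_faq"})
print(doc.page_content)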

import torch
import mlflow
from transformers import pipeline
from langchain import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

def build_qa_chain():
  torch.cuda.empty_cache()
  model_name = "databricks/dolly-v2-3b" # use dolly-v2-3b or dolly-v2-7b for a smaller model and faster inference

  # Increase max_new_tokens for a longer response
  # Other settings might give better results! Play around
  instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto", 
                               return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50)
  # Note: if you use dolly 12B or smaller model but a GPU with less than 24GB RAM, use 8bit. This requires %pip install bitsandbytes
  # instruct_pipeline = pipeline(model=model_name, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})
  # For GPUs without bfloat16 support, like the T4 or V100, use torch_dtype=torch.float16 below
  # model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)

  # Defining our prompt content.
  # langchain will load our similar documents as {context}
  template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

  Instruction: 
  You are a gardener and your job is to help providing the best gardening answer. 
  Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.

  {context}

  Question: {question}

  Response:
  """
  prompt = PromptTemplate(input_variables=['context', 'question'], template=template)

  hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
  # Set verbose=True to see the full prompt:
  return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)

def publish_model_to_mlflow():
  # Build our langchain pipeline
  langchain_model = build_qa_chain()
  input_schema = Schema([
    ColSpec("double", "input_documents")
  ])
  output_schema = Schema([
    ColSpec("double", "output_text")
  ])
  with mlflow.start_run() as run:
      # Save model to MLFlow
      # Note that this only saves the langchain pipeline (we could also add the ChatBot with a custom Model Wrapper class)
      # See https://mlflow.org/docs/latest/models.html#custom-python-models for an example
      # The vector database lives outside of your model

      # Note: for now only LLMChain models are supported; qaChain will be added soon

      # Build an optional signature (note: it is printed here but not passed to log_model below)
      model_signature = ModelSignature(inputs=input_schema, outputs=output_schema)
      print(model_signature)
      mlflow.langchain.log_model(langchain_model, artifact_path="model")
      print(run.info)
      model_registered = mlflow.register_model(f"runs:/{run.info.run_id}/model", "compliance-bot")

  # Move the model in production
  client = mlflow.tracking.MlflowClient()
  print(model_registered)
  print("registering model version "+model_registered.version+" as production model")
  client.transition_model_version_stage("compliance-bot", model_registered.version, stage = "Production", archive_existing_versions=True)

def load_model_and_answer(similar_docs, question):
    model_uri = 'runs:/2f24b734587d41a3a68891d2f1b85c2d/model'
    # similar_docs = get_similar_docs(question, similar_doc_count=1)
    # Reload the logged chain as a generic pyfunc model and query it
    chain = mlflow.pyfunc.load_model(model_uri)
    return chain.predict({"input_documents": similar_docs, "human_input": question})

question = "What is the email address for this?"
hf_embed = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
# gardening_vector_db_path is defined earlier in the demo notebook
chroma_db = Chroma(collection_name="sample_docs", embedding_function=hf_embed, persist_directory=gardening_vector_db_path)
similar_docs = chroma_db.similarity_search(question, k=1)
similar_docs  # display the retrieved documents
load_model_and_answer(similar_docs, question)

The error occurs at this line:

load_model_and_answer(similar_docs, question)
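
For reference, calling the chain directly (as in the earlier parts of the demo, without going through the MLflow pyfunc wrapper) looks roughly like this, with the same variable names as above:

# Direct invocation of the chain from build_qa_chain(), bypassing MLflow.
qa_chain = build_qa_chain()
result = qa_chain({"input_documents": similar_docs, "question": question})
print(result["output_text"])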

I tried asking on the dolly repo and the databricks site, but didn't seem to get a response, so I am posting here.

Environment: chromadb==0.3.22, langchain==0.0.199, transformers==4.29.0, accelerate==0.19.0, bitsandbytes, mlflow==2.5, Databricks Runtime 13.0 ML (includes Apache Spark 3.4.0, Scala 2.12)

If it is an issue with the input format, I would greatly appreciate guidance on what the proper format should look like.
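
For example, one format I could imagine (purely a guess on my part; I have not confirmed that the MLflow langchain flavor rebuilds Document objects from plain dicts) would be to serialize the documents before calling predict:

# Hypothetical: pass plain dicts instead of Document objects.
# I do not know whether this is the format the pyfunc wrapper expects.
serializable_docs = [{"page_content": d.page_content, "metadata": d.metadata} for d in similar_docs]
model_uri = 'runs:/2f24b734587d41a3a68891d2f1b85c2d/model'
chain = mlflow.pyfunc.load_model(model_uri)
chain.predict({"input_documents": serializable_docs, "question": question})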

In addition, I have encountered the same issue with the dbdemos dolly demo itself, specifically part 4, where we deploy the model to MLflow and then reload it.

nigelhussain commented 9 months ago

Anybody?