deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Non-RAG model as Generator for "Generative Q&A with RAG" #1409

Closed gianlucabusatta closed 3 years ago

gianlucabusatta commented 3 years ago

About the tutorial "Generative QA with Retrieval-Augmented Generation": is there a way to use a non-RAG model as the generator?

E.g. I would like to use a custom mBART.

For instance, using "facebook/bart-base" in the generator:

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
    passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
    use_gpu=True,
    embed_title=True,
)

generator = RAGenerator(
    model_name_or_path="facebook/bart-base",
    use_gpu=True,
    top_k=1,
    max_length=200,
    min_length=2,
    embed_title=True,
    num_beams=2,
)

raises an error:

You are using a model of type bart to instantiate a model of type rag. This is not supported for all configurations of models and can yield errors.

AssertionError: Config has to be initialized with question_encoder and generator config.

I don't understand why I need to put a RAG model here if RAG is basically a retriever (DPR) plus a generator (BART). Shouldn't I be able to pass just the generator, since the retriever is already defined?

julian-risch commented 3 years ago

Hi @gianlucabusatta the warning and the error message that you report are raised by the underlying transformers library when RAGenerator tries to load "facebook/bart-base" as a RAG model: https://github.com/huggingface/transformers/blob/76c4d8bf26de3e4ab23b8afeed68479c2bbd9cbd/src/transformers/models/rag/configuration_rag.py#L132

As the error message states, the model cannot be loaded because it does not provide a question_encoder (among other things). "facebook/bart-base" corresponds to a different class in transformers, BartForConditionalGeneration (or MBartForConditionalGeneration for the mBART models you mention).

The implementation in Haystack is fixed to RAG models: the model "facebook/rag-token-nq" used in the tutorial indicates the RagTokenForGeneration architecture in its config file. In contrast, "facebook/bart-base" indicates a BartModel architecture in its config file.
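One quick way to see this difference (a small check, not part of the original reply) is to inspect the architectures field of each model's config:

from transformers import AutoConfig

# Prints the architecture declared in each model's config.json
print(AutoConfig.from_pretrained("facebook/rag-token-nq").architectures)  # ['RagTokenForGeneration']
print(AutoConfig.from_pretrained("facebook/bart-base").architectures)     # ['BartModel']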

julian-risch commented 3 years ago

Transformers' RagTokenForGeneration can be initialized with a retriever, or a separate retriever can be used that forwards the retrieved documents to the RAG model. Here is the documentation page: https://huggingface.co/transformers/model_doc/rag.html The generator inside transformers' RAG model could be set to a BartForConditionalGeneration model, which is what you would like to do, I think: https://huggingface.co/transformers/model_doc/rag.html#ragmodel Maybe @lalitpagaria can explain it better?
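As a rough sketch of that idea (the checkpoint names below are illustrative placeholders, not code from this issue), a RAG model can be composed from a DPR question encoder and a BART-style generator:

from transformers import (
    RagConfig,
    RagTokenForGeneration,
    DPRQuestionEncoder,
    BartForConditionalGeneration,
)

# Illustrative checkpoints; swap in your own question encoder and generator
question_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Build a RagConfig from the two sub-configs and wrap both models in RagTokenForGeneration
config = RagConfig.from_question_encoder_generator_configs(question_encoder.config, generator.config)
model = RagTokenForGeneration(config=config, question_encoder=question_encoder, generator=generator)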

julian-risch commented 3 years ago

If you would like to replace the generator, keep in mind that the retriever and generator should be fine-tuned jointly to get good results.

gianlucabusatta commented 3 years ago

Thank you for the answers @julian-risch. I'm trying to follow your suggestions.

from transformers import RagTokenForGeneration, AutoTokenizer

# Combine the DPR question encoder and the mBART generator into one RAG model
model = RagTokenForGeneration.from_pretrained_question_encoder_generator(
    "voidful/dpr-question_encoder-bert-base-multilingual",
    "facebook/mbart-large-cc25",
)
model.save_pretrained('/content/RagTokenForGeneration/')

# Save both tokenizers into the subdirectories that RagTokenizer expects
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')

Here I'm saving the model and tokenizers in the "/content/RagTokenForGeneration/" directory, following the same structure as the Hugging Face rag-token repository (see the attached image). Then I point RAGenerator at that directory:

generator = RAGenerator(
    model_name_or_path="/content/RagTokenForGeneration/",
    use_gpu=True,
    top_k=1,
    max_length=200,
    min_length=2,
    embed_title=True,
    num_beams=2
)

From now on the rest is equal to the tutorial.
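Before running the rest of the tutorial, one possible sanity check (an addition to the original comment, assuming the directory layout above is correct) is to load the saved directory back with the plain transformers RAG classes:

from transformers import RagTokenizer, RagTokenForGeneration

# RagTokenizer looks for the question_encoder_tokenizer/ and generator_tokenizer/ subdirectories
tokenizer = RagTokenizer.from_pretrained('/content/RagTokenForGeneration/')
model = RagTokenForGeneration.from_pretrained('/content/RagTokenForGeneration/')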

My doubts are:

  1. Does it make sense to do something like this?
  2. How do I fine-tune the retriever and generator jointly?

julian-risch commented 3 years ago

Honestly, I am not so sure whether that plan will work. For fine-tuning, there is an example here: https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag The idea of a multilingual RAG is definitely interesting, but it's also quite an advanced topic. Unfortunately, I think the available pre-trained models are not suitable for this task.

gianlucabusatta commented 3 years ago

With the above setup (the rest is identical to the tutorial), when generating the answers:

# Now generate an answer for each question
for question in QUESTIONS:
    # Retrieve related documents from retriever
    retriever_results = retriever.retrieve(
        query=question
    )

    # Now generate answer from question and retrieved documents
    predicted_result = generator.predict(
        query=question,
        documents=retriever_results,
        top_k=1
    )

    # Print your answer
    answers = predicted_result["answers"]
    print(f'Generated answer is \'{answers[0]["answer"]}\' for the question = \'{question}\'')

I got this error:

OverflowError                             Traceback (most recent call last)
<ipython-input-10-9538ef81670e> in <module>()
     10         query=question,
     11         documents=retriever_results,
---> 12         top_k=1
     13     )
     14 

6 frames
/usr/local/lib/python3.7/dist-packages/haystack/generator/transformers.py in predict(self, query, documents, top_k)
    242         input_dict = self.tokenizer.prepare_seq2seq_batch(
    243             src_texts=[query],
--> 244             return_tensors="pt"
    245         )
    246         input_ids = input_dict['input_ids'].to(self.device)

/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in prepare_seq2seq_batch(self, src_texts, tgt_texts, max_length, max_target_length, padding, return_tensors, truncation, **kwargs)
    106             padding=padding,
    107             truncation=truncation,
--> 108             **kwargs,
    109         )
    110         if tgt_texts is None:

/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in __call__(self, *args, **kwargs)
     61 
     62     def __call__(self, *args, **kwargs):
---> 63         return self.current_tokenizer(*args, **kwargs)
     64 
     65     def batch_decode(self, *args, **kwargs):

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2303                 return_length=return_length,
   2304                 verbose=verbose,
-> 2305                 **kwargs,
   2306             )
   2307         else:

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2488             return_length=return_length,
   2489             verbose=verbose,
-> 2490             **kwargs,
   2491         )
   2492 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)
    380             max_length=max_length,
    381             stride=stride,
--> 382             pad_to_multiple_of=pad_to_multiple_of,
    383         )
    384 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in set_truncation_and_padding(self, padding_strategy, truncation_strategy, max_length, stride, pad_to_multiple_of)
    333         # Set truncation and padding on the backend tokenizer
    334         if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE:
--> 335             self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)
    336         else:
    337             self._tokenizer.no_truncation()

OverflowError: int too big to convert

Any clue on how to solve this?

julian-risch commented 3 years ago

That might be a problem related to the loaded tokenizer_config.json file and the value of max_length in line 335 of transformers/tokenization_utils_fast.py. I am guessing that tokenizer.model_max_length is not set, which causes the OverflowError: int too big to convert. Sorry that I cannot provide more help than that.
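Following that guess (a sketch of a possible workaround, not a confirmed fix), one could set model_max_length explicitly on the tokenizers before saving them, so a finite value reaches the fast tokenizer's truncation setup:

from transformers import AutoTokenizer

# Assumption: the DPR question encoder tokenizer has no model_max_length set,
# so prepare_seq2seq_batch passes a huge sentinel value to enable_truncation()
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
question_encoder_tokenizer.model_max_length = 512  # typical BERT limit; illustrative value
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')

generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
generator_tokenizer.model_max_length = 1024  # illustrative value
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')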

gianlucabusatta commented 3 years ago

Thank you for the help!

lalitpagaria commented 3 years ago

@gianlucabusatta I would suggest trying this with the transformers code itself; at least it will have fewer variables. Once you manage to make it work with transformers, I don't think it will be difficult to integrate it with Haystack. The RAG implementation in transformers is not standard, it has a few tweaks, hence it is a bit tricky.
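For reference, the canonical transformers-only RAG example (adapted from the Hugging Face documentation, using the standard English checkpoint; a custom multilingual model would additionally need its own retriever/index) looks roughly like this:

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
# The dummy dataset keeps the example small; a real setup needs a proper document index
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))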