deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Non-RAG model as Generator for "Generative Q&A with RAG" #1409

Closed gianlucabusatta closed 3 years ago

gianlucabusatta commented 3 years ago

About the tutorial "Generative QA with Retrieval-Augmented Generation": is there a way to use a non-RAG model as the generator?

E.g. I would like to use a custom mBART.

For instance, using "facebook/bart-base" in the generator:

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
    passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
    use_gpu=True,
    embed_title=True,
)

generator = RAGenerator(
    model_name_or_path="facebook/bart-base",
    use_gpu=True,
    top_k=1,
    max_length=200,
    min_length=2,
    embed_title=True,
    num_beams=2,
)

raises an error:

You are using a model of type bart to instantiate a model of type rag. This is not supported for all configurations of models and can yield errors.

AssertionError: Config has to be initialized with question_encoder and generator config.

I don't understand why I need to put a RAG model here if RAG is basically a retriever (DPR) plus a generator (BART). Shouldn't I be able to pass just the generator, since the retriever is already defined?

julian-risch commented 3 years ago

Hi @gianlucabusatta the warning and the error message that you report are raised by the underlying transformers library when RAGenerator tries to load "facebook/bart-base" as a RAG model: https://github.com/huggingface/transformers/blob/76c4d8bf26de3e4ab23b8afeed68479c2bbd9cbd/src/transformers/models/rag/configuration_rag.py#L132

As the error message states, the model cannot be loaded because it does not provide a question_encoder (among other things). "facebook/bart-base" corresponds to a different class in transformers, BartForConditionalGeneration (or MBartForConditionalGeneration for the mBART models you mention).

The implementation in Haystack is fixed to RAG models: the model "facebook/rag-token-nq" used in the tutorial indicates the RagTokenForGeneration architecture in its config file. In contrast, "facebook/bart-base" indicates a BartModel architecture in its config file.
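One quick way to see this difference (a small check, not part of the original reply) is to inspect the architectures field of each model's config:

from transformers import AutoConfig

# Prints the architecture declared in each model's config.json
print(AutoConfig.from_pretrained("facebook/rag-token-nq").architectures)  # ['RagTokenForGeneration']
print(AutoConfig.from_pretrained("facebook/bart-base").architectures)     # ['BartModel']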

julian-risch commented 3 years ago

Transformers' RagTokenForGeneration can be initialized with a retriever, or a separate retriever can be used that forwards the retrieved documents to the RAG model. Here is the documentation page: https://huggingface.co/transformers/model_doc/rag.html The generator inside transformers' RAG model could be set to a BartForConditionalGeneration model, which is what you would like to do, I think: https://huggingface.co/transformers/model_doc/rag.html#ragmodel Maybe @lalitpagaria can explain it better?
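As a rough sketch of that idea (the checkpoint names below are illustrative placeholders, not code from this issue), a RAG model can be composed from a DPR question encoder and a BART-style generator:

from transformers import (
    RagConfig,
    RagTokenForGeneration,
    DPRQuestionEncoder,
    BartForConditionalGeneration,
)

# Illustrative checkpoints; swap in your own question encoder and generator
question_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Build a RagConfig from the two sub-configs and wrap both models in RagTokenForGeneration
config = RagConfig.from_question_encoder_generator_configs(question_encoder.config, generator.config)
model = RagTokenForGeneration(config=config, question_encoder=question_encoder, generator=generator)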

julian-risch commented 3 years ago

If you would like to replace the generator, keep in mind that the retriever and generator should be fine-tuned jointly to get good results.

gianlucabusatta commented 3 years ago

Thank you for the answers @julian-risch. I'm trying to follow your suggestions.

from transformers import RagTokenForGeneration, AutoTokenizer

# Combine the DPR question encoder and the mBART generator into one RAG model
model = RagTokenForGeneration.from_pretrained_question_encoder_generator(
    "voidful/dpr-question_encoder-bert-base-multilingual",
    "facebook/mbart-large-cc25",
)
model.save_pretrained('/content/RagTokenForGeneration/')

# Save both tokenizers into the subdirectories that RagTokenizer expects
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')

Here I'm saving the model and tokenizers in the "/content/RagTokenForGeneration/" directory, following the same structure as the Hugging Face rag-token repository (see the attached image). Then I point RAGenerator at that directory:

generator = RAGenerator(
    model_name_or_path="/content/RagTokenForGeneration/",
    use_gpu=True,
    top_k=1,
    max_length=200,
    min_length=2,
    embed_title=True,
    num_beams=2
)

From now on the rest is equal to the tutorial.
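Before running the rest of the tutorial, one possible sanity check (an addition to the original comment, assuming the directory layout above is correct) is to load the saved directory back with the plain transformers RAG classes:

from transformers import RagTokenizer, RagTokenForGeneration

# RagTokenizer looks for the question_encoder_tokenizer/ and generator_tokenizer/ subdirectories
tokenizer = RagTokenizer.from_pretrained('/content/RagTokenForGeneration/')
model = RagTokenForGeneration.from_pretrained('/content/RagTokenForGeneration/')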

My doubts are:

  1. Does it make sense to do something like this?
  2. How do I fine-tune the retriever and generator jointly?

julian-risch commented 3 years ago

Honestly, I am not so sure whether that plan will work. For fine-tuning, there is an example here: https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag The idea of a multilingual RAG is definitely interesting, but it's also quite an advanced topic. Unfortunately, I think the available pre-trained models are not suitable for this task.

gianlucabusatta commented 3 years ago

With the above setup (the rest is identical to the tutorial), when generating the answers:

# Now generate an answer for each question
for question in QUESTIONS:
    # Retrieve related documents from retriever
    retriever_results = retriever.retrieve(
        query=question
    )

    # Now generate answer from question and retrieved documents
    predicted_result = generator.predict(
        query=question,
        documents=retriever_results,
        top_k=1
    )

    # Print your answer
    answers = predicted_result["answers"]
    print(f'Generated answer is \'{answers[0]["answer"]}\' for the question = \'{question}\'')

I got this error:

OverflowError                             Traceback (most recent call last)
<ipython-input-10-9538ef81670e> in <module>()
     10         query=question,
     11         documents=retriever_results,
---> 12         top_k=1
     13     )
     14 

6 frames
/usr/local/lib/python3.7/dist-packages/haystack/generator/transformers.py in predict(self, query, documents, top_k)
    242         input_dict = self.tokenizer.prepare_seq2seq_batch(
    243             src_texts=[query],
--> 244             return_tensors="pt"
    245         )
    246         input_ids = input_dict['input_ids'].to(self.device)

/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in prepare_seq2seq_batch(self, src_texts, tgt_texts, max_length, max_target_length, padding, return_tensors, truncation, **kwargs)
    106             padding=padding,
    107             truncation=truncation,
--> 108             **kwargs,
    109         )
    110         if tgt_texts is None:

/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in __call__(self, *args, **kwargs)
     61 
     62     def __call__(self, *args, **kwargs):
---> 63         return self.current_tokenizer(*args, **kwargs)
     64 
     65     def batch_decode(self, *args, **kwargs):

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2303                 return_length=return_length,
   2304                 verbose=verbose,
-> 2305                 **kwargs,
   2306             )
   2307         else:

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2488             return_length=return_length,
   2489             verbose=verbose,
-> 2490             **kwargs,
   2491         )
   2492 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)
    380             max_length=max_length,
    381             stride=stride,
--> 382             pad_to_multiple_of=pad_to_multiple_of,
    383         )
    384 

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in set_truncation_and_padding(self, padding_strategy, truncation_strategy, max_length, stride, pad_to_multiple_of)
    333         # Set truncation and padding on the backend tokenizer
    334         if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE:
--> 335             self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)
    336         else:
    337             self._tokenizer.no_truncation()

OverflowError: int too big to convert

Any clue on how to solve this?

julian-risch commented 3 years ago

That might be a problem related to the loaded tokenizer_config.json file and the value of max_length in line 335 of transformers/tokenization_utils_fast.py. I am guessing that tokenizer.model_max_length is not set, which causes the OverflowError: int too big to convert. Sorry that I cannot provide more help than that.
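Following that guess (a sketch of a possible workaround, not a confirmed fix), one could set model_max_length explicitly on the tokenizers before saving them, so a finite value reaches the fast tokenizer's truncation setup:

from transformers import AutoTokenizer

# Assumption: the DPR question encoder tokenizer has no model_max_length set,
# so prepare_seq2seq_batch passes a huge sentinel value to enable_truncation()
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
question_encoder_tokenizer.model_max_length = 512  # typical BERT limit; illustrative value
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')

generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
generator_tokenizer.model_max_length = 1024  # illustrative value
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')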

gianlucabusatta commented 3 years ago

Thank you for the help!

lalitpagaria commented 3 years ago

@gianlucabusatta I would suggest trying this with the transformers code itself; at least it will have fewer variables. Once you manage to make it work with transformers, I don't think it will be difficult to integrate it with Haystack. The RAG implementation in transformers is not standard, it has a few tweaks, hence it is a bit tricky.
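For reference, the canonical transformers-only RAG example (adapted from the Hugging Face documentation, using the standard English checkpoint; a custom multilingual model would additionally need its own retriever/index) looks roughly like this:

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
# The dummy dataset keeps the example small; a real setup needs a proper document index
retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))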