Hi @gianlucabusatta, the warning and the error message that you report are raised by the underlying transformers library when RAGenerator tries to load "facebook/bart-base" as a RAG model: https://github.com/huggingface/transformers/blob/76c4d8bf26de3e4ab23b8afeed68479c2bbd9cbd/src/transformers/models/rag/configuration_rag.py#L132
As the error message states, the model cannot be loaded because its config does not provide a question_encoder (among other things).
For "facebook/bart-base"
, there is actually a different class called MBartForConditionalGeneration
in transformers.
The implementation in haystack is fixed to RAG models, and the model "facebook/rag-token-nq" used in the tutorial indicates a RagTokenForGeneration architecture in its config file. In contrast to that, "facebook/bart-base" indicates a BartModel architecture in its config file.
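You can see this directly by loading the configs (a small sketch using transformers' AutoConfig; the values in the comments come from each checkpoint's config.json):

from transformers import AutoConfig

# The "architectures" field in config.json tells you which model class a checkpoint expects
print(AutoConfig.from_pretrained("facebook/rag-token-nq").architectures)  # ['RagTokenForGeneration']
print(AutoConfig.from_pretrained("facebook/bart-base").architectures)     # ['BartModel']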
Transformers' RagTokenForGeneration can be initialized with a retriever, or a separate retriever can be used which forwards the retrieved documents to the RAG model. Here is the documentation page: https://huggingface.co/transformers/model_doc/rag.html The generator within transformers' RAG model can be set to a BartForConditionalGeneration model, which I think is what you would like to do: https://huggingface.co/transformers/model_doc/rag.html#ragmodel Maybe @lalitpagaria can explain it better?
If you would like to replace the generator, keep in mind that the retriever and generator should be fine-tuned jointly to get good results.
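At the transformers level, the documented usage looks roughly like this (an untested sketch adapted from the RAG docs linked above; the dummy index is only for a quick test):

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Sketch adapted from https://huggingface.co/transformers/model_doc/rag.html:
# the retriever fetches passages and forwards them to the generator inside the RAG model
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))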
Thank you for the answers @julian-risch. I'm trying to follow your suggestions.
from transformers import RagTokenForGeneration, AutoTokenizer

# Combine a multilingual DPR question encoder with an mBART generator into a RAG model
model = RagTokenForGeneration.from_pretrained_question_encoder_generator(
    "voidful/dpr-question_encoder-bert-base-multilingual",
    "facebook/mbart-large-cc25",
)
model.save_pretrained('/content/RagTokenForGeneration/')

# Save the two tokenizers next to the model, mirroring the rag-token-nq layout
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')
Here I'm saving the model and tokenizers in the "/content/RagTokenForGeneration/" directory, following the same structure as the huggingface rag-token-nq model. Then I instantiate the generator like this:
generator = RAGenerator(
    model_name_or_path="/content/RagTokenForGeneration/",
    use_gpu=True,
    top_k=1,
    max_length=200,
    min_length=2,
    embed_title=True,
    num_beams=2
)
From here on, the rest is the same as in the tutorial.
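In particular, the retriever part looks roughly like this (a sketch; the multilingual DPR context-encoder checkpoint is my assumption and has to match the question encoder used above):

from haystack.retriever.dense import DensePassageRetriever

# Sketch of the tutorial's retriever with multilingual DPR checkpoints;
# `document_store` is the FAISS document store created earlier in the tutorial
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
    passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
    use_gpu=True,
    embed_title=True,
)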
My main doubt is whether this plan can actually work.
Honestly, I am not so sure whether that plan will work. For fine-tuning, there is an example here: https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag The idea of a multilingual RAG is definitely interesting, but it is also quite an advanced topic. Unfortunately, I think the available pre-trained models are not suitable for this task.
With the above setup (the rest identical to the tutorial), when generating the answers:
# Now generate an answer for each question
for question in QUESTIONS:
    # Retrieve related documents from the retriever
    retriever_results = retriever.retrieve(
        query=question
    )

    # Now generate an answer from the question and the retrieved documents
    predicted_result = generator.predict(
        query=question,
        documents=retriever_results,
        top_k=1
    )

    # Print the answer
    answers = predicted_result["answers"]
    print(f'Generated answer is \'{answers[0]["answer"]}\' for the question = \'{question}\'')
I got this error:
OverflowError Traceback (most recent call last)
<ipython-input-10-9538ef81670e> in <module>()
10 query=question,
11 documents=retriever_results,
---> 12 top_k=1
13 )
14
6 frames
/usr/local/lib/python3.7/dist-packages/haystack/generator/transformers.py in predict(self, query, documents, top_k)
242 input_dict = self.tokenizer.prepare_seq2seq_batch(
243 src_texts=[query],
--> 244 return_tensors="pt"
245 )
246 input_ids = input_dict['input_ids'].to(self.device)
/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in prepare_seq2seq_batch(self, src_texts, tgt_texts, max_length, max_target_length, padding, return_tensors, truncation, **kwargs)
106 padding=padding,
107 truncation=truncation,
--> 108 **kwargs,
109 )
110 if tgt_texts is None:
/usr/local/lib/python3.7/dist-packages/transformers/models/rag/tokenization_rag.py in __call__(self, *args, **kwargs)
61
62 def __call__(self, *args, **kwargs):
---> 63 return self.current_tokenizer(*args, **kwargs)
64
65 def batch_decode(self, *args, **kwargs):
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2303 return_length=return_length,
2304 verbose=verbose,
-> 2305 **kwargs,
2306 )
2307 else:
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2488 return_length=return_length,
2489 verbose=verbose,
-> 2490 **kwargs,
2491 )
2492
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose)
380 max_length=max_length,
381 stride=stride,
--> 382 pad_to_multiple_of=pad_to_multiple_of,
383 )
384
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in set_truncation_and_padding(self, padding_strategy, truncation_strategy, max_length, stride, pad_to_multiple_of)
333 # Set truncation and padding on the backend tokenizer
334 if truncation_strategy != TruncationStrategy.DO_NOT_TRUNCATE:
--> 335 self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)
336 else:
337 self._tokenizer.no_truncation()
OverflowError: int too big to convert
Any clue on how to solve this?
That might be a problem related to the loaded tokenizer_config.json file and the value of max_length in line 335 of transformers/tokenization_utils_fast.py. I am guessing that tokenizer.model_max_length is not set, which causes the OverflowError: int too big to convert. Sorry that I cannot provide more help than that.
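If that is indeed the cause, one thing you could try (just an untested sketch on top of your snippet above; the length values are assumptions) is to set model_max_length explicitly before saving the tokenizers:

from transformers import AutoTokenizer

# Untested workaround sketch: give both tokenizers an explicit model_max_length so the
# fast tokenizer never receives the huge default value when truncation is enabled
question_encoder_tokenizer = AutoTokenizer.from_pretrained("voidful/dpr-question_encoder-bert-base-multilingual")
generator_tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")
question_encoder_tokenizer.model_max_length = 512   # assumed limit of the BERT-based DPR encoder
generator_tokenizer.model_max_length = 1024         # assumed limit of mBART-large
question_encoder_tokenizer.save_pretrained('/content/RagTokenForGeneration/question_encoder_tokenizer')
generator_tokenizer.save_pretrained('/content/RagTokenForGeneration/generator_tokenizer')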
Thank you for the help!
@gianlucabusatta I would suggest trying this with transformers itself first; at least there will be fewer variables. Once you manage to make it work in transformers, I don't think it will be difficult to integrate it with Haystack. The RAG implementation in transformers is not standard; it has a few tweaks, hence it is a bit tricky.
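For example, the failing call can be reproduced without Haystack (a sketch using the paths from your earlier snippet):

from transformers import AutoTokenizer, RagTokenizer, RagTokenForGeneration

# Sketch: load the artefacts saved earlier and call the same tokenizer method that
# Haystack's RAGenerator.predict() uses, to check whether the error also occurs here
question_encoder_tokenizer = AutoTokenizer.from_pretrained("/content/RagTokenForGeneration/question_encoder_tokenizer")
generator_tokenizer = AutoTokenizer.from_pretrained("/content/RagTokenForGeneration/generator_tokenizer")
tokenizer = RagTokenizer(question_encoder=question_encoder_tokenizer, generator=generator_tokenizer)
model = RagTokenForGeneration.from_pretrained("/content/RagTokenForGeneration/")

input_dict = tokenizer.prepare_seq2seq_batch(src_texts=["a test question"], return_tensors="pt")
print(input_dict["input_ids"].shape)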
About the tutorial "Generative QA with Retrieval-Augmented Generation": is there a way to use a non-RAG model as the generator? E.g. I would like to use a custom mBART.
For instance, using "facebook/bart-base" in the generator raises an error:
You are using a model of type bart to instantiate a model of type rag. This is not supported for all configurations of models and can yield errors.
AssertionError: Config has to be initialized with question_encoder and generator config
I don't understand why I need to use a RAG model here if RAG is basically a Retriever (DPR) + a Generator (BART). Shouldn't I be able to just plug in the generator, since the retriever is already defined?