allenai / unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://arxiv.org/abs/2005.00700
Apache License 2.0

Not exactly reproducing the example with huggingface #11

Closed wenlongzhao094 closed 3 years ago

wenlongzhao094 commented 3 years ago

I followed the README example to load UnifiedQA with huggingface, but I ran into what looks like a version or architecture problem (code attached at the end).

(1) Some weights of the checkpoint were not used when initializing T5ForConditionalGeneration. It seems the architecture of T5ForConditionalGeneration is not exactly the same as the original TensorFlow T5; what are the differences? Is there an alternative in huggingface that exactly matches the seq2seq TensorFlow T5?

(2) The output contains special tokens. I can work around this by setting "skip_special_tokens=True" during "decode", but is this normal? Is it a version problem?

Thank you!

>>> from transformers import AutoTokenizer, T5ForConditionalGeneration
>>> model_name = "allenai/unifiedqa-t5-small"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> model = T5ForConditionalGeneration.from_pretrained(model_name)

Some weights of the model checkpoint at allenai/unifiedqa-t5-small were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
>>>
>>> def run_model(input_string, **generator_args):
...     input_ids = tokenizer.encode(input_string, return_tensors="pt")
...     res = model.generate(input_ids, **generator_args)
...     return [tokenizer.decode(x) for x in res]
...
>>> run_model("which is best conductor? \\n (a) iron (b) feather")

['<pad> iron</s>']
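
In the meantime, decoding with skip_special_tokens=True gives clean output. A minimal sketch of that workaround, continuing the session above (the expected output is my assumption based on the raw '<pad> iron</s>' string):

>>> def run_model(input_string, **generator_args):
...     input_ids = tokenizer.encode(input_string, return_tensors="pt")
...     res = model.generate(input_ids, **generator_args)
...     # skip_special_tokens=True drops <pad> and </s> from the decoded text
...     return [tokenizer.decode(x, skip_special_tokens=True) for x in res]
...
>>> run_model("which is best conductor? \\n (a) iron (b) feather")
['iron']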
danyaljj commented 3 years ago

What are your HF and torch versions?

wenlongzhao094 commented 3 years ago

huggingface transformers 4.1.1, torch 1.7.1

danyaljj commented 3 years ago

The issue is quite odd. Could you try transformers 4.0 or 3.9?

wenlongzhao094 commented 3 years ago

Same issue with transformers 4.0.0. There does not seem to be a 3.9.x version...

wenlongzhao094 commented 3 years ago

It seems that the weights not being used during initialization is actually expected: https://github.com/huggingface/transformers/pull/8518 The weights were removed from the huggingface T5 implementation after 3.5.0. I assume the README example was initially run with transformers <= 3.5.0?
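
For anyone checking which behavior to expect, printing the installed version is enough (an illustrative snippet, nothing UnifiedQA-specific):

>>> import transformers
>>> transformers.__version__  # above 3.5.0, the unused-weight warning is expected
'4.1.1'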

danyaljj commented 3 years ago

Sorry for the delay!

I tried different HF versions; the predictions only come out right on the older ones. So there must be recent changes on HF's side that are messing up the predictions. I will report the issue.

danyaljj commented 3 years ago

According to the discussion on HF, the newer versions expect a few arguments that were missing from the example. I have updated the README example accordingly.
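
For reference, a sketch of what the fixed helper looks like (assuming skip_special_tokens=True during decoding is the key missing argument; the README has the authoritative version):

>>> def run_model(input_string, **generator_args):
...     input_ids = tokenizer.encode(input_string, return_tensors="pt")
...     res = model.generate(input_ids, **generator_args)
...     # batch_decode decodes the whole batch and strips special tokens like <pad> and </s>
...     return tokenizer.batch_decode(res, skip_special_tokens=True)
...
>>> run_model("which is best conductor? \\n (a) iron (b) feather")
['iron']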