allenai / unifiedqa

UnifiedQA: Crossing Format Boundaries With a Single QA System
https://arxiv.org/abs/2005.00700
Apache License 2.0

Huggingface 3B and 11B models not configured properly #8

Closed Shamdan17 closed 3 years ago

Shamdan17 commented 4 years ago

Hello, it seems that the models are not properly configured on Hugging Face, so it is not possible to download and use them with the snippets given in the readme. If you try the readme's code snippet:

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-3b" # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

The following error occurs:

OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    927                 if resolved_archive_file is None:
--> 928                     raise EnvironmentError
    929             except EnvironmentError:

OSError: 

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
1 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    933                     f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a file named one of {WEIGHTS_NAME}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME}.\n\n"
    934                 )
--> 935                 raise EnvironmentError(msg)
    936 
    937             if resolved_archive_file == archive_file:

OSError: Can't load weights for 'allenai/unifiedqa-t5-3b'. Make sure that:

- 'allenai/unifiedqa-t5-3b' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'allenai/unifiedqa-t5-3b' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

On Hugging Face, neither the 3B nor the 11B model seems to have the weights file when you list the model files, which is probably the cause of the issue. Is this a mistake, or is it intentional? The original T5-11B model has all its weight files on Hugging Face as expected.
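For reference, a repo's file listing can also be checked programmatically. This is a minimal sketch using the huggingface_hub client (a newer API than what existed at the time of this thread):

from huggingface_hub import list_repo_files

# List the files in the model repo and check for a PyTorch checkpoint.
files = list_repo_files("allenai/unifiedqa-t5-3b")
print(files)
print("pytorch_model.bin" in files)  # True if the weights file is present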

PS: The large model example in the readme also seems to be mistyped, using allenai/unifiedqa-t5-large instead of allenai/unifiedqa-large

Thanks!

danyaljj commented 4 years ago

Hello, you're right @Shamdan17. The main blocker is this issue: https://github.com/huggingface/transformers/issues/8160. I should have the complete 3B and 11B models up in a couple of days.

danyaljj commented 3 years ago

The issue should be resolved now, as per this conversation: https://github.com/huggingface/transformers/issues/8480. Let me know if you see any issues, @Shamdan17.

iMayK commented 3 years ago

Still facing the issue with the 11B model!

danyaljj commented 3 years ago

@iMayK Could you include a screenshot of your error?

PeterAJansen commented 3 years ago

@danyaljj I just tried this with a fresh HF Transformers pull and it does appear to be working for 3B but not 11B. Here's the 11B error:

[screenshot: 11B loading error]

danyaljj commented 3 years ago

That's quite odd! I did check the HF models and it looks like the models are already there: https://huggingface.co/allenai/unifiedqa-t5-11b/tree/main

I'll take a closer look to see what's going on.

danyaljj commented 3 years ago

Update: I am running the readme example with the 11B model:

from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "allenai/unifiedqa-t5-11b" # you can specify the model size here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    return tokenizer.batch_decode(res, skip_special_tokens=True)

run_model("which is best conductor? \\n (a) iron (b) feather")

It's downloading the model now:

>>> model = T5ForConditionalGeneration.from_pretrained(model_name)
Downloading:  27%|████████████████████████████████▊  

So at least I know that it can successfully access the model. Will update the thread if it fails/succeeds.
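As an aside, the **generator_args in run_model above are forwarded straight to model.generate(), so standard generation options can be passed through. A small illustrative example (the particular option values are not from the thread):

run_model("which is best conductor? \\n (a) iron (b) feather",
          num_beams=4, max_length=20)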

danyaljj commented 3 years ago

Update: the evaluation ran and it successfully printed an answer: [screenshot: output showing the generated answer, 2021-01-19 10:17 PM]

So basically it works for me. Now we have to figure out why it is not working for you.

For completeness, here is my environment:

> pip list | grep transformers 
transformers                       4.2.1 
> pip list | grep tokenizers 
tokenizers                         0.9.4
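
For a check from inside the running interpreter (avoiding the case where pip resolves a different environment than the one Python is using), a generic alternative is:

import tokenizers
import transformers

# Print the versions of the packages actually imported by this interpreter.
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)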

@PeterAJansen do your versions match the above ones?

One thing I would note is that, since the model is huge (~40GB), a slow connection (or one with disruptions) could result in a corrupt download.
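If a corrupt cache is suspected, one way to rule it out is to force a clean re-download; force_download is a standard from_pretrained argument. A sketch, not something run in the thread:

# Re-download the checkpoint from scratch, ignoring any cached copy.
model = T5ForConditionalGeneration.from_pretrained(
    "allenai/unifiedqa-t5-11b",
    force_download=True,
)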

PeterAJansen commented 3 years ago

Interesting... I wonder why it works for some folks and not others. Here are some cases:

1) Transformers 3.5.0, unifiedqa example, works: [screenshot]

2) Fresh clone of transformers (~4.2.1), finetune seq2seq example, works for everything but 11B:

Base: [screenshot: works]

Large: [screenshot: works]

11B: It actually gets through all the files except for the weights, and throws an error rather than trying to download it: [screenshot: error]

I'll keep tinkering and see if I can figure out what the difference between these cases is. It is strange that the only difference between the runs in (2) is the model name specified on the command line... it should work.

PeterAJansen commented 3 years ago

Aha -- I'm finally able to replicate (and solve) it.

Case 1 (works): using the official transformers 4.2.1 release:

peter@neutronium:~/github/transformers-t5-a100$ pip install transformers==4.2.1

[screenshot: works]

Case 2 (doesn't work): using a very recent, but not this-minute, transformers clone (I think from the last 1-2 days):

peter@neutronium:~/github/transformers-t5-a100$ pip install .

[screenshot: error]

Case 3 (works): using a completely fresh clone made minutes ago (where things in seq2seq seem to have changed significantly):

peter@neutronium:~/github/transformers/examples/seq2seq$ python finetune_trainer.py --data_dir $XSUM_DIR --output_dir=xsum_results --num_train_epochs 1 --model_name_or_path allenai/unifiedqa-t5-11b

(This one produces a lot of output, but also starts downloading the model successfully.)

In summary: I have no idea what's wonky about the pull I've been using from the last few days, but there seem to have been significant changes today, and it now fetches 11B successfully too.

Only in transformers can the library you're using change significantly over hours... thanks for your help!