I am guessing the issue is that it is not automatically loading your latest trained checkpoint. It tries to automatically load a checkpoint on initialization:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/hf_model.py#L200
using load_latest_checkpoint:
https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/hf_model.py#L270
But the checkpoint format I implemented for the HF model is probably different from whatever format is living in your "/home/danielk/small_standard/pytorch_model/"
directory. Do you want to take a look at the checkpoint saving/loading logic and the naming conventions etc. and confirm that it doesn't match the checkpoint format you're using? If not, I would be open to changing the checkpoint convention so that it matches something more standard. I just made up a format that I thought was reasonable. Maybe there is a more standard format/naming convention we could use?
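In the meantime, here is a minimal, untested sketch of one possible workaround: build the wrapper as usual and then copy the fine-tuned weights from a standard save_pretrained() directory into its underlying Hugging Face model (the wrapper keeps it in the _model attribute, which the code further down in this thread also uses), instead of relying on the wrapper's own checkpoint naming:

import torch
import t5
from transformers import T5ForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = t5.models.HfPyTorchModel("t5-small", "/tmp/hft5/", device)

# Load the fine-tuned weights saved with save_pretrained() and copy them
# into the wrapper's underlying Hugging Face model.
finetuned = T5ForConditionalGeneration.from_pretrained(
    "/home/danielk/small_standard/pytorch_model")
model._model.load_state_dict(finetuned.state_dict())
model._model.to(device)
model._model.eval()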
I believe he used this checkpoint: https://huggingface.co/t5-small. This is also my issue; it would be really helpful if you could assist in solving it. Thanks.
I am not sure how to use load_latest_checkpoint; could you please tell me how I can add this? Thanks.
I looked at load_checkpoint and save_checkpoint; this seems to be normal PyTorch code, and to me the error does not look like it is caused by a different checkpoint format. Could you have a closer look please? Thanks.
Hi Julia, if you are using the released checkpoint from the Hugging Face repo, it is because the model is not trained on this task, so those results are expected. I changed the evaluation to a task that T5 is trained on to check how well the results match the other implementation. Here are the results of the two models, and they match well:
1) HuggingFace T5 model
from transformers import T5Config, T5Tokenizer, T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model.eval()
def run_model(input_string, **generator_args):
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    tokens = [tokenizer.decode(x) for x in res]
    print(tokens)
run_model("translate English to German: how many states does the US has? ")
run_model("translate English to German: who is the US president?")
run_model("translate English to German: who got the first nobel prize in physics?")
run_model("translate English to German: when is the next deadpool movie being released?")
run_model("translate English to German: which mode is used for short wave broadcast service?")
run_model("translate English to German: the south west wind blows across nigeria between?")
Results:
['Wie viele Staaten haben die USA?']
['Wer ist der US-Präsident?']
['wer hat den ersten Nobelpreis in der Physik erhalten?']
['wann wird der nächste Deadpool-Film veröffentlicht?']
['Welchen Modus wird für Kurzwellenstrahlung verwendet?']
['der Südwestwind bläst durchnigeria zwischen?']
2) t5 HfPyTorchModel (hf_model.py)
import functools
import t5
import torch
import transformers
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
model = t5.models.HfPyTorchModel("t5-small", "/tmp/hft5/", device)
# Generate some predictions
inputs = [
    "translate English to German: how many states does the US has? ",
    "translate English to German: who is the US president?",
    "translate English to German: who got the first nobel prize in physics?",
    "translate English to German: when is the next deadpool movie being released?",
    "translate English to German: which mode is used for short wave broadcast service?",
    "translate English to German: the south west wind blows across nigeria between?",
]
model.predict(
    inputs,
    sequence_length={"inputs": 32},
    batch_size=2,
)
Results:
/usr/local/lib/python3.6/dist-packages/t5/models/hf_model.py:549: UserWarning: Creating resources inside a function passed to Dataset.map() is not supported. Create each resource outside the function, and capture it inside the function to use it.
num_parallel_calls=tf.data.experimental.AUTOTUNE,
INFO:absl:translate English to German: how many states does the US has?
-> Wie viele Staaten haben die USA?
INFO:absl:translate English to German: who is the US president?
-> Wer ist der US-Präsident?
INFO:absl:translate English to German: who got the first nobel prize in physics?
-> Wer hat den ersten Nobelpreis in der Physik erhalten?
INFO:absl:translate English to German: when is the next deadpool movie being released?
-> ob der nächste Deadpool-Film veröffentlicht wird?
INFO:absl:translate English to German: which mode is used for short wave broadcast service?
-> Welchen Modus wird für Kurzwellenstrahlung verwendet?
INFO:absl:translate English to German: the south west wind blows across nigeria between?
-> Der Südwestwind bläst zwischennigeria?
I tried to fine-tune the released PyTorch code on WMT; the BLEU score I was getting was around 1 after 50,000 steps. I am pretty sure there is a bug in the data processing pipeline and that it does not match the Hugging Face model's decoding.
We verified in the past that the results for translation are roughly the same: https://github.com/huggingface/transformers/issues/5543
Hi Colin, is this with the PyTorch version? In that discussion it seems to be the TensorFlow version. Thanks.
It's comparing the Mesh Tensorflow version to the Hugging Face PyTorch version.
Hi, sorry, I think there is a misunderstanding. I evaluated the HF PyTorch version, i.e. the model that wraps the Hugging Face model. Thanks.
Yes, we have verified that that model gives the same outputs as the mesh tensorflow version.
Hi, thanks for the response. I still think the discussion in huggingface/transformers#5543 compares the Mesh TensorFlow version with the Hugging Face model, but I evaluated this model: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/hf_model.py Thanks.
The model in https://github.com/google-research/text-to-text-transfer-transformer/blob/master/t5/models/hf_model.py is the hugging face pytorch model. The code in that file just calls out to the hugging face library.
Hi, the encoding/decoding part of this model's data processing pipeline does not match the Hugging Face one, and I think this might be the cause of the difference.
Can you be specific about any differences you have found? The encoding and decoding both use the same sentencepiece model.
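One quick way to check is to encode the same string with both and compare the token ids. A rough sketch (it assumes t5.data.get_default_vocabulary() is available in the installed version of the library):

import t5
from transformers import T5Tokenizer

text = "translate English to German: who is the US president?"

# SentencePiece vocabulary used by the t5 data pipeline.
t5_vocab = t5.data.get_default_vocabulary()
# Tokenizer used by the Hugging Face model.
hf_tokenizer = T5Tokenizer.from_pretrained("t5-small")

print(t5_vocab.encode(text))      # ids from the t5 pipeline
print(hf_tokenizer.encode(text))  # ids from the HF tokenizer (typically appends the EOS id 1)

Apart from the EOS token the HF tokenizer appends, the ids should be identical.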
Hi, I am not sure which part exactly is different and causing it; this requires a deeper look into the code for debugging. I just tried to run your HF model on the WMT dataset from scratch.
Hi, I investigated the code more. As I guessed, the way you encode the inputs and decode the final outputs does not match the Hugging Face model, resulting in poor performance. Below, please find how I corrected the predict function in your HF model:
dataset_len = len(inputs)
dataset = tf.data.Dataset.from_tensor_slices(inputs)
import numpy as np
from transformers import T5Tokenizer
path = "/home/rabeeh/pl/data/t5-small"
max_length = sequence_length["inputs"]
tokenizer = T5Tokenizer.from_pretrained(path)
dataset = tfds.as_numpy(dataset)

def data_collator(batch):
    batch = np.stack([x.decode("utf-8") for x in batch])
    input_encodings = tokenizer.batch_encode_plus(
        batch, pad_to_max_length=True, max_length=max_length, return_tensors="pt")
    return input_encodings

class IterableDataset(torch.utils.data.IterableDataset):
    def __init__(self, iterable):
        super(IterableDataset).__init__()
        self.iterable = iterable

    def __iter__(self):
        return self.iterable

dataset = IterableDataset(iter(dataset))
num_batches = int(dataset_len / batch_size)
loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                     collate_fn=data_collator)
for _, batch in enumerate(itertools.islice(loader, num_batches)):
    predicted_tokens = self._model.generate(batch["input_ids"].cuda(), **generate_kwargs)
    predictions = [tokenizer.decode(ids) for ids in predicted_tokens]
    print(predictions)
Hey there! 👋
TL;DR: I have a t5-small model that is fine-tuned on natural-questions. For this model, I get its predictions once using hf_model.py and another time using HF code. The outputs are different (and the outputs using HF seem to be more reasonable). This is a thread on using hf_model.py; I know that this code is not well tested. Sharing these observations here in case they help you improve this model. Here is the HF code and its output:
path = "/home/danielk/small_standard/pytorch_model" model = T5ForConditionalGeneration.from_pretrained(path) tokenizer = T5Tokenizer.from_pretrained(path) model.eval()
def run_model(input_string, generator_args): input_ids = tokenizer.encode(input_string, return_tensors="pt") res = model.generate(input_ids, generator_args) tokens = [tokenizer.decode(x) for x in res] print(tokens)
run_model("how many states does the US has? ") run_model("who is the US president?") run_model("who got the first nobel prize in physics?") run_model("when is the next deadpool movie being released?") run_model("which mode is used for short wave broadcast service?") run_model("the south west wind blows across nigeria between?")
2020-10-21 21:14:44.634221: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-10-21 21:14:44.634259: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
['50']
['Donald Trump']
['Wilhelm Conrad Röntgen']
['December 18, 2018']
['TCP port 25']
['the Nigerian and Pacific Oceans']