-
According to the source code linked below, I think there are five transformer model types:
https://github.com/tensorflow/mesh/blob/master/mesh_tensorflow/transformer/utils.py
- bitransformer
…
-
I followed the instructions [here](https://github.com/google-research/multilingual-t5) to pre-train mT5,
but it throws this error:
`--eval_gin_param=mesh_eval_dataset_fn.num_eval_…
-
I noticed that the checkpoint file sizes vary within mT5-base. Likewise, I observed the same for the mT5-large models.
Is there a specific reason for this?
PS: I am not referring to the checkpoint…
-
```
!t5_mesh_transformer \
--model_dir="gs://t5-data/pretrained_models/mt5/base" \
--use_model_api \
--mode="export_predict" \
--export_dir="{saved_model_dir}"
saved_model_path = os.pa…
```
-
# ❓ Questions & Help
## Details
Hello again @ykim362,
I'm trying to reproduce your distillation results from Section 2 of the FastFormers paper, and I have a few questions I was hoping you c…
-
The smallest mT5 model has 300M parameters, and it fails on Travis CI with:
```
RuntimeError: [enforce fail at CPUAllocator.cpp:65] . DefaultCPUAllocator: can't allocate memory: you tried to allocate …
```
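As a rough back-of-the-envelope check, here is a minimal sketch estimating how much memory the weights of a 300M-parameter model occupy. It assumes 4 bytes per parameter for fp32 and 2 for fp16/bf16. It deliberately ignores activations, gradients, optimizer state and any temporary copies made while loading a checkpoint, so real peak usage is typically higher. The helper name `weight_memory_bytes` is hypothetical.

```python
def weight_memory_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Bytes needed just to hold the weights (fp32 = 4 bytes, fp16/bf16 = 2)."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    n = 300_000_000  # "300M parameters", as quoted in the issue above
    for name, width in [("fp32", 4), ("fp16/bf16", 2)]:
        b = weight_memory_bytes(n, width)
        # Report both decimal gigabytes and binary gibibytes.
        print(f"{name}: {b / 1e9:.1f} GB ({b / 2**30:.2f} GiB)")
    # fp32: 1.2 GB (1.12 GiB)
    # fp16/bf16: 0.6 GB (0.56 GiB)
```

On a memory-constrained CI runner, even ~1.2 GB of fp32 weights, plus whatever extra copies the loading path makes, can plausibly exceed the available RAM.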
-
## Environment info
- `transformers` version: 4.0.0-rc1
- Platform: Colab
- Python version: 3.6.9
- PyTorch version (GPU?): TESLA V4
- Tensorflow version (GPU?): 2.3.0
- Using GPU in…
-
Hi,
I've been working with the mT5 models recently, and wanted to get a better understanding of the dataset input function during training.
I've looked closely at mesh_tf's data_fn but wasn't ab…
-
# 🌟 New model addition
## Model description
T5 version t5.1.1.* is very similar to the original T5 model, with the following differences:
- GEGLU activation in feed-forward hidden layer, rather…
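The first listed difference can be made concrete with a small example. GEGLU (from Shazeer's "GLU Variants Improve Transformer") replaces the single activated projection of the original T5 ReLU feed-forward layer with a GELU-gated product of two projections: FFN(x) = (GELU(x W) ⊙ x V) W_out, with no biases. Below is a minimal pure-Python sketch for a single token vector. The weight names `w_gate`, `w_lin`, `w_out` and the helper functions are hypothetical. It uses the exact erf-based GELU, whereas the actual T5.1.1 / Mesh TensorFlow code may use an approximation and operates on batched tensors.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j], shape (d_in, d_out)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute the row vector x @ w, with len(x) == len(w)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def geglu_ffn(x: Vector, w_gate: Matrix, w_lin: Matrix, w_out: Matrix) -> Vector:
    """Bias-free feed-forward block with a GEGLU hidden layer:
    FFN(x) = (GELU(x @ w_gate) * (x @ w_lin)) @ w_out
    """
    gate = [gelu(v) for v in matvec(x, w_gate)]    # GELU-activated branch
    lin = matvec(x, w_lin)                         # linear (ungated) branch
    hidden = [g * l for g, l in zip(gate, lin)]    # elementwise product
    return matvec(hidden, w_out)                   # project back to d_model


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 identity weights
    print(geglu_ffn([1.0, -1.0], eye, eye, eye))
```

Note that GEGLU needs two input projections (`w_gate` and `w_lin`) instead of one, so at the same hidden width the feed-forward block has more parameters than its ReLU counterpart.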
-
## Environment info
- `transformers` version:
- Platform: Ubuntu 18.04
- Python version: 1.7.0
- PyTorch version (GPU?): 1.7.0
- Tensorflow version (GPU?):
- Using GPU in script?: No
-…