-
Hello, I want to do distributed training on an En-Zh translation task using four machines, each with eight 1080 Ti GPUs; the t2t version is 1.6.5. I have seen the other similar issues, and
the di…
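For context, a minimal sketch of the multi-machine setup I am assuming, based on the `t2t-make-tf-configs` workflow from the distributed-training docs; the hostnames, ports, and the En-Zh problem name below are placeholders, and flag names may vary between t2t versions:
```bash
# Generate a TF_CONFIG string plus trainer flags for each machine
# (hostnames/ports are placeholders).
t2t-make-tf-configs \
  --masters='host1:2222,host2:2222,host3:2222,host4:2222' \
  --ps=''

# Then, on each machine, export the TF_CONFIG printed for it and run
# the trainer on all 8 local GPUs:
TF_CONFIG='<printed-config-for-this-machine>' t2t-trainer \
  --problem=translate_enzh_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --worker_gpu=8 \
  --output_dir=$TRAIN_DIR
```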
-
I ran the setup instructions on a pre-existing GCP machine with CUDA 10.1 and one modification:
```bash
mv ckpt/pegasus_ckpt ckpt2
```
(Instructions don't work as written because they don't acknowl…
-
### Description
I am trying to follow the approach from the paper *Blockwise Parallel Decoding for Deep Autoregressive Models*. It states that the model is first trained as a standard Transformer for …
-
Sentences longer than the `max_length` parameter are excluded from training, so lowering this parameter helps to prevent [OOM errors](#581) and makes it possible to use a [higher `batch_size`](https://github.com/t…
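A minimal sketch of overriding both hparams on the command line (the paths, problem, and concrete values are illustrative; flag names may differ slightly between t2t versions):
```bash
# Drop all sentences longer than 70 subword tokens and raise the
# token-based batch size; both values are illustrative.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --hparams='max_length=70,batch_size=4096'
```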
-
### Description
Why doesn't t2t training use all of the GPU memory?
nvidia-smi reports low GPU
![image](https://user-images.githubusercontent.com/32744746/64552801-fc0db500-d372-11e9-8864-5104b5c16b59.p…
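For anyone reproducing this, a quick way to watch both utilization and memory over time, using standard `nvidia-smi` query flags:
```bash
# Sample GPU utilization and memory once per second; helps tell low
# *utilization* apart from low *memory allocation*.
nvidia-smi \
  --query-gpu=index,utilization.gpu,memory.used,memory.total \
  --format=csv -l 1
```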
-
I'd like to train a transformer_ae model on the translate_ende_wmt32k problem.
Part of my command is copied below.
```
PROBLEM=translate_ende_wmt32k
MODEL=transformer_ae
HPARAMS=transformer…
```
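For reference, a minimal sketch of the full pipeline I am assuming (directories are placeholders, and I am guessing `transformer_ae_small` as the hparams set; the actual registered name may differ):
```bash
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/ae

# Generate the WMT En-De data, then train the transformer_ae model.
t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k
t2t-trainer --data_dir=$DATA_DIR --output_dir=$TRAIN_DIR \
  --problem=translate_ende_wmt32k \
  --model=transformer_ae \
  --hparams_set=transformer_ae_small
```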
-
There have been a lot of advancements recently in achieving context for dialog models through a separate context layer, e.g. [HRAN](https://arxiv.org/pdf/1701.07149.pdf) or [VHRED](http://www.cs.toronto…
-
I'm relatively new to t2t and was studying how to leverage it for ASR when I came across your work.
Amazing work by @mohitsshah, with a proper explanation over at at16k. The results are pretty impressive.
I'…
-
Hi,
the link to Meta-World in the README is broken. Do you have another reference to the version you used? Or will I be able to reproduce your results by cloning Meta-World from the master branch o…
-
Hello,
Thank you for your work. I am interested in your Adafactor implementation. I want to use the same training hyperparameters as PEGASUS (https://arxiv.org/pdf/1912.08777.pdf) to train my mode…
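In case it helps, a minimal sketch of how I would select Adafactor through t2t hparam overrides; the problem, schedule, and values below are placeholders, not the actual PEGASUS settings:
```bash
# Select the Adafactor optimizer with an inverse-square-root decay
# schedule; these choices are illustrative, not the PEGASUS ones.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problem=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --hparams='optimizer=Adafactor,learning_rate_schedule=rsqrt_decay'
```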