-
### Description
I have trained a simple NMT DNN using the Transformer model on a small dataset, and I am pretty impressed by the good results achieved with just 4,500 steps. Now the problem arises when …
-
I would like to use the Transformer architecture for a sequence-labeling problem. I have two files: one containing the input tokens, and the other the labels. The labels are short strings an…
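In case a concrete starting point helps: below is a minimal, untested sketch of how such a problem could be registered, assuming tensor2tensor's `text_problems.Text2TextProblem` API and two hypothetical line-aligned files `tokens.txt` and `labels.txt` (names chosen for illustration only).

```python
# Sketch only: hypothetical sequence-labeling problem for tensor2tensor.
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry


@registry.register_problem
class SequenceLabeling(text_problems.Text2TextProblem):
  """Reads line-aligned files: one line of tokens, one line of labels."""

  @property
  def is_generate_per_split(self):
    # Let t2t split the generated examples into train/dev itself.
    return False

  @property
  def approx_vocab_size(self):
    return 2**13  # placeholder; depends on the token/label inventory

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    # tokens.txt and labels.txt are hypothetical, line-aligned inputs.
    with open("tokens.txt") as src, open("labels.txt") as tgt:
      for tokens, labels in zip(src, tgt):
        yield {"inputs": tokens.strip(), "targets": labels.strip()}
```

If this matches your setup, the registered problem would then be referred to as `sequence_labeling` when running `t2t-datagen` and `t2t-trainer`.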
-
Hi,
I would like to create a new model that uses the Transformer encoder and decoder. However, it also needs to include layers in between, and the decoder needs to use not only the output of the tr…
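Since the description is cut off here, the following is only a rough sketch of the kind of composition being described, written with PyTorch's stock `nn.TransformerEncoder`/`nn.TransformerDecoder` for concreteness; the `bridge` module and the `extra_memory` input are hypothetical placeholders, not anything from an existing model.

```python
# Sketch: encoder -> extra "bridge" layers -> decoder, where the decoder
# also attends to an additional memory besides the encoder output.
import torch
import torch.nn as nn


class BridgedTransformer(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        # "Layers in between": a simple feed-forward bridge as a placeholder.
        self.bridge = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)

    def forward(self, src, tgt, extra_memory=None):
        memory = self.bridge(self.encoder(src))
        if extra_memory is not None:
            # The decoder sees the bridged encoder output *and* the extra
            # source, concatenated along the sequence dimension.
            memory = torch.cat([memory, extra_memory], dim=1)
        return self.decoder(tgt, memory)


# Toy usage with already-embedded inputs of shape (batch, seq, d_model).
model = BridgedTransformer()
out = model(torch.randn(2, 10, 512), torch.randn(2, 7, 512),
            extra_memory=torch.randn(2, 5, 512))
```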
-
**Describe the bug**
My model makes use of `torch.nn.fold` quite a bit. When I try to use `add_graph` on my model, I get the following exception:
```
in merge(self, x)
39
40 …
-
### Description
I am training a `Transformer` model on the `Librispeech` dataset using 4 GPUs with 8 CPU cores.
I have tested the following:
#### Single-GPU
```bash
export CUDA_VISIBLE_D…
-
### Description
...
### Environment information
```
OS:
Version: tf-cpu.1-14.m34
Based on: Debian GNU/Linux 9.9 (stretch) (GNU/Linux 4.9.0-9-amd64 x86_64)
Linux cpu1-vm 4.9.0-9-amd64 #…
-
### Description
I tried training an LM with both languagemodel_ptb10k and languagemodel_lm1b32k as target problems; both succeeded without problems. The decoding part also seemed fine, but the output re…
-
### Description
After the language model has been trained, when I decode from a file I always get this output:
"pad>......"
### Environment information
```
OS: centos 7.3
$ pip freeze | grep …
-
### Description
I'm trying to reproduce the En-De experiment in the paper "Attention Is All You Need".
However, I'm confused by the training data. The paper used the WMT14 training data, while the follow…
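In case it helps while this gets answered: one way to see exactly which corpora the bundled En-De problem downloads is to list its source data files. A small, untested sketch, assuming the problem class is `translate_ende.TranslateEndeWmt32k`:

```python
# Sketch: print the raw corpora behind the bundled En-De problem.
# Assumes tensor2tensor's translate_ende module and that each entry is a
# (download URL, filenames) pair; untested.
from tensor2tensor.data_generators import problem, translate_ende

ende = translate_ende.TranslateEndeWmt32k()
for url, files in ende.source_data_files(problem.DatasetSplit.TRAIN):
    print(url, files)
```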
-
Hi.
I am trying to use a TPUv2-8 to train a query classifier. However, I ran into some memory issues.
Officially, a TPUv2-8 is claimed to have 64 GB of memory. However, I keep getting this error w…