-
Hello, I am trying the `translate_ende_wmt32k` problem given in the walk-through. I trained the model, and during inference I don't want to decode a whole file but want to decode sentence by sentence a…
-
Would you mind providing the code, or details on how to implement the visualization of the attention map?
HqWei updated
5 years ago
-
### Description
I know t2t has a hyperparameter tuning function, but it's only for ML Engine.
I implemented hyperparameter tuning with Optuna for t2t v1.10.0.
https://github.com/Drunkar/tensor2ten…
-
I'm working on serving multiple beam search outputs and bumped into an issue. I've actually solved it, but I need to figure out how to submit the change without breaking things for other people.
The c…
-
### Description
Hi, I am having issues simply installing and running tensor2tensor. I was wondering if anyone could help me pinpoint the source of the issue.
### Environment information
```
OS:…
-
### Description
We tried running language modeling with languagemodel_ptb10k and transformer_small as recommended in the README. There were no errors and the TensorBoard training curves looked fine, but the dec…
-
How can I disable dataset shuffling when training the summarization problem?
When I run t2t-trainer using the recommended hparams for the cnn_daily problem, the trainer always shuffles the …
-
*feature request*
### Description
I would like to suggest adding prominent support for sparse input tensors, specifically for mixture-of-experts gating functionality (e.g. local_moe and noisy_top_…
-
### Description
Bytenet and slicenet are not giving the required performance and results. I have tried lowering the learning rate and have also used the same one as in the [paper](https://arxiv.org…
-
### Description
We are trying to transform a regular FP32 transformer-big model into a much smaller FP16 one. We successfully down-scaled all values so that now our model on disk is (roughly) half …
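
For readers unfamiliar with why an FP16 copy is about half the size on disk, here is a minimal sketch using only the Python standard library. It is not the poster's conversion script and does not touch tensor2tensor checkpoints; the weight values and the helper names `pack_weights` and `unpack_weights` are made up for illustration.

```python
import struct

def pack_weights(weights, fmt):
    """Serialize a flat list of floats using a struct format code.

    "f" is IEEE 754 single precision (4 bytes per value),
    "e" is IEEE 754 half precision (2 bytes per value).
    """
    return struct.pack("<%d%s" % (len(weights), fmt), *weights)

def unpack_weights(blob, fmt):
    """Inverse of pack_weights for the same format code."""
    size = struct.calcsize("<" + fmt)
    return list(struct.unpack("<%d%s" % (len(blob) // size, fmt), blob))

# Hypothetical weights, not taken from any real model.
weights = [0.125, -1.5, 3.14159, 1e-3]

fp32_blob = pack_weights(weights, "f")
fp16_blob = pack_weights(weights, "e")

# Reading the FP16 values back shows the rounding the cast introduces.
restored = unpack_weights(fp16_blob, "e")
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(len(fp32_blob), len(fp16_blob), max_err)
```

Half precision has a much narrower range than single precision, and `struct.pack` raises `OverflowError` for magnitudes that do not fit, so a real conversion would also need to check value ranges before casting.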