-
Is there a length limit for data provided directly in the inputData.txt file? Is this data analyzed as a 2D map or linearly?
Thank you
-
Hi there,
I am trying to define the activation of the last layer as the exponential function. If x is the input vector to a node in the last layer, the output of the node would be exp(w*x + b). Is …
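For reference, the activation described above can be sketched in plain Python. This is only an illustration, assuming a single node with a weight vector `w` and a scalar bias `b`; the function name `exp_node` is hypothetical and not tied to any particular framework.

```python
import math

def exp_node(w, x, b):
    """Output of one last-layer node with an exponential activation: exp(w . x + b)."""
    # Pre-activation: dot product of the weights and inputs, plus the bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # The exponential keeps the output strictly positive.
    return math.exp(z)

# Illustrative values only: the pre-activation is 0, so the output is 1.0.
print(exp_node([0.5, -0.5], [2.0, 2.0], 0.0))
```

Because the output grows exponentially with the pre-activation, large values of `w . x + b` can overflow, so clamping the pre-activation may be worth considering.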
-
When I test my own dataset, it doesn't work. The resolution is 1024*1024; does the resolution have to be 320*240 or 640*480?
-
## 🚀 Feature
Improved Transformer and MultiHeadAttention design
## Motivation
The current Transformer and MultiHeadAttention design is not very `PyTorch`-ish. The `in_proj` is not even a `nn.Mo…
-
0. Related work search on the supervised prediction: Look at all papers that cite Reddy/Redyy++/Fahramand. http://www.aclweb.org/anthology/I11-1024 http://www.aclweb.org/anthology/P16-2026 http://www.…
-
### 🐛 Describe the bug
TransformerEncoder runs much slower with src_key_padding_mask than without any padding. On a V100, it takes ~8.8 ms for bert-base with batch size 1 and sequence length 128 with the mask set while only tak…
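To make the setup concrete, here is a minimal plain-Python sketch of what a key padding mask encodes, assuming the PyTorch convention that `True` marks a padded position to be ignored. The helper name `make_key_padding_mask` and the example lengths are hypothetical; in practice the mask would be a boolean tensor of shape (batch, seq).

```python
def make_key_padding_mask(lengths, max_len):
    """Return one row per sequence; True marks a padded (ignored) position."""
    return [[pos >= n for pos in range(max_len)] for n in lengths]

# Two sequences padded to length 5: the first has 3 real tokens, the second has 5.
mask = make_key_padding_mask([3, 5], max_len=5)
print(mask[0])  # [False, False, False, True, True]
print(mask[1])  # [False, False, False, False, False]
```

A row with no `True` entries corresponds to a sequence with no padding at all.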
-
Paraphrasing @tterribe:
For a given frame, we can have up to 8 sets of CDEF parameters for superblocks to choose from (note: it is better to have fewer than 8 for lower bitrates, to reduce the cost…
-
### 🐛 Describe the bug
Similar to [this issue](https://github.com/pytorch/pytorch/issues/107084) (which is for `nn.MultiheadAttention`) and [this comment](https://github.com/pytorch/pytorch/issues/10…
-
Marvellous work! The result is really magical.
However, one may spend several minutes using this model to generate an output. I think the reason is that this model is based on online image optimizat…
-
I am getting NaN outputs from the encoder of the pruned transducer streaming model.
tensor([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan…