-
This is a feature request rather than an issue. Since the complexity of this attention is still quite high, it would be nice to have the option of making the network reversible, like some of your other…
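For reference, a minimal sketch of the RevNet-style coupling that reversible networks use (the general technique, not this repo's API); `f` and `g` would stand in for the attention and feed-forward sublayers:
```python
import torch.nn as nn

class ReversibleBlock(nn.Module):
    # y1 = x1 + F(x2); y2 = x2 + G(y1). Because the inputs can be recomputed
    # from the outputs, activations need not be stored for the backward pass.
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```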
-
Thanks for your great work!
I have a few questions about your implementation:
1. Could you reproduce the paper's results (or approximately similar ones) with your implementation?
2. While ordinary tra…
-
Great work!
I noticed the Linformer input is (batch_size, seq_len, channels). Can seq_len be variable, or should the attention be masked when the sequence is padded? Why is seq_len fixed?
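For context, seq_len has to be fixed because Linformer's projection matrices have shape (seq_len, k), so they are tied to a single sequence length. A minimal padding sketch (the helper is mine, not this repo's API), assuming shorter inputs are zero-padded up to the fixed length and a boolean mask marks the real tokens:
```python
import torch
import torch.nn.functional as F

def pad_to_seq_len(x: torch.Tensor, seq_len: int):
    """Pad (batch, t, channels) to (batch, seq_len, channels); mask marks real tokens."""
    b, t, _ = x.shape
    x = F.pad(x, (0, 0, 0, seq_len - t))              # zero-pad the time axis
    mask = (torch.arange(seq_len) < t).expand(b, -1)  # True where tokens are real
    return x, mask

x, mask = pad_to_seq_len(torch.randn(2, 100, 128), seq_len=512)
print(x.shape, mask.shape)  # torch.Size([2, 512, 128]) torch.Size([2, 512])
```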
-
Can you explain the visualization results? What is the meaning of each head's plot?
-
## 🐛 Bug
When I run DDP with the sample script, I get the following error:
```python
Variable._execution_engine.run_backward(
RuntimeError: Expected to mark a variable ready only once. This error…
```
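For what it's worth, this reducer error usually means a parameter's gradient hook fired more than once in a single backward pass, typically from weight sharing or from combining `find_unused_parameters=True` with activation checkpointing. A minimal scaffold of the usual mitigation (the toy model and `torchrun` launch are my assumptions, not the sample script):
```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes a single-node launch via torchrun; the Linear stands in for the real model.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
model = torch.nn.Linear(512, 512).to(local_rank)

# Leave find_unused_parameters off when every parameter is used exactly once
# per forward; enabling it alongside parameter reuse is what makes DDP try
# to mark the same variable ready twice.
ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=False)
```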
-
Hi! Thanks for sharing this wonderful work! I observe that when using a larger output resolution for the backbone, e.g. the DC5 model in the paper, the encoder takes a lot of time. I'm wondering i…
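For context, a back-of-envelope estimate of why DC5 is so much heavier (assuming a DETR-style encoder with full self-attention over the feature map and a typical 800×1066 input):
```python
# DC5 halves the backbone's output stride (32 -> 16), so the feature map has
# 4x as many tokens, and full self-attention is quadratic in token count.
h, w = 800 // 32, 1066 // 32
tokens_c5 = h * w                       # stride-32 feature map
tokens_dc5 = (2 * h) * (2 * w)          # stride-16 feature map (DC5)
print(tokens_dc5 / tokens_c5)           # 4x the tokens
print((tokens_dc5 / tokens_c5) ** 2)    # ~16x the self-attention cost
```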
-
Hi, I am trying to run Linformer training with DistributedDataParallel and parameter_sharing="layerwise", and I get this error:
```python
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/…
```
-
Suggest a paper you would like us to discuss during our weekly paper-reading discussion. It can be a paper from RL, Computer Vision, NLP, or any other ML-related area.
You can vote on a suggested paper…
-
Is Autopadder supposed to work with Linformer?
If I try this:
```
import torch
from linear_attention_transformer import LinearAttentionTransformer
from linear_attention_transformer.autopadder import Autopadder
```
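For anyone reading along, a minimal sketch of the combination being asked about (the hyperparameters and `LinformerSettings(k = 256)` follow the repo's README examples; the exact values are placeholders, not from this snippet):
```python
import torch
from linear_attention_transformer import LinearAttentionTransformer, LinformerSettings
from linear_attention_transformer.autopadder import Autopadder

# Linformer-style attention is switched on via the settings object; Autopadder
# wraps the model so inputs shorter than the expected length get padded.
# All values below are placeholders, not from the original issue.
model = Autopadder(LinearAttentionTransformer(
    dim = 512,
    heads = 8,
    depth = 1,
    max_seq_len = 8192,
    linformer_settings = LinformerSettings(k = 256)
))

x = torch.randn(1, 1000, 512)  # a length the model would otherwise reject
out = model(x)                 # Autopadder pads, runs, and crops the output
```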