-
Good understanding of deep learning architectures like Multi-Layer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Convolutional …
-
### 🚀 The feature, motivation and pitch
This issue is a WIP placeholder to track discussion around the deprecation of `torch.nn.MultiheadAttention` and `torch.nn.Transformer`-related `torch.…
-
Hi Philipp!
Thanks for this great repo!
I was trying to run llama2 instruction tuning following the [tutorial](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training…
-
Make [this table](https://github.com/neelnanda-io/TransformerLens/blob/main/easy_transformer/model_properties_table.md) better and cover key info for model architecture - whether it uses parallel attn…
-
# Description:
When running a batch of 32 graphs using the GraphTransformer object, there is a notable increase in GPU memory usage during the operation on the edges. The memory spikes from approxima…
-
Where is the BLEU score on the test dataset?
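For context, a BLEU score like the one asked about is roughly a geometric mean of clipped n-gram precisions times a brevity penalty. The sketch below is a minimal single-reference, unsmoothed version (the function names are mine); real toolkits such as sacreBLEU apply smoothing and corpus-level aggregation, so their numbers will differ:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty.

    `candidate` and `reference` are token lists. Single-reference,
    no smoothing: any zero n-gram precision makes the score 0.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        overlap = sum((cand & ref).values())   # counts clipped by the reference
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0; a candidate sharing no n-grams with the reference scores 0.0 under this unsmoothed variant.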
-
### Your GTNH Discord Username
_No response_
### Your Pack Version
2.6.1
### Your Proposal
Add some way to reproduce IC2's crop replication
### Your Goal
1. This method should use botania, as it'…
-
I tried running inference on my T5 model with the C++ runtime using paged KV at commit `b777bd64750abf30ca7eda48e8b6ba3c5174aafd`. The result is normal when running inference with a single input text, but with multiple input…
-
Thanks for the amazing work and for sharing such tidy code! While looking into your code, I found something different from what I expected. It seems the S4 used is just the vanilla version instead of Gated S…
-
I'm getting this error as well.
```py
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float…