-
Good understanding of deep learning architectures like Multi-Layer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Convolutional …
-
### 🚀 The feature, motivation and pitch
This issue is a WIP placeholder to track discussion around the deprecation of `torch.nn.MultiheadAttention` and `torch.nn.Transformer`-related `torch.…
-
Hi Philipp!
Thanks for this great repo!
I was trying to run llama2 instruction tuning following the [tutorial](https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training…
-
Make [this table](https://github.com/neelnanda-io/TransformerLens/blob/main/easy_transformer/model_properties_table.md) better and cover key info for model architecture - whether it uses parallel attn…
-
# Description:
When running a batch of 32 graphs using the GraphTransformer object, there is a notable increase in GPU memory usage during the operation on the edges. The memory spikes from approxima…
-
Where is the BLEU score on the test dataset?
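For context, a BLEU score like the one asked about is roughly a geometric mean of clipped n-gram precisions times a brevity penalty. The sketch below is a minimal single-reference, unsmoothed version (the function names are mine); real toolkits such as sacreBLEU apply smoothing and corpus-level aggregation, so their numbers will differ:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty.

    `candidate` and `reference` are token lists. Single-reference,
    no smoothing: any zero n-gram precision makes the score 0.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        overlap = sum((cand & ref).values())   # counts clipped by the reference
        total = max(sum(cand.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0; a candidate sharing no n-grams with the reference scores 0.0 under this unsmoothed variant.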
-
### Your GTNH Discord Username
_No response_
### Your Pack Version
2.6.1
### Your Proposal
Add some way to reproduce IC2's crop replication
### Your Goal
1. This method should use botania, as it'…
-
I tried running inference on my T5 model with the C++ runtime using paged KV at commit `b777bd64750abf30ca7eda48e8b6ba3c5174aafd`. The result is normal when running inference with a single input text, but with multiple input…
-
Thanks for the amazing work and for sharing such tidy code! While looking into your code, I found something different from what I expected. It seems the S4 used is just the vanilla version instead of Gated S…
-
I'm getting this error as well.
```py
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float…