attention-mechanisms Search Results

1000+ results
for attention-mechanisms

dido1998/Recurrent-Independent-Mechanisms #7

GroupLinearLayer should add "device" parameter

Hi RIM dev team, the code fails when device = 'cuda'. It can be easily solved by adding an extra parameter "device" to all "Group" classes. Thank you for the great RIM implementation, Ildefons

ildefons updated 3 years ago
2
JyotsnaT/ML-interviews #9

Deep Learning Mastery

Good understanding of deep learning architectures like Multi-Layer Perceptron, Recurrent Neural Networks (RNNs), Long Short Term Memory models (LSTMs), Gated Recurrent Units (GRUs), and Convolutional …

JyotsnaT updated 6 months ago
2
magland/remfile #9

My thoughts on this project

This was recently brought to my attention. I am glad that you are able to get better performance than standard fsspec. First a couple of notes - fsspec provides multiple possible (memory) caching…

martindurant updated 8 months ago
3
microsoft/DeepSpeed-MII #457

[FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture i…

# Qwen1.5-MoE Support With the increasing attention on mixture-of-experts (MoE) models, especially following the advancements heralded by Mixtral, I propose considering the integration of the Qwen1.5…

freQuensy23-coder updated 5 months ago
1
logisim-evolution/logisim-evolution #1294

Feature Request: maven or jitpack

When we are developing some library, we need to include logisim-evolution with gradle. If you want to publish this to a maven repository, the `maven-publish` plugin should be added to `build.gradle` and set …

xtexChooser updated 2 years ago
1
16lemoing/waldo #2

Future Layer Prediction

Hello, May I know if the future layer prediction (FLP) exclusively forecasts the future control points on a layer-wise basis, without considering interactions or dependencies with other layers.

skrya updated 1 year ago
1
huggingface/transformers #27453

Audio-MAE - ViTMAE for audio

### Model description This model is a Self-supervised Vision Transformer that uses patch reconstruction as the spectrogram task. It extends MAE (which is already on HuggingFace) for audio. This mo…

justinluong updated 6 months ago
15
open-telemetry/weaver #124

[Refactor] Refactor CompoundError Usage to Distinguish Betwe…

Currently, we collect as many errors as possible via the CompoundError mechanism. However, these "errors" are not typical Rust errors, but rather diagnostic messages intended for user reporting via th…

lquerel updated 4 months ago
3
moxious/triage #377

Every time I access my linux dashboard, it's telling me to m…

### What happened? Every time I access my linux dashboard, it's telling me to migrate from Angular, which I do. Over and over and over and over again ### What did you expect to happen? I expe…

tonypowa updated 1 month ago
4
pytorch/pytorch #57230

Implementation of Self Attention vs Encoder Decoder Attentio…

## 🐛 Bug The following code snippet from multihead attention module is using tensor.equal method to compare query, key and value to determine if the attention module is being used as self-attention…

ultrons updated 3 years ago
3
