-
### Your current environment
I'm not able to run `collect_env.py` on this workstation
vllm == 0.5.1
vllm-flash-attn == 2.5.9
torch == 2.3.0
Tested on a single A100-80GB
The following mes…
-
Good understanding of deep learning architectures such as Multi-Layer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Convolutional …
-
# Goal
------
* Many of the new LLMs support long context. For example, Llama 3.1 and Mistral 2 support 128k tokens;
* The trend is upwards, e.g. Gemini supports 1M-10M tokens and Claude supports 200k;
* …
-
1. Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop (2019)
Combines a regression-based approach (used as the initial pose estimate) with an iterative optimization-based approach; see the sketch after this list.
code: No
2. Weakly S…
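For intuition, here is a minimal, hypothetical sketch of that fitting-in-the-loop training step: the regressor's prediction initializes an iterative fitting routine, and the fitted result then supervises the regressor. The names `regressor`, `smplify_optimize`, and `reprojection_loss` are placeholders, not the paper's actual code.

```python
import torch

def training_step(regressor, smplify_optimize, reprojection_loss,
                  images, keypoints_2d, optimizer):
    """One fitting-in-the-loop iteration: regress -> optimize -> supervise.

    `regressor`, `smplify_optimize`, and `reprojection_loss` are hypothetical
    callables standing in for the network, the iterative model-fitting routine,
    and the 2D keypoint loss, respectively.
    """
    # 1) The regression network predicts initial pose/shape parameters.
    pred_params = regressor(images)

    # 2) Iterative optimization (e.g. SMPLify-style fitting) refines the
    #    prediction, using the regressed parameters as the starting point.
    with torch.no_grad():
        fitted_params = smplify_optimize(init=pred_params.detach(),
                                         keypoints_2d=keypoints_2d)

    # 3) The fitted parameters act as pseudo ground truth for the regressor,
    #    alongside the usual 2D reprojection loss.
    loss = (torch.nn.functional.mse_loss(pred_params, fitted_params)
            + reprojection_loss(pred_params, keypoints_2d))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```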
-
@federicobucchi @wangkuiyi @tuzhucheng How can models such as Baichuan, Bloom, or QWEN be supported? Does the modeling code need to be modified, and can you provide steps for supporting training of other models?
-
I am using Anaconda to build my own project. I am using Python version 3.10.14 and downloaded Ollama, pulled Mistral for my LLM, and pulled Nomic-Embed-Text for my embedding model. I followed the inst…
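As a point of reference, here is a minimal sketch of talking to a locally running Ollama server over its HTTP API with the two models mentioned above; the default port 11434 and the exact response fields are assumptions about a standard install, not taken from the instructions I followed.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port; adjust if needed

def generate(prompt: str, model: str = "mistral") -> str:
    """Ask the local Ollama server for a (non-streaming) completion."""
    resp = requests.post(f"{OLLAMA_URL}/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Get an embedding vector for `text` from the local Ollama server."""
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

if __name__ == "__main__":
    print(generate("Say hello in one sentence."))
    print(len(embed("hello world")))
```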
-
1. After reading through the whole codebase, it seems the author does not feed the last_slot_label information into the decoder's decoding step (see the sketch below).
2. This seems to differ from the implementation in "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"; the author's method still looks like encoder-decoder attention, i.e. the second variant in that paper…
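To make point 1 concrete, here is a toy sketch (my own, not the repository's code) of a decoder that does feed the previous slot label back in: each step's input is the encoder state concatenated with an embedding of the previously predicted label.

```python
import torch
import torch.nn as nn

class SlotDecoder(nn.Module):
    """Toy decoder that conditions each step on the previous slot label.

    Hypothetical sketch: `enc_outputs` are the encoder hidden states, and the
    previous slot label's embedding is concatenated to each step's input,
    which is the behaviour discussed in point 1 above. For simplicity the
    predicted label is fed back (no teacher forcing).
    """
    def __init__(self, enc_dim, hidden_dim, num_slots, label_emb_dim=32):
        super().__init__()
        self.label_emb = nn.Embedding(num_slots, label_emb_dim)
        self.rnn_cell = nn.LSTMCell(enc_dim + label_emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_slots)

    def forward(self, enc_outputs):                       # (B, T, enc_dim)
        B, T, _ = enc_outputs.shape
        h = enc_outputs.new_zeros(B, self.rnn_cell.hidden_size)
        c = enc_outputs.new_zeros(B, self.rnn_cell.hidden_size)
        last_label = enc_outputs.new_zeros(B, dtype=torch.long)  # start label = 0
        logits = []
        for t in range(T):
            step_in = torch.cat([enc_outputs[:, t], self.label_emb(last_label)], dim=-1)
            h, c = self.rnn_cell(step_in, (h, c))
            step_logits = self.out(h)
            last_label = step_logits.argmax(dim=-1)        # feed prediction back in
            logits.append(step_logits)
        return torch.stack(logits, dim=1)                  # (B, T, num_slots)
```

For example, `SlotDecoder(enc_dim=256, hidden_dim=128, num_slots=10)(torch.randn(2, 6, 256))` returns per-token slot logits of shape `(2, 6, 10)`.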
-
### Abstract
- proposes adding an "Attentive Recurrent Network (ARN)" to the Transformer encoder to leverage the strengths of both attention and recurrent networks
- WMT14 En-De and WMT17 Zh-En demonstra…
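As a rough illustration of the general idea (a guess at the shape of such a layer, not the paper's actual ARN design): a recurrent branch runs in parallel with a standard Transformer encoder layer, and the two outputs are fused with a learned gate.

```python
import torch
import torch.nn as nn

class RecurrentAugmentedEncoderLayer(nn.Module):
    """Illustrative only: a standard Transformer encoder layer plus a parallel
    bidirectional GRU branch, fused with a sigmoid gate. Not the ARN itself."""
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.attn_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.rnn = nn.GRU(d_model, d_model // 2, batch_first=True, bidirectional=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                       # x: (B, T, d_model)
        attn_out = self.attn_layer(x)           # attention branch
        rnn_out, _ = self.rnn(x)                # recurrent branch
        g = torch.sigmoid(self.gate(torch.cat([attn_out, rnn_out], dim=-1)))
        return g * attn_out + (1 - g) * rnn_out  # gated fusion of both branches

x = torch.randn(2, 7, 512)
print(RecurrentAugmentedEncoderLayer()(x).shape)   # torch.Size([2, 7, 512])
```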
-
Hi, I am new to the attention mechanism, and I found your code and tutorials very helpful for beginners like me!
Currently, I am trying to use your attention decoder to do sentiment analysis of the…
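In case it helps, one common way to adapt attention to classification (this is an assumption, not the tutorial's own recipe) is to replace step-by-step decoding with a single attention pooling over the encoder states, followed by a classification head:

```python
import torch
import torch.nn as nn

class AttentionSentimentClassifier(nn.Module):
    """Sketch of attention pooling for sentiment classification: a BiGRU
    encodes the tokens, a learned score weights each token's state, and the
    weighted sum is classified. Hypothetical, not the tutorial's code."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)    # one score per token
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                          # (B, T)
        states, _ = self.encoder(self.emb(token_ids))      # (B, T, 2*hidden_dim)
        weights = torch.softmax(self.attn_score(states), dim=1)  # (B, T, 1)
        pooled = (weights * states).sum(dim=1)             # attention-weighted sum
        return self.classifier(pooled)                     # (B, num_classes)

logits = AttentionSentimentClassifier(vocab_size=1000)(torch.randint(0, 1000, (4, 12)))
print(logits.shape)   # torch.Size([4, 2])
```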
-
# 🌟 New model addition
## Model description
Recently Google published a paper titled ["Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matchin…
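For readers unfamiliar with the model, a very rough sketch of the hierarchical encoding idea (an illustration only, not the SMITH reference implementation): a token-level Transformer encodes fixed-size sentence blocks, a block-level Transformer aggregates the block vectors into one document vector, and two documents are matched siamese-style.

```python
import torch
import torch.nn as nn

class HierarchicalDocEncoder(nn.Module):
    """Toy two-level encoder: token-level Transformer per block, then a
    block-level Transformer over block embeddings. Positional encodings and
    masking are omitted for brevity; this is not the SMITH code."""
    def __init__(self, vocab_size, d_model=256, block_len=32, nhead=4):
        super().__init__()
        self.block_len = block_len
        self.emb = nn.Embedding(vocab_size, d_model)
        self.token_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)
        self.block_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)

    def forward(self, token_ids):                        # (B, T), T a multiple of block_len
        B, T = token_ids.shape
        blocks = token_ids.view(B, T // self.block_len, self.block_len)
        x = self.emb(blocks)                             # (B, n_blocks, block_len, d_model)
        x = x.flatten(0, 1)                              # encode every block independently
        block_vecs = self.token_encoder(x).mean(dim=1)   # (B*n_blocks, d_model)
        block_vecs = block_vecs.view(B, -1, block_vecs.size(-1))
        doc_vecs = self.block_encoder(block_vecs).mean(dim=1)   # (B, d_model)
        return doc_vecs

enc = HierarchicalDocEncoder(vocab_size=1000)
a, b = torch.randint(0, 1000, (1, 128)), torch.randint(0, 1000, (1, 128))
print(torch.cosine_similarity(enc(a), enc(b)))           # siamese-style match score
```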