-
### Your current environment
I'm not able to run `collect_env.py` on this workstation
vllm == 0.5.1
vllm-flash-attn == 2.5.9
torch == 2.3.0
Tested on a single A100-80GB
The following mes…
-
Good understanding of deep learning architectures such as Multi-Layer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and Convolutional …
-
# Goal
------
* Many of the new LLMs support long context. For example, Llama 3.1 and Mistral 2 support 128k tokens;
* The trend is upwards, e.g. Gemini supports 1M-10M tokens and Claude supports 200k;
* …
-
1. Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop (2019)
Combines a regression-based approach (used as the initial pose estimate) with an iterative optimization-based approach; see the sketch after this list.
code: No
2. Weakly S…
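For intuition, here is a minimal, hypothetical sketch of that fitting-in-the-loop training step: the regressor's prediction initializes an iterative fitting routine, and the fitted result then supervises the regressor. The names `regressor`, `smplify_optimize`, and `reprojection_loss` are placeholders, not the paper's actual code.

```python
import torch

def training_step(regressor, smplify_optimize, reprojection_loss,
                  images, keypoints_2d, optimizer):
    """One fitting-in-the-loop iteration: regress -> optimize -> supervise.

    `regressor`, `smplify_optimize`, and `reprojection_loss` are hypothetical
    callables standing in for the network, the iterative model-fitting routine,
    and the 2D keypoint loss, respectively.
    """
    # 1) The regression network predicts initial pose/shape parameters.
    pred_params = regressor(images)

    # 2) Iterative optimization (e.g. SMPLify-style fitting) refines the
    #    prediction, using the regressed parameters as the starting point.
    with torch.no_grad():
        fitted_params = smplify_optimize(init=pred_params.detach(),
                                         keypoints_2d=keypoints_2d)

    # 3) The fitted parameters act as pseudo ground truth for the regressor,
    #    alongside the usual 2D reprojection loss.
    loss = (torch.nn.functional.mse_loss(pred_params, fitted_params)
            + reprojection_loss(pred_params, keypoints_2d))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```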
-
@federicobucchi @wangkuiyi @tuzhucheng How can models such as Baichuan, Bloom, or QWEN be supported? Does the modeling code need to be modified, and can you provide steps for supporting training of other models?
-
I am using Anaconda to build my own project. I am using Python version 3.10.14 and downloaded Ollama, pulled Mistral for my LLM, and pulled Nomic-Embed-Text for my embedding model. I followed the inst…
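As a point of reference, here is a minimal sketch of talking to a locally running Ollama server over its HTTP API with the two models mentioned above; the default port 11434 and the exact response fields are assumptions about a standard install, not taken from the instructions I followed.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # default Ollama port; adjust if needed

def generate(prompt: str, model: str = "mistral") -> str:
    """Ask the local Ollama server for a (non-streaming) completion."""
    resp = requests.post(f"{OLLAMA_URL}/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Get an embedding vector for `text` from the local Ollama server."""
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

if __name__ == "__main__":
    print(generate("Say hello in one sentence."))
    print(len(embed("hello world")))
```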
-
1. After reading through the whole codebase, it seems the author does not feed the last_slot_label information into the decoder's decoding step (see the sketch below).
2. This seems to differ from the implementation in "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"; the author's method still looks like encoder-decoder attention, i.e. the second variant in that paper…
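To make point 1 concrete, here is a toy sketch (my own, not the repository's code) of a decoder that does feed the previous slot label back in: each step's input is the encoder state concatenated with an embedding of the previously predicted label.

```python
import torch
import torch.nn as nn

class SlotDecoder(nn.Module):
    """Toy decoder that conditions each step on the previous slot label.

    Hypothetical sketch: `enc_outputs` are the encoder hidden states, and the
    previous slot label's embedding is concatenated to each step's input,
    which is the behaviour discussed in point 1 above. For simplicity the
    predicted label is fed back (no teacher forcing).
    """
    def __init__(self, enc_dim, hidden_dim, num_slots, label_emb_dim=32):
        super().__init__()
        self.label_emb = nn.Embedding(num_slots, label_emb_dim)
        self.rnn_cell = nn.LSTMCell(enc_dim + label_emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_slots)

    def forward(self, enc_outputs):                       # (B, T, enc_dim)
        B, T, _ = enc_outputs.shape
        h = enc_outputs.new_zeros(B, self.rnn_cell.hidden_size)
        c = enc_outputs.new_zeros(B, self.rnn_cell.hidden_size)
        last_label = enc_outputs.new_zeros(B, dtype=torch.long)  # start label = 0
        logits = []
        for t in range(T):
            step_in = torch.cat([enc_outputs[:, t], self.label_emb(last_label)], dim=-1)
            h, c = self.rnn_cell(step_in, (h, c))
            step_logits = self.out(h)
            last_label = step_logits.argmax(dim=-1)        # feed prediction back in
            logits.append(step_logits)
        return torch.stack(logits, dim=1)                  # (B, T, num_slots)
```

For example, `SlotDecoder(enc_dim=256, hidden_dim=128, num_slots=10)(torch.randn(2, 6, 256))` returns per-token slot logits of shape `(2, 6, 10)`.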
-
### Abstract
- proposes adding an "Attentive Recurrent Network (ARN)" to the Transformer encoder to leverage the strengths of both attention and recurrent networks
- WMT14 En-De and WMT17 Zh-En demonstra…
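As a rough illustration of the general idea (a guess at the shape of such a layer, not the paper's actual ARN design): a recurrent branch runs in parallel with a standard Transformer encoder layer, and the two outputs are fused with a learned gate.

```python
import torch
import torch.nn as nn

class RecurrentAugmentedEncoderLayer(nn.Module):
    """Illustrative only: a standard Transformer encoder layer plus a parallel
    bidirectional GRU branch, fused with a sigmoid gate. Not the ARN itself."""
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.attn_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.rnn = nn.GRU(d_model, d_model // 2, batch_first=True, bidirectional=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                       # x: (B, T, d_model)
        attn_out = self.attn_layer(x)           # attention branch
        rnn_out, _ = self.rnn(x)                # recurrent branch
        g = torch.sigmoid(self.gate(torch.cat([attn_out, rnn_out], dim=-1)))
        return g * attn_out + (1 - g) * rnn_out  # gated fusion of both branches

x = torch.randn(2, 7, 512)
print(RecurrentAugmentedEncoderLayer()(x).shape)   # torch.Size([2, 7, 512])
```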
-
Hi, I am new to the attention mechanism, and I found your code and tutorials very helpful for beginners like me!
Currently, I am trying to use your attention decoder to do sentiment analysis of the…
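In case it helps, one common way to adapt attention to classification (this is an assumption, not the tutorial's own recipe) is to replace step-by-step decoding with a single attention pooling over the encoder states, followed by a classification head:

```python
import torch
import torch.nn as nn

class AttentionSentimentClassifier(nn.Module):
    """Sketch of attention pooling for sentiment classification: a BiGRU
    encodes the tokens, a learned score weights each token's state, and the
    weighted sum is classified. Hypothetical, not the tutorial's code."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)    # one score per token
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                          # (B, T)
        states, _ = self.encoder(self.emb(token_ids))      # (B, T, 2*hidden_dim)
        weights = torch.softmax(self.attn_score(states), dim=1)  # (B, T, 1)
        pooled = (weights * states).sum(dim=1)             # attention-weighted sum
        return self.classifier(pooled)                     # (B, num_classes)

logits = AttentionSentimentClassifier(vocab_size=1000)(torch.randint(0, 1000, (4, 12)))
print(logits.shape)   # torch.Size([4, 2])
```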
-
# 🌟 New model addition
## Model description
Recently Google published a paper titled ["Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matchin…
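For readers unfamiliar with the model, a very rough sketch of the hierarchical encoding idea (an illustration only, not the SMITH reference implementation): a token-level Transformer encodes fixed-size sentence blocks, a block-level Transformer aggregates the block vectors into one document vector, and two documents are matched siamese-style.

```python
import torch
import torch.nn as nn

class HierarchicalDocEncoder(nn.Module):
    """Toy two-level encoder: token-level Transformer per block, then a
    block-level Transformer over block embeddings. Positional encodings and
    masking are omitted for brevity; this is not the SMITH code."""
    def __init__(self, vocab_size, d_model=256, block_len=32, nhead=4):
        super().__init__()
        self.block_len = block_len
        self.emb = nn.Embedding(vocab_size, d_model)
        self.token_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)
        self.block_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=2)

    def forward(self, token_ids):                        # (B, T), T a multiple of block_len
        B, T = token_ids.shape
        blocks = token_ids.view(B, T // self.block_len, self.block_len)
        x = self.emb(blocks)                             # (B, n_blocks, block_len, d_model)
        x = x.flatten(0, 1)                              # encode every block independently
        block_vecs = self.token_encoder(x).mean(dim=1)   # (B*n_blocks, d_model)
        block_vecs = block_vecs.view(B, -1, block_vecs.size(-1))
        doc_vecs = self.block_encoder(block_vecs).mean(dim=1)   # (B, d_model)
        return doc_vecs

enc = HierarchicalDocEncoder(vocab_size=1000)
a, b = torch.randint(0, 1000, (1, 128)), torch.randint(0, 1000, (1, 128))
print(torch.cosine_similarity(enc(a), enc(b)))           # siamese-style match score
```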