-
Hi, thanks for sharing your great work!
I'm confused about the RoPE implementation in the cross-attention module of each block.
https://github.com/Tencent/HunyuanDiT/blob/cb709308d92e6c7e8d59d0d…
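For readers following along, this is the generic "rotate-half" formulation of RoPE as it is usually applied to query/key tensors. It is only an illustrative sketch and may not match the exact HunyuanDiT cross-attention code (for instance, which of q/k actually gets rotated there):

```python
# Generic rotate-half RoPE, for illustration only; the HunyuanDiT code may differ.
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: [batch, heads, seq_len, head_dim]; cos/sin: [seq_len, head_dim],
    # broadcast over batch and heads.
    return x * cos + rotate_half(x) * sin
```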
-
I'm sorry, but I have to ask a naive question:
If I'm testing the attention module during llama_7b inference, what arguments should I pass to this function?
For example, the input IDs have shape [1, 32],…
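To make the shapes concrete, here is a hedged sketch of the tensors such a call usually involves; the exact argument names and forward signature depend on the library version, so treat them as assumptions rather than the real API:

```python
# Illustrative shapes only; argument names below are assumptions, not the exact API.
import torch

batch, seq_len, hidden_size, vocab_size = 1, 32, 4096, 32000  # llama_7b-style sizes

input_ids = torch.randint(0, vocab_size, (batch, seq_len))   # [1, 32]
hidden_states = torch.randn(batch, seq_len, hidden_size)     # what the attention layer consumes
position_ids = torch.arange(seq_len).unsqueeze(0)            # [1, 32], used for RoPE
# Additive causal mask: 0 where attention is allowed, -inf above the diagonal.
attention_mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)[None, None]  # [1, 1, 32, 32]

# A typical call then looks roughly like:
#   attn_out, *_ = attn_layer(hidden_states, attention_mask=attention_mask, position_ids=position_ids)
```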
-
### Your current environment
```text
Collecting environment information...
WARNING 07-23 19:11:42 _custom_ops.py:14] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm.…
-
* Goal: Run the [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) model on the TT Wormhole device.
* Changes: Add the directory `models/demos/wormhole/qwen2_7b`.
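For validation, outputs of the TT implementation can be compared against the plain Hugging Face model; the snippet below is an illustrative reference run, not part of the proposed directory (dtype and prompt are assumptions):

```python
# Hypothetical Hugging Face reference run, useful for comparing logits against
# the TT Wormhole implementation; not part of models/demos/wormhole/qwen2_7b.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B", torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, Wormhole!", return_tensors="pt")
with torch.no_grad():
    reference_logits = model(**inputs).logits  # compare against device output
```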
## Approach
We will leverage the ex…
-
OS: Windows
I think my environment is ready.
I am using a Jupyter notebook locally.
When I run this:
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 8192  # Choose any! We auto sup…
```
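For context, the usual unsloth setup that this snippet is the start of looks roughly like the sketch below; the model name and quantization options are assumptions, not the reporter's actual code:

```python
# Sketch of a typical unsloth setup; model_name and options are assumptions.
from unsloth import FastLanguageModel

max_seq_length = 8192
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # hypothetical choice
    max_seq_length=max_seq_length,
    dtype=None,          # None = auto-detect (float16 or bfloat16)
    load_in_4bit=True,   # 4-bit quantization via bitsandbytes
)
```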
-
I get this error when `max_relative_positions: -1` or `max_relative_positions: -2`:
```
Traceback (most recent call last):
  File "/opt/conda/bin/onmt_release_model", line 33, in <module>
    sys.exit(load…
-
**Describe the bug**
I used a verified LLaMA 7B HF checkpoint and ran single-threaded inference with bmb.
But the output is just random gibberish, and I'm not sure why.
**Minimal steps to reproduce…
-
- Currently we support Llama 3.2 1B on MLX but not tinygrad
- Add support for Llama 3.2 1B
- It might just work out of the box; if not, I think the issue will be in the changes that were made to RoPE (R…
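For reference, the Llama 3.x family rescales the RoPE inverse frequencies before building the rotary tables, which is the most likely thing a tinygrad port needs to add. Below is a sketch; the default constants mirror what Llama 3.2 configs commonly report, but they are assumptions here and should really be read from the checkpoint's `rope_scaling` block in `config.json`:

```python
# Sketch of Llama 3.x RoPE frequency scaling; default values are assumptions.
import math

def apply_llama3_rope_scaling(freqs, scale_factor=32.0, low_freq_factor=1.0,
                              high_freq_factor=4.0, old_context_len=8192):
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:      # high-frequency band: left unchanged
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:     # low-frequency band: fully rescaled
            scaled.append(freq / scale_factor)
        else:                                # smooth interpolation in between
            smooth = (old_context_len / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return scaled
```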
-
**Describe the bug**
I'm trying to use the Llama2 model saved with `--use-dist-ckpt` after SFT (Supervised Fine-Tuning) to train a reward model. The reward model does not require the original checkpo…
-
### System Info
Ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm …