-
Now I'm using CUTLASS in my project. I found that some cases have constraints on the layout, such as requiring input matrix A and output matrix C to be row-major. These kinds of assumptions limit the feasibi…
-
## Description
Slower cleanup methods do not run to completion when the kernel is restarted.
## Reproduce
1. Create a new IPython notebook in Jupyter Lab.
2. Create and ex…
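The truncated steps above can be simulated outside Jupyter. The sketch below is plain CPython, not an actual notebook kernel, and the `slow_cleanup` name and two-second delay are made-up stand-ins; it registers a slow `atexit` handler, which is the kind of teardown a kernel restart can cut short if it kills the process before the handler finishes.

```python
import subprocess
import sys
import textwrap

# Hypothetical reproduction: a "slow" cleanup registered with atexit.
# On a normal interpreter exit the handler runs to completion; a hard
# kill (like a kernel restart with a short shutdown timeout) would
# terminate the process before "cleanup done" is printed.
script = textwrap.dedent("""
    import atexit, time

    def slow_cleanup():
        time.sleep(2)          # simulate slow resource teardown
        print("cleanup done")  # only reached if given enough time

    atexit.register(slow_cleanup)
    print("cell finished")
""")

# Normal exit: both lines appear, in order.
out = subprocess.run([sys.executable, "-c", script],
                     capture_output=True, text=True, timeout=10)
print(out.stdout)
```

Running the same script but terminating it with a signal before the sleep elapses would show the opposite: `cell finished` without `cleanup done`.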
-
To address the MTU problem listed in #1853 for 4.3 and 4.4 kernels, we could allow a user to pass a netdev interface name. That interface would be used as the VTEP (`lowerdev`) for the VXLAN netdev we create…
-
### Feature request
Integrate the Liger (LinkedIn GPU Efficient Runtime) Kernel into the HuggingFace Trainer, so users can decide whether to enable the kernel with a simple flag.
### Motivation
Liger (Linkedi…
-
For the discrete kernels, looking at the temporal plots is quite meaningful, since each kernel is added just once.
However, for the continuous kernels, the kernels get convolved against some long …
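A small numerical sketch of that difference (the impulse positions and the exponential response are made-up stand-ins, not the actual kernels from this model): a discrete kernel shows up as a single spike per event, while a convolved kernel smears each event across many time steps, which is why its temporal plot is harder to read kernel-by-kernel.

```python
import numpy as np

# Discrete kernels: each one contributes a single impulse in time.
t = np.arange(100)
impulses = np.zeros(100)
impulses[[10, 40, 70]] = 1.0  # three events, one spike each

# Continuous kernels: each event is convolved against a longer
# response (here an assumed exponential decay over 30 steps).
response = np.exp(-np.arange(30) / 5.0)
continuous = np.convolve(impulses, response)[:100]

# The impulse train stays sparse; the convolved trace is nonzero
# over extended windows around every event.
print(int((impulses > 0).sum()))       # 3 nonzero samples
print(int((continuous > 1e-3).sum()))  # 90 nonzero samples
```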
-
Hello,
I have been using vLLM to serve CodeLlama 2 13B with only 2 NVIDIA L4 GPUs. The engine is set up as follows:
python -m vllm.entrypoints.openai.api_server --model="codellama/CodeLlama-13b-Instruct…
-
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Upstream in DataFusion, there is a common pattern where we have multiple input `Record…
-
I compared two ways to launch the server.
The model is vicuna-7b, and the GPUs are 2 \* A30.
The first way is:
```
python -m vllm.entrypoints.openai.api_server \
--model /data/models/vicuna-…
-
I am using TensorRT-LLM 0.8.0 (I added MoE support following Llama's implementation). We serve models with trtllm_backend (Docker image triton-trtllm-24.02).
[qwen2-moe-57B-A14B](https://huggingface.co/Qwe…
-
Hello,
I am having an issue where I run an initial computation: basic matrix multiplication using TFJS. On the first run, the computation is extremely slow, taking 800 ms. On the second run, …