-
Hey :wave: This is amazing work, thanks!
I'm trying to use the example code on an A100 but don't notice any speedups. Is this expected? I do see a 20-30% speedup on a 4090, so maybe it's due to dif…
-
### Proposal to improve performance
TL;DR: Move speculative decoding's `scoring` input preparation onto the GPU so that a CPU synchronization can be skipped.
Currently, speculative decoding copies proposal token…
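A minimal sketch of the idea, assuming a PyTorch-style setup (the tensor names `input_ids` and `proposal_token_ids` are illustrative, not vLLM's actual API): copying proposal tokens to the host (e.g. via `.tolist()`) forces an implicit device synchronization, whereas assembling the scoring inputs directly on the device avoids the round-trip.

```python
import torch

# Illustrative shapes; falls back to CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"

input_ids = torch.randint(0, 32000, (4, 128), device=device)
proposal_token_ids = torch.randint(0, 32000, (4, 5), device=device)

# Host path (what the issue describes): .tolist() blocks until the
# device stream is drained, i.e. an implicit CPU synchronization.
# proposals = proposal_token_ids.tolist()

# Device path: build the scoring inputs with on-device tensor ops,
# so no host copy and no synchronization is needed.
scoring_input_ids = torch.cat([input_ids, proposal_token_ids], dim=1)
```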
-
### Comment:
A conda-forge environment containing nothing but PyTorch for CUDA currently weighs in at 7.2 GB, which is rather on the heavy side.
Looking at potential for slimm…
-
For the multi-GPU solution, we still have challenges with first-token latency. The breakdown data was shared offline.
Please help add more optimization features (e.g. SDPA / FlashAttention) to improve…
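One of the requested optimizations can be sketched with PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a FlashAttention kernel on supported GPUs; the shapes below are illustrative, not taken from the project in question.

```python
import torch
import torch.nn.functional as F

# Illustrative attention shapes: (batch, heads, seq_len, head_dim).
batch, heads, seq, head_dim = 2, 8, 16, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# is_causal=True applies the autoregressive mask used for decoding;
# the backend (Flash, memory-efficient, or math) is chosen automatically.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```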
-
### 🚀 The feature, motivation and pitch
Fuyou Training Framework Integration for PyTorch
Description:
Integrate the Fuyou training framework into PyTorch to enable efficient fine-tuning of larg…
-
Hi,
I am looking for a library to solve batches of trajectory optimization problems (same problem, different initial states) on accelerators, and I've found this library, which looks great! I have…
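The "same problem, different initial states" pattern described above can be sketched with `torch.func.vmap`, which vectorizes a single-problem solver over a batch of initial states; the dynamics below are a placeholder, not the library's API.

```python
import torch
from torch.func import vmap

def rollout(x0, steps=10, dt=0.1):
    # Stand-in dynamics: a simple damped integrator replacing the
    # real trajectory-optimization problem.
    x = x0
    for _ in range(steps):
        x = x + dt * (-0.5 * x)
    return x

# 64 instances of the same problem, each with a different initial state.
initial_states = torch.randn(64, 3)
final_states = vmap(rollout)(initial_states)
```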
-
### Start Date
9/3/2024
### Implementation PR
_No response_
### Reference Issues
_No response_
### Summary
When using vLLM to optimally utilize GPU space for faste…
-
**Project Abstract**
This document proposes a matrix calculator web application that supports a wide variety of matrices and matrix operations (multiplication, Gaussian elimination, inversion, decomp…
-
## Project info
**Title:**
Optimization and GPU porting of information flow implementation
**Project lead and collaborators:**
Etienne Combrisson & Ruggero Basanisi
**Image of the proj…
-
### System Info
Built tensorrtllm_backend from source using dockerfile/Dockerfile.trt_llm_backend
tensorrt_llm 0.13.0.dev2024081300
tritonserver 2.48.0
triton image: 24.07
CUDA 12.5
### Wh…