-
python web_demo_mm.py -c "/data/shared/Qwen/models" --share --server-name 0.0.0.0 --server-port 80
/usr/local/lib/python3.8/dist-packages/auto_gptq/nn_modules/triton_utils/kernels.py:411: FutureWarn…
-
**Describe the proposal**
List the GPU kernels with changed register usage as a comment in each PR.
This can be done by using the `--ptxas-options=-v` compiler flag, then parsing the compiler output with…
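The register report can be scraped from the build log with a short script. A minimal sketch (the `parse_register_usage` helper and the sample log are illustrative, not part of any existing PR tooling), keyed on the `Compiling entry function` / `Used N registers` line pairs that `ptxas -v` emits:

```python
import re

def parse_register_usage(ptxas_log: str) -> dict:
    """Map each kernel (entry function) in a ptxas -v log to its
    register count. Pairs each "Compiling entry function '<name>'"
    line with the next "Used N registers" line."""
    usage = {}
    current = None
    for line in ptxas_log.splitlines():
        m = re.search(r"Compiling entry function '([^']+)'", line)
        if m:
            current = m.group(1)
            continue
        m = re.search(r"Used (\d+) registers", line)
        if m and current is not None:
            usage[current] = int(m.group(1))
            current = None
    return usage

sample_log = """\
ptxas info    : Compiling entry function '_Z3addPfS_' for 'sm_80'
ptxas info    : Function properties for _Z3addPfS_
ptxas info    : Used 32 registers, 360 bytes cmem[0]
"""
print(parse_register_usage(sample_log))  # {'_Z3addPfS_': 32}
```

Diffing two such dicts (base branch vs. PR branch) gives exactly the per-kernel register delta to post as the PR comment.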
-
I am a beginner. I saw blocksize mentioned in section 5.5 of the user manual (DualSPHysics):
“An important novelty since v4.0 is the determination of the optimum Blocksize for CUDA kernels that exc…
-
**Describe the bug**
I am trying to run the non-persistent example given for mistralai/Mistral-7B-Instruct-v0.3 on an RTX A6000 GPU (on a server), so the compute capability requirement is met; Ubuntu is 22.04, CUDA to…
-
Using Fortran-style 1D indexing on the parent, with any required assertions done upstream, might be easiest for some kernels. E.g.:
```julia
function Base.copyto!(
    dest::IJFH{S, Nij},
    bc:…
```
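As a language-agnostic illustration of the idea above (plain Python rather than Julia so it stands alone; `linear_index` and `copy_2d_via_flat` are hypothetical names), Fortran-style column-major 1D indexing over the parent buffer looks like:

```python
def linear_index(i, j, nrows):
    """Fortran-style (column-major) linear index into a 2-D array:
    consecutive elements of a column are adjacent in memory."""
    return i + j * nrows

def copy_2d_via_flat(dest_flat, src_flat, nrows, ncols):
    """Copy a 2-D array through its flat parent buffer. Upstream
    code is assumed to have already checked that both buffers hold
    nrows * ncols elements (the "assertions done upstream")."""
    for j in range(ncols):
        for i in range(nrows):
            k = linear_index(i, j, nrows)
            dest_flat[k] = src_flat[k]
```

Because the kernel only ever sees one flat index, it needs no knowledge of the parent's higher-dimensional shape beyond `nrows`.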
-
### Anything you want to discuss about vllm.
In qwen2vl's mrope implementation, vLLM decides whether the input positions are for multimodal input with
![image](https://github.com/user-attachments/assets/6dfc96d9-5162-…
-
Base methods, such as `accumulate!` and `mapreduce`, have support for the `dims` kwarg.
Is there a plan for adding such support here?
We can then replace other kernels from AMDGPU/CUDA with AK implementatio…
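For reference, a toy pure-Python sketch of what a `dims` keyword means for `mapreduce` on a 2-D array (illustrative only; `mapreduce_dims` is not part of any package, and it mirrors Julia's 1-based `dims` convention):

```python
from functools import reduce

def mapreduce_dims(f, op, a, dims=None):
    """mapreduce over a 2-D list-of-lists with a dims keyword.
    dims=None reduces everything to a scalar; dims=1 collapses
    rows (one result per column); dims=2 collapses columns
    (one result per row)."""
    if dims is None:
        return reduce(op, (f(x) for row in a for x in row))
    if dims == 1:
        return [reduce(op, (f(row[j]) for row in a))
                for j in range(len(a[0]))]
    if dims == 2:
        return [reduce(op, (f(x) for x in row)) for row in a]
    raise ValueError("dims must be None, 1, or 2")

a = [[1, 2], [3, 4]]
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a))          # 10
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a, dims=1))  # [4, 6]
print(mapreduce_dims(lambda x: x, lambda p, q: p + q, a, dims=2))  # [3, 7]
```

A GPU version would map each retained slice to one workgroup, which is presumably what the AMDGPU/CUDA kernels being replaced already do.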
-
When I used fast_cross_entropy_loss instead of torch.nn.CrossEntropyLoss, this error happened.
`File "/mnt/fs/user/xingjinliang/unsloth/unsloth/kernels/cross_entropy_loss.py", line 318, in fast_cross…
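For context, the quantity such a fused kernel computes per token is ordinary cross-entropy. A minimal, numerically stable pure-Python reference (not the unsloth implementation, just the math it fuses) uses the log-sum-exp trick:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one row of logits:
    -log softmax(logits)[target], computed via log-sum-exp so
    large logits don't overflow exp()."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

print(cross_entropy([0.0, 0.0], 0))     # log(2) ~ 0.6931...
print(cross_entropy([1000.0, 0.0], 0))  # ~0.0, no overflow
```

Comparing a drop-in replacement against this reference on small inputs is one quick way to tell a numerical bug from a shape/dtype mismatch like the traceback above.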
-
- Rename `batch_size` to `frame_packet` (michael)
- Rename `insert_wait_frames` to `insert_wait_frame_packet`
- Most of the functions (CUDA kernels, wrappers, ...) in pipe take an input buffer and a…
-
> Port training CUDA kernels from these libraries, and automatically replace modules in an existing 🤗 `transformers` model with their corresponding CUDA kernel version.
Check out the following op…
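The module-swapping step described in the quote can be sketched generically. Everything below (`Module`, `Linear`, `FusedLinear`, `replace_modules`) is a hypothetical stand-in, not the library's API; a real implementation would walk `torch.nn.Module` children the same way:

```python
class Module:
    """Minimal stand-in for a module tree node."""
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    """Stand-in for the stock module to be replaced."""

class FusedLinear(Linear):
    """Stand-in for the CUDA-kernel-backed replacement."""
    @classmethod
    def from_module(cls, m):
        # A real port would copy weights/config from m here.
        return cls()

def replace_modules(root, old_cls, new_cls):
    """Recursively swap every exact old_cls child for new_cls --
    the traversal a kernel-injection library performs over a model."""
    for name, child in root.children.items():
        if type(child) is old_cls:
            root.children[name] = new_cls.from_module(child)
        else:
            replace_modules(child, old_cls, new_cls)
    return root

net = Module(attn=Linear(), block=Module(mlp=Linear()))
replace_modules(net, Linear, FusedLinear)
```

The exact-type check (`type(child) is old_cls`) matters: matching by `isinstance` would also re-replace already-swapped subclasses on a second pass.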