-
Currently, the nvq++ driver is a skeletal bash script that runs the various components comprising the logical, piecewise steps of an nvq++ compilation. The bash script is very easy to update an…
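To make the shape of that pipeline concrete, here is a minimal Python sketch of the piecewise-driver pattern such a script follows; every tool name, input file, and flag below is hypothetical, not an actual nvq++ component:

```python
import subprocess
import sys

# Purely illustrative stage list: each entry is one external tool invocation,
# mirroring the "piecewise steps" style of the current bash driver.
# Tool names, inputs, and flags are hypothetical, not real nvq++ components.
STAGES = [
    ["frontend-tool", "kernel.cpp", "-o", "kernel.ir"],
    ["optimizer-tool", "kernel.ir", "-o", "kernel.opt.ir"],
    ["codegen-tool", "kernel.opt.ir", "-o", "kernel.o"],
    ["host-linker", "kernel.o", "-o", "a.out"],
]

def drive() -> None:
    for cmd in STAGES:
        print("+", " ".join(cmd))
        proc = subprocess.run(cmd)
        if proc.returncode != 0:
            # Stop the pipeline at the first failing stage.
            sys.exit(proc.returncode)

if __name__ == "__main__":
    drive()
```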
-
```
nvcc -g -std=c++11 -I`python -c "import tensorflow; print(tensorflow.sysconfig.get_include())"` -I"/usr/local/cuda-8.0/include" -DGOOGLE_CUDA=1 -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_A…
```
-
Hi!
In your paper, you mentioned that including text-only data in training is crucial for maintaining language abilities. I'm currently performing full fine-tuning using LLaMA Factory, and I'm enc…
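To make the question concrete, here is a rough sketch of the kind of text-only mixing I mean, written with the Hugging Face datasets library purely for illustration (this is not LLaMA Factory's own configuration); the dataset names and the mixing ratio are placeholders:

```python
from datasets import load_dataset, interleave_datasets

# Placeholder dataset names -- substitute the actual fine-tuning set and a text-only corpus.
sft_data = load_dataset("my-org/multimodal-sft", split="train")      # hypothetical
text_only = load_dataset("my-org/plain-text-corpus", split="train")  # hypothetical

# Interleave so that roughly 20% of training samples are text-only; the ratio here is
# only a placeholder for whatever mix the paper's recipe actually uses.
# (Both datasets need compatible columns for interleaving to work.)
mixed = interleave_datasets(
    [sft_data, text_only],
    probabilities=[0.8, 0.2],
    seed=42,
    stopping_strategy="first_exhausted",
)
```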
-
Is "sycl::complex" mentioned in the SYCL specification ? Which type is recommended for Intel, AMD, and NVIDIA GPUs ? Thanks.
```
no template named 'complex' in namespace 'sycl'; did you mean 'std:…
```
-
**Which documentation should be updated?**
How to implement custom operators should be documented, and the documentation should address things like:
1. How to support broadcasting the operation over …
-
Hi Andrej, this implementation is fantastic!
In your view, what would be the main design trade-offs if one were to re-implement, in modern C++, the C code that is intended to run on the CPU? By moder…
-
I have successfully built the selective_scan_cuda function. However, when I call the function, I encounter the following error. Based on the information I found online, it appears that my GPU is too o…
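As a quick first check (a generic sketch, not guidance from the project itself), comparing the GPU's compute capability against the CUDA architectures the build targets can confirm whether the card is actually too old:

```python
import torch

# Report the GPU's compute capability, e.g. (7, 0) for V100 or (8, 6) for RTX 30-series.
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")

# CUDA architectures this PyTorch build was compiled for; a custom extension such as
# selective_scan_cuda may target its own (often narrower) list set at build time.
print("PyTorch built for:", torch.cuda.get_arch_list())
```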
-
### Is this a duplicate?
- [X] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cccl/issues) for this request and that I agree to the [Code of Conduct](CODE_OF_CONDUCT…
-
### Feature request
The current flash attention 2 integration is sub-optimal in performance because it requires unpadding and padding the activations on **each** layer. For example in llama impleme…
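For illustration, here is a minimal sketch of the unpad/pad round trip whose cost is currently paid in every layer; the helper names (`unpad`, `pad`) are hypothetical, not the actual transformers or flash-attn utilities. Unpadding once before the decoder stack and re-padding once after it is the optimization being requested.

```python
import torch
import torch.nn.functional as F

# Hypothetical helpers sketching the unpad/pad round trip; these are not the actual
# transformers / flash-attn utilities, just an illustration of the work repeated per layer.

def unpad(hidden, attention_mask):
    """(batch, seqlen, dim) -> (total_tokens, dim), keeping only non-padding tokens."""
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
    return hidden.reshape(-1, hidden.shape[-1])[indices], indices, cu_seqlens

def pad(unpadded, indices, batch, seqlen):
    """Scatter (total_tokens, dim) back into a zero-padded (batch, seqlen, dim) tensor."""
    out = unpadded.new_zeros(batch * seqlen, unpadded.shape[-1])
    out[indices] = unpadded
    return out.reshape(batch, seqlen, -1)

# In the current integration something equivalent to unpad()/pad() runs inside every
# decoder layer; unpadding once before the whole stack and padding once after it would
# avoid the repeated gather/scatter over the activations.
```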
-
I notice this project is inspired by Stream-K; how is that work decomposition done here?
I also notice that Lean Attention uses Stream-K for attention; is this supported in FlashInfer?