-
This is for the CUDA version. If the CUDA kernel launch fails, the run fails validation but is still included in the results, so the "best gflop/s" figure will be too large since the kernel time was ve…
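One way to keep a failed launch from skewing the summary is to filter invalid runs before computing the best rate. A minimal Python sketch, with hypothetical names (`BenchResult`, `best_gflops` are illustrative, not the benchmark's actual types):

```python
# Hypothetical sketch: exclude runs that failed validation before reporting
# "best gflop/s". A failed launch returns almost instantly, which yields an
# absurdly high computed rate if it is left in the result set.
from dataclasses import dataclass

@dataclass
class BenchResult:
    gflops: float   # throughput derived from the measured kernel time
    valid: bool     # False when the launch failed / output failed validation

def best_gflops(results):
    """Best GFLOP/s among runs that passed validation, or None."""
    ok = [r.gflops for r in results if r.valid]
    return max(ok) if ok else None

runs = [BenchResult(120.0, True), BenchResult(9_999.0, False), BenchResult(135.5, True)]
print(best_gflops(runs))  # 135.5, not the bogus 9999 from the failed launch
```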
-
Following post #370, and akin to the kernel / likelihood quadrature computation, it would be good to have top-level default methods for these defined on the `Prior` that say what solver and root method we'd like to…
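A possible shape for such defaults, sketched in Python. The method names (`default_solver`, `default_root_method`) and the solver strings are assumptions for illustration, not the library's actual API:

```python
# Hedged sketch: class-level defaults on Prior that subclasses can override,
# so callers query the prior instead of hardcoding a solver choice.
class Prior:
    def default_solver(self) -> str:
        return "gauss_legendre"     # assumed top-level default

    def default_root_method(self) -> str:
        return "brentq"             # assumed top-level default

class HeavyTailedPrior(Prior):
    # A subclass can opt into a root method better suited to its shape,
    # while inheriting the default solver unchanged.
    def default_root_method(self) -> str:
        return "bisect"
```

Call sites would then do `prior.default_solver()` rather than baking a method name into each quadrature routine.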
-
* research if/how these are implemented in each backend
* research the different methodologies to assess scope
- enumerate strategies, e.g. Basis, Amplitude, etc.
- document constraints, e.g. …
-
I will work on this in the branch for #129. This issue documents ideas for asynchronous GPU execution, allowing GPU and CPU computation to run simultaneously.
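The overlap pattern can be illustrated CPU-only, with a background thread standing in for a CUDA stream (the sleep duration and function names are purely illustrative, not real GPU calls): launch asynchronously, do host work, then synchronize.

```python
# CPU-only simulation of the async launch pattern:
#   launch kernel on stream  ->  submit to a worker thread
#   overlapped CPU work      ->  ordinary host computation
#   stream synchronize       ->  future.result()
import time
from concurrent.futures import ThreadPoolExecutor

def device_work():
    time.sleep(0.05)          # stands in for an asynchronously running kernel
    return "gpu-result"

def cpu_work():
    return sum(i * i for i in range(10_000))

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(device_work)   # async "launch", returns immediately
    cpu_out = cpu_work()             # CPU computes while "device" is busy
    gpu_out = fut.result()           # block until the "device" finishes

print(cpu_out, gpu_out)
```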
-
## TODO
- [x] Optimize JIT, fix memory planner #193
- [x] Complete test-suite/test-dynamic-shape.lisp
- [x] More tests on the JIT kernel accuracy (compared to PyTorch, like Multi Head Attention an…
-
I ran the program on an x86 machine using oneDNN as the backend library and on an ARM machine using the default library. The TensorBoard profiling data shows blank waiting times on the ARM machine, wh…
-
## Description
Consider adding an additional FusedCrossEntropyLoss kernel to the FOAK set of kernels, given the improvement seen when using it in earlier tests (see Background below).
Considerati…
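The memory saving behind a fused/chunked cross-entropy can be sketched in plain NumPy. This is a reference implementation of the idea, not the actual FOAK/Triton kernel: the loss is computed chunk by chunk, so the full `[batch, vocab]` logits matrix is never materialized all at once.

```python
import numpy as np

def chunked_ce(hidden, weight, targets, chunk=2):
    """Mean cross-entropy over logits = hidden @ weight.T, computed per row
    chunk so only a [chunk, vocab] slice of logits exists at a time."""
    losses = []
    for s in range(0, hidden.shape[0], chunk):
        logits = hidden[s:s + chunk] @ weight.T        # [chunk, vocab]
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        logsumexp = np.log(np.exp(logits).sum(axis=1))
        t = targets[s:s + chunk]
        losses.append(logsumexp - logits[np.arange(len(t)), t])
    return np.concatenate(losses).mean()

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))          # hidden states
w = rng.normal(size=(5, 3))          # output projection (vocab=5)
t = np.array([0, 2, 4, 1])           # target token ids
print(chunked_ce(h, w, t))
```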
-
CUDA events suffer from low accuracy and include the kernel launch overhead. CUPTI, by contrast, provides a more reliable way to obtain consistent timing measurements.
This request asks to add an op…
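The distinction can be illustrated with a toy model (made-up numbers, no real GPU calls): event-bracketed timing measures the host-visible interval, which includes launch overhead, while a device-side activity record like CUPTI's reports the kernel's own start/end timestamps.

```python
# Toy model only: illustrative constants, not measurements.
LAUNCH_OVERHEAD_US = 5.0   # assumed host -> device launch/dispatch cost
kernel_us = 2.0            # actual device execution time of a short kernel

event_measured = LAUNCH_OVERHEAD_US + kernel_us   # events bracket the launch
cupti_measured = kernel_us                        # device-side activity record

# For short kernels the launch cost dominates the event-based number.
error = (event_measured - cupti_measured) / cupti_measured
print(f"overstatement: {error:.0%}")  # 250% for these toy numbers
```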
-
I was just thinking about this idea, so writing it down for future research.
We should be able to fairly easily generate model-specific Metal code that has hardcoded kernels for every single node in …
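A minimal Python sketch of the codegen idea, using a hypothetical template and names: the kernel source is specialized per graph node, so shapes become compile-time constants in the emitted Metal and no shape logic survives to runtime.

```python
# Hypothetical per-node codegen: bake the element count into the source so
# the generated Metal kernel carries no runtime shape checks.
KERNEL_TEMPLATE = """\
kernel void {name}(device float* out, device const float* a,
                   device const float* b, uint i [[thread_position_in_grid]]) {{
    if (i < {n}u) out[i] = a[i] + b[i];   // size {n} baked in at codegen time
}}
"""

def gen_add_kernel(node_id: int, numel: int) -> str:
    """Emit a node-specific elementwise-add kernel with a hardcoded size."""
    return KERNEL_TEMPLATE.format(name=f"node{node_id}_add", n=numel)

src = gen_add_kernel(3, 1024)
print("node3_add" in src and "1024u" in src)  # True
```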
-
I am following the [instructions in the Llama2 README](https://github.com/pytorch/executorch/blob/d9aeca556566104c2594ec482a673b9ec5b11390/examples/models/llama2/README.md#instructions) to test llama m…