-
This is preliminary and will be updated or discarded as comments come in.
## Current Issues
1) To maximize performance, we need to be able to utilize shared memory buffers more generically as te…
-
Hi,
I am using Windows on a Snapdragon ARM CPU and noticed that the following kernel fails to build:
`arm_conv::winograd::weight_transform::a64_fp16_4x4_3x3`
and other fp16 kernels in:
`src/core/NEON/kernels/co…
-
Running into the following issue:
```
marc@mbp mflux-tests % mflux-generate --prompt "hello" -m dev
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████████████…
-
When I tuned ResNet-50 using meta-schedule, I found that the conv2d_winograd implementations raise the error "can not found variable buf_dyn_shmem". After digging in, I think it's caused by `MergeDyna…
-
Hi. I’m a high school Go and AI enthusiast, and I am working on a KataGo-related project in the context of a three-year high school science research class. I would greatly appreciate assistance with…
-
# Motivation & Goals
Tensor data layout describes how the data is laid out in memory. It determines the memory access pattern and it can significantly impact performance and memory efficiency. Glob…
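
To make the link between layout and memory access concrete, here is a minimal sketch in plain Python. The helper names `layout_strides` and `flat_offset`, and the example dimensions, are hypothetical and chosen only for illustration. They are not part of any proposed API. The sketch computes per-axis element strides for a dense 4-D tensor stored as NCHW or as NHWC.

```python
def layout_strides(dims, layout):
    """Return element strides per axis for a dense tensor.

    dims:   mapping from axis name to size, e.g. {"N": 1, "C": 3, "H": 4, "W": 5}
    layout: axes listed from outermost to innermost, e.g. "NCHW"
    """
    strides = {}
    step = 1
    # The innermost axis is contiguous (stride 1); each outer axis
    # steps over the full extent of everything inside it.
    for axis in reversed(layout):
        strides[axis] = step
        step *= dims[axis]
    return strides


def flat_offset(index, strides):
    """Map a logical index {axis: position} to a flat element offset."""
    return sum(index[axis] * strides[axis] for axis in index)


dims = {"N": 1, "C": 3, "H": 4, "W": 5}
nchw = layout_strides(dims, "NCHW")
nhwc = layout_strides(dims, "NHWC")

# The same logical element lands at different offsets in each layout.
element = {"N": 0, "C": 1, "H": 2, "W": 3}
offset_nchw = flat_offset(element, nchw)
offset_nhwc = flat_offset(element, nhwc)
```

In NCHW, neighbouring elements along W are adjacent in memory. In NHWC, the channels of a single pixel are adjacent instead. This is why the same kernel can show very different access patterns depending on the layout.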
-
After the serious perf improvements in NVIDIA's cuDNN R4 across the board, I suppose Nervana weren't too happy to be left behind.
They've just released (as part of Neon) their Winograd-based kernels which ha…
-
I'm having trouble reproducing the performance numbers for AlexNet in the NNPACK README.md. I'm using the nnpack-pr branch [here](https://github.com/ajtulloch/caffe/tree/nnpack-pr), and timing using t…
-
Hello, sorry to bother you. I'm not sure whether this is even a KataGo problem or something else. I have been using KataGo under Linux for years using OpenCL with an AMD GPU (RX570). Recently I switched to a RX677…
-
Before you open an issue, please make sure you have tried the following steps:
1. Make sure your **environment** is the same as (https://mace.readthedocs.io/en/latest/installation/env_requirement…