-
Looking at https://docs.google.com/spreadsheets/d/1lGFf6PLGmBUSMan-YP7Vul4DpRNfn6K8oeCjBILe6uA/edit#gid=857482380, it seems that using cuDNN instead of the default CUDA backend can boost lczero performance. I tried to …
-
Hello, I have been wondering whether it would be possible to extend the mpv video output shader stage with the following features.
I know this might not be among the devs' main priorities, but if I were to try to i…
-
## Description
`mx.io.ImageRecordIter` or `src/io/iter_image_recordio_2.cc` doesn't respect the `dtype` parameter it is given.
It is designed to work only with float32 because the class is instantiated with r…
-
For some problems, `convolve_gaussian` takes the majority of the compute time, more than the reflectivity calculation. `convolve_uniform` is at least 20 times faster - should we consider an ap…
-
Let's discuss how operator fusion might work in dfdx. I suspect it will require a lot of work. On the CUDA side of things, it will at least require JIT-compiling kernels.
_Originally posted by @jafioti …
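
To make the term concrete, here is a minimal Python sketch of what fusing a chain of elementwise operators means. It is only an illustration of the general idea and says nothing about how dfdx would implement it; the function names `unfused` and `fused` and the chosen op chain are hypothetical.

```python
import math

def unfused(xs):
    # Three separate "kernels": each pass reads a full buffer
    # and writes a new intermediate buffer.
    doubled = [x * 2.0 for x in xs]
    shifted = [v + 1.0 for v in doubled]
    return [math.tanh(v) for v in shifted]

def fused(xs):
    # One "kernel": the same three ops applied per element in a
    # single pass, with no intermediate buffers.
    return [math.tanh(x * 2.0 + 1.0) for x in xs]

xs = [-1.0, 0.0, 0.5, 2.0]
# Both versions perform the same floating-point operations in the
# same order, so the results match exactly.
assert unfused(xs) == fused(xs)
```

Since the op chain is only known at runtime, the fused body would presumably have to be generated on the fly, which is where JIT compilation of kernels comes in.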
-
I am able to convert caffenet, but I get an error when I try it with vgg16:
F0312 08:11:17.590416 30365 insert_splits.cpp:35] Unknown blob input data to layer 0
**\* Check failure stack trace: ***
Abor…
-
### 🐛 Describe the bug
Observed unexpected NaN outputs from the `torch.nn.functional.conv1d` function on an M1 Mac:
```python
# bug_demo.py
import torch
n_trials = 100
for ii in range(n_trials…
```
-
After testing with the new architecture for some time, I am very impressed by its playing strength after such a relatively short training period. However, the new mish activation has proved to be quit…
-
Hello, I wonder if there are any future plans to optimize Conv2D CPU execution. I guess currently MLX uses a naive implementation?