-
### 🐛 Describe the bug
Issue summary:
As part of the process of adding a CUDA ARM nightly wheel, we are seeing long build times. It takes ~5 hrs to compile https://github.com/pytorch/pytorch/actio…
-
I currently cannot tell whether my `GEMMLowpOutputStageInfo` configuration is wrong or whether there is a bug in the code. The `GEMMLowpOutputStageInfo` I configured is listed below:
```
arm_co…
-
After compiling this toolchain on Colab, I find the process really time-consuming and not portable at all; the build directory and the install-dir are both large.
It would be great …
-
I am using BLIS for neural networks on embedded platforms (mostly ARMv8a), and I would like to reap the potential memory savings as well as possibly some speedups from running with half-precision floa…
-
### 🐛 Describe the bug
When I build torch-v1.12.1 from source, the build fails with the error "caffe2/CMakeFiles/torch_cuda.dir/__/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu.o No such fil…
-
Hello,
I am trying to use the [Ultra-Light-Fast-Generic-Face-Detector-1MB](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB/tree/master) model in my app.
I have converted the onn…
-
Steps to reproduce the behavior:
> git clone --recursive https://github.com/pytorch/pytorch
> cd pytorch
> git submodule sync
> git submodule update --init --recursive
> export CMAKE_PREFIX_PAT…
-
According to the code in https://github.com/huggingface/quanto/blob/main/quanto/tensor/qbitstensor.py#L34, quanto uses a uint dtype to store the quantized values in the affine quantizer, while in symmetric …
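
For context on the affine/symmetric distinction this question raises, here is a minimal pure-Python sketch of the common textbook formulation. It is not quanto's implementation: the function names, the 4-bit width, and the sample values are all assumptions made for illustration. In this formulation the affine scheme shifts values by a zero point so they fit an unsigned range, while the symmetric scheme uses only a scale and lands in a signed range.

```python
def affine_quantize(values, bits=4):
    """Textbook affine scheme: unsigned ints in [0, 2**bits - 1] plus a zero point."""
    qmax = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    # Shift by the zero point, then clamp into the unsigned range.
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def symmetric_quantize(values, bits=4):
    """Textbook symmetric scheme: signed ints in [-2**(bits-1), 2**(bits-1) - 1], scale only."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # No zero point: zero maps to zero, and negative inputs stay negative.
    q = [min(qmax, max(-qmax - 1, round(v / scale))) for v in values]
    return q, scale


weights = [-1.0, -0.5, 0.0, 0.5, 2.0]  # made-up sample values
q_affine, scale_a, zero_point = affine_quantize(weights)
q_symmetric, scale_s = symmetric_quantize(weights)
```

With these sample values the affine codes span the full unsigned range and never go negative, whereas the symmetric codes include negative integers, which is why the two schemes are often stored with different integer dtypes.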
-
@tensorflow/micro
**System information**
- Host OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux pop-os 5.8.0-7625-generic
- TensorFlow installed from (source or binary): source
-…
-
Great work! I have a question about `current_error` here: https://github1s.com/mobiusml/hqq/blob/master/hqq/core/optimize.py#L30. Should we use p=0.7 instead of p=1 for the l-p norm error? Becau…
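
To make the p=0.7 versus p=1 comparison concrete, here is a small pure-Python sketch. It does not reproduce the hqq code: `lp_error`, `outlier_share`, and the residual values are hypothetical, and the error is taken as the mean of |r|^p without a final 1/p root. It only illustrates how the choice of p changes how much a single large residual dominates the total.

```python
def lp_error(residuals, p):
    """Mean of |r|**p over the residuals (l-p style error, no final 1/p root)."""
    return sum(abs(r) ** p for r in residuals) / len(residuals)


def outlier_share(residuals, p):
    """Fraction of the total |r|**p mass contributed by the largest residual."""
    powered = [abs(r) ** p for r in residuals]
    return max(powered) / sum(powered)


residuals = [0.0, 0.01, -0.02, 1.0]  # made-up values with one outlier

err_p1 = lp_error(residuals, 1.0)
err_p07 = lp_error(residuals, 0.7)
share_p1 = outlier_share(residuals, 1.0)
share_p07 = outlier_share(residuals, 0.7)
```

On these values the outlier accounts for a smaller share of the error at p=0.7 than at p=1, so the two settings can rank candidate solutions differently. That is one reason it matters whether the monitored error uses the same p as the objective being optimized.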