-
With #20, the parallel schedule seems to scale perfectly on many cores:
```
$ OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 ./build/gemm_f32_serial
Warmup: 0.9036 s, result 224 (displayed to avoid comp…
-
### Describe the Bug
Paddle develop cannot run in the following environment:
```
python -m pip install paddlepaddle-gpu==0.0.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html
```
```
Python 3…
-
**Describe the bug**
When I run the command `ilab data generate --taxonomy-path ./taxonomy`, I get this error.
**To Reproduce**
Set up ilab with the latest release version on a RHE…
-
This is a spinoff of vectorisation issue #71 and a followup to the big PR #171.
---
(The first part of this description also serves as documentation of what is available there now!).
The curr…
-
### What happened?
```
You are a helpful assistant
> what is 2+2+2+2
44444444444444444444444444444444444444444444444444444444444444444444444444444444444444444
>
```
When I run llama-cli with…
-
### 🐛 Describe the bug
(I'll add actual benchmarking details and logs and output_code.py in a bit)
I'm doing min_sum and mul_sum in two setups:
1. (D, ) x (D, ) -> scalar
2. (B, N, 1, D) x (B,…
-
### 🐛 Describe the bug
I installed PyTorch in both my Python venv and my conda environment, using the CUDA 11.8 install command given in the official PyTorch documentation:
`pip3 i…
-
I am writing machine learning software that needs to compute “Y = exp(a⋅X)”.
Sample code:
```c++
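#include <cmath>    // assumed header (the original name was lost); provides std::exp
#include <cstddef>  // assumed header (the original name was lost); provides std::size_t
void func(float a[]) {
    for(std::size_t i = 0; i != 16; i++) {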
…
-
I passed a Target object with NoAssert turned on to compile_to_lowered_stmt on a pipeline of mine. Here's what I get:
module name=117_0_t, target=x86-64-linux-avx-avx2-avx512-avx512_skylake-f16c-fm…