bluss/matrixmultiply
General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.
https://docs.rs/matrixmultiply/
Apache License 2.0
209 stars, 25 forks
issues
Implement sgemm and dgemm using fma
#36
SuperFluffy
closed
5 years ago
15
Use fma, fused multiply add, for architectures supporting fma
#35
SuperFluffy
closed
5 years ago
4
Use optimal kernel parameters (architectures, matrix layouts)
#34
SuperFluffy
opened
5 years ago
7
Implement DGEMM kernel using avx intrinsics
#33
SuperFluffy
closed
5 years ago
9
Don't shadow c in sgemm_kernel::kernel_x86_avx
#32
SuperFluffy
closed
5 years ago
1
Explore performance of _mm256_blend_ps vs _mm256_shuffle_ps
#31
SuperFluffy
closed
5 years ago
3
Investigate if _mm256_broadcast_ss outperforms _mm256_set1_ps
#30
SuperFluffy
closed
5 years ago
3
Panic when benchmarking with target-feature=sse
#29
SuperFluffy
closed
5 years ago
4
WIP: i32 gemm experiment
#28
bluss
opened
5 years ago
9
In the sgemm avx kernel, transpose if we can match C's layout
#27
bluss
closed
5 years ago
0
Speed up packing by using copy_nonoverlapping
#26
bluss
closed
5 years ago
1
Allow operations on transposed matrices, i.e. Op(A) and Op(B), and DSYRK
#25
SuperFluffy
opened
5 years ago
8
Integer matrices
#24
SuperFluffy
opened
5 years ago
9
Use ifunc strategy or other ways to only check target feature existence once
#23
bluss
opened
5 years ago
1
Use std::arch SIMD and runtime target feature detection
#22
bluss
closed
5 years ago
4
Fix handling of zero-size arrays
#21
jturner314
closed
5 years ago
3
Relax debug assertion on strides of C matrix
#20
jturner314
closed
5 years ago
2
Add .gitignore
#19
jturner314
closed
5 years ago
0
ICE's on nightly rust: resolving bounds after type-checking
#18
bluss
closed
7 years ago
1
Use CARGO_CFG_TARGET_FEATURE to pick sgemm 8x8 if avx exists
#17
bluss
closed
7 years ago
0
Use no local arrays
#16
bluss
closed
8 years ago
0
Improve unrolling in sgemm kernel
#15
bluss
closed
8 years ago
2
Revert the workaround for array zeroing
#14
bluss
closed
5 years ago
2
set up benchmarks to run on stable with cargo bench (Test)
#13
bluss
closed
8 years ago
1
Nozeroed (test)
#12
bluss
closed
8 years ago
0
Run benchmarks using travis
#11
bluss
closed
8 years ago
0
Use mem::zeroed to fill the gemm kernel's array for the vectors
#10
bluss
closed
8 years ago
0
Performance regression on nightly
#9
bluss
closed
8 years ago
0
SNB Performance
#8
millardjn
closed
7 years ago
2
Use one Vec for both packing buffers
#7
bluss
closed
8 years ago
12
ref_mat_mul not always the slower version
#6
MagaTailor
closed
8 years ago
16
Test build i686 with travis
#5
bluss
closed
8 years ago
0
Use a 4-by-8 microkernel for sgemm
#4
bluss
closed
8 years ago
0
Add sgemm and dgemm asm microkernels from BLIS
#3
bluss
closed
3 years ago
4
Non-square µ-kernels, aligned buffers and a 8-by-4 kernel for dgemm
#2
bluss
closed
8 years ago
0
A faster multiplication with more ymm1; Also make the mask kernel more generic
#1
bluss
closed
8 years ago
5