codeplaysoftware/cutlass-fork
CUDA Templates for Linear Algebra Subroutines
Other
8 stars, 20 forks
issues
#159 Formatting and pvc gemm args validation (FMarno, opened 2 days ago, 0 comments)
#158 Unit test for IntelPVC w/ LinCombEltAct EVT (joeatodd, opened 4 days ago, 2 comments)
#157 Add static_assert to prefetch_selector (joeatodd, closed 1 day ago, 0 comments)
#156 Add Flash Attention v2 example (muhammad-tanvir-1211, opened 1 week ago, 0 comments)
#155 First steps to enable SYCL backend in Python Interface (sommerlukas, opened 1 week ago, 0 comments)
#154 LLM GEMM benchmarks (aacostadiaz, opened 1 week ago, 0 comments)
#153 Define GmemTiledCopyA/B as TiledCopy in CollectiveMma<IntelPVC,... (joeatodd, closed 1 week ago, 1 comment)
#152 LinCombPerRowBias & EVT Changes (joeatodd, opened 2 weeks ago, 0 comments)
#151 Cooperative prefetch (jiyang1011, closed 2 weeks ago, 0 comments)
#150 Replace deprecated calls to this_nd_item (joeatodd, closed 3 weeks ago, 0 comments)
#149 Fixed GEMM example performance regression and batch GEMM fail issues. (jiyang1011, closed 3 weeks ago, 2 comments)
#148 Fix PVC collective builder (aacostadiaz, closed 1 week ago, 0 comments)
#147 Fix PVC ReLU example (aacostadiaz, closed 4 weeks ago, 0 comments)
#146 Set google benchmark version (aacostadiaz, closed 4 weeks ago, 0 comments)
#145 Use oneMKL RNG for Tensor Fill (AD2605, closed 1 week ago, 0 comments)
#144 Add Launch Bounds (AD2605, opened 1 month ago, 0 comments)
#143 Jiyang/example (jiyang1011, closed 1 month ago, 0 comments)
#142 Add support for SYCL on example 35 (aacostadiaz, closed 1 month ago, 0 comments)
#141 Fix issue with including SYCL (aacostadiaz, closed 1 month ago, 0 comments)
#140 Device Agnostic Pipeline (AD2605, opened 1 month ago, 1 comment)
#139 SM80 Collective Builder (AD2605, opened 1 month ago, 0 comments)
#138 Rename Components (AD2605, closed 1 month ago, 0 comments)
#137 Update README (AD2605, opened 2 months ago, 0 comments)
#136 Remove caching effects in the Benchmarks (AD2605, closed 1 month ago, 0 comments)
#135 enable generic cute tests for SYCL_INTEL_TARGET (AD2605, closed 2 months ago, 0 comments)
#134 [FEA] Need gemm support on SM90 (mgrabban, opened 2 months ago, 1 comment)
#133 Fix GoogleBench build in CI (muhammad-tanvir-1211, closed 2 months ago, 0 comments)
#132 Implement SplitK and StreamK algorithm for Intel PVC (muhammad-tanvir-1211, closed 1 day ago, 3 comments)
#131 Enable CUTE APIs (Copy, MMA etc.) for Intel GPU (PVC) (taozha2, closed 1 month ago, 3 comments)
#130 Cute dev (taozha2, closed 2 months ago, 0 comments)
#129 [QST] How to do matmul(A, B^T)? (mgrabban, opened 2 months ago, 1 comment)
#128 Add benchmark counters (aacostadiaz, closed 1 month ago, 0 comments)
#127 Add benchmark configurations (aacostadiaz, closed 1 month ago, 0 comments)
#126 SM90 Support (AD2605, opened 2 months ago, 1 comment)
#125 Unify benchmarks into a single execution (aacostadiaz, closed 2 months ago, 0 comments)
#124 Use compare equal in the collective builder example (aacostadiaz, closed 3 months ago, 0 comments)
#123 atomic add (jiyang1011, closed 2 months ago, 12 comments)
#122 Collective Builder API for PVC (AD2605, closed 3 months ago, 0 comments)
#121 [Draft] Swap the X and Y grid dimensions for Intel PVC (aacostadiaz, closed 3 months ago, 0 comments)
#120 Add local qualifier to the shared memory pointer (aacostadiaz, closed 3 months ago, 0 comments)
#119 Add TFlop and Bandwidth perf counters (AD2605, closed 3 months ago, 0 comments)
#118 Fix stack buffer overflow in float to half conversion (AD2605, closed 3 months ago, 0 comments)
#117 Expose prefetch distance (aacostadiaz, closed 3 months ago, 0 comments)
#116 Use googlebench in Benchmarks (AD2605, closed 3 months ago, 0 comments)
#115 Add CUTLASS_SYCL_PROFILING_ENABLED flag (aacostadiaz, closed 3 months ago, 0 comments)
#114 Simplify error verification (aacostadiaz, closed 3 months ago, 0 comments)
#113 Enable Cute Unit tests (AD2605, closed 1 month ago, 0 comments)
#112 Update to Cutlass 3.5.1 (aacostadiaz, closed 3 months ago, 0 comments)
#111 Replace CUTLASS_ENABLE_SYCL with __SYCL_DEVICE_ONLY__ (aacostadiaz, closed 3 months ago, 0 comments)
#110 Specify Test workflow input for manual dispatch (carlewis, closed 3 months ago, 0 comments)