ROCm / Tensile
Stretching GPU performance for GEMMs and tensor contractions.
MIT License
218 stars
147 forks
issues
Fix StreamK Partials Cache Behavior
#1885
bethune-bryant
closed
8 months ago
9
Temporarily disable stream-k tests on gfx94X
#1884
AlexBrownAMD
closed
8 months ago
0
HOTFIX: update release 6.1 with the latest commits in develop
#1883
babakpst
closed
8 months ago
0
Reduce extended test time
#1882
AlexBrownAMD
closed
8 months ago
0
fix memory allocation fail with FlushMemorySize + StridedBatched/Batched cases
#1881
nakajee
closed
8 months ago
2
Disable InitAccVgprOpt for Stream-K
#1880
AlexBrownAMD
closed
8 months ago
0
Revert "Use fallback libraries for archs without optimized logic (#1862)"
#1879
nakajee
closed
8 months ago
4
skip sgemm 64bit offset tests for gfx94x
#1878
nakajee
closed
8 months ago
0
Fix BufferLoad=False with stream-k
#1877
AlexBrownAMD
closed
8 months ago
1
fix rocblas build fail on gfx11
#1876
nakajee
closed
8 months ago
1
fix mismatch issue with GlobalReadCoalesceGroup
#1875
nakajee
closed
8 months ago
0
[Feature]: Restructure the code to build a wheel and use importlib to embed non-python files
#1874
bioinfornatics
opened
9 months ago
1
Skip DTV, DTL, LSU+MFMA tests for gfx908
#1873
nakajee
closed
9 months ago
2
DirectToVgpr + packing support, increase extended test timeout
#1872
nakajee
closed
9 months ago
8
Small fix for LdsPad auto
#1871
nakajee
closed
9 months ago
2
Can we structure the code in order i) to build a wheel ii) to use importlib for embedded non python file
#1870
bioinfornatics
opened
9 months ago
0
Force GlobalReadCoalesceGroupA, B to True
#1869
nakajee
closed
9 months ago
1
Optimize temp vgpr allocation for ClusterLocalRead
#1868
nakajee
closed
9 months ago
2
Re-enable tests to test new build
#1867
AlexBrownAMD
closed
8 months ago
0
Update xfail, 1sum tests only failing on gfx90a
#1866
AlexBrownAMD
closed
9 months ago
0
Add new tuning scripts
#1865
AlexBrownAMD
closed
9 months ago
0
enable VgprForLocalReadPacking + PrefetchLocalRead=1
#1864
nakajee
closed
9 months ago
0
Update link to Wiki page
#1863
dgaliffiAMD
closed
9 months ago
0
Use fallback libraries for archs without optimized logic
#1862
GZGavinZhao
closed
9 months ago
9
Stream-K Batch
#1861
AlexBrownAMD
closed
9 months ago
8
Temporarily disable failing tests until bug fix is in mainline build
#1860
AlexBrownAMD
closed
9 months ago
0
VectorWidthB support, VectorWidth + non SourceSwap support, small bug fix for ClusterLocalRead
#1859
nakajee
closed
9 months ago
3
Fix mismatch issue with InitAccOpt + InnerUnroll
#1858
nakajee
closed
9 months ago
1
updating lib logic convertor script
#1857
babakpst
closed
9 months ago
0
updating Codeowners file
#1856
babakpst
closed
10 months ago
0
Revert "Optimization for ShadowLimit (#1829)"
#1855
nakajee
closed
10 months ago
2
script to summarize rocblas log
#1854
babakpst
closed
6 months ago
1
Limit build threads based on CPUs/RAM available on system
#1853
AlexBrownAMD
closed
7 months ago
2
[Feature]: about 7900xtx benchmark.
#1935
Axl-zhang
closed
3 months ago
7
adding code owners file
#1852
babakpst
closed
10 months ago
0
Fix HostLibraryTests on gfx942
#1851
AlexBrownAMD
closed
10 months ago
2
Predicate for arithmetic intensity
#1850
AlexBrownAMD
closed
9 months ago
3
Test limiting CI threads for only gfx11
#1849
AlexBrownAMD
closed
10 months ago
4
Adding option of rotating buffers for timing with cache eviction
#1848
mahmoodw
closed
9 months ago
8
No reject for GlobalSplitU=1 + MultipleBuffer
#1847
nakajee
closed
10 months ago
0
Adding issue template
#1846
abhimeda
closed
9 months ago
1
More cleanup on unused old client code
#1845
AlexBrownAMD
closed
10 months ago
2
Remove WGM related kern args if they are not needed
#1844
AlexBrownAMD
closed
10 months ago
6
Kernarg preloading
#1843
sdquiring
closed
8 months ago
8
update efficiency script for new arch
#1842
babakpst
closed
10 months ago
0
Add defineLocalSgpr
#1841
awhittle3
closed
7 months ago
7
Fix LLVM crash issue
#1840
AlexBrownAMD
closed
10 months ago
0
Fix an error with DisableKernelPieces + 32bit ShadowLimit
#1839
nakajee
closed
10 months ago
0
Re-enable negative values for WorkGroupMapping (asm kernel only)
#1838
nakajee
closed
10 months ago
2
adding xf32 datatype to rocblas-bench input creator
#1837
babakpst
closed
10 months ago
0