NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
271 stars, 53 forks
issues
Newest
Minor format changes
#3459
wujingyue
closed
6 hours ago
1
eraseInputDistinctRootDomains supports general logical-to-allocation transforms
#3458
wujingyue
opened
11 hours ago
3
[WIP][DO NOT REVIEW] Enable slice in vectorization analysis
#3457
jjsjann123
opened
12 hours ago
1
Support 2D inner reduction scheduler with autotuning
#3456
rdspring1
opened
1 day ago
1
Invalid indexing path when resize is used with a residual path
#3455
naoyam
opened
1 day ago
8
[WIP] (Yet another) indexing war for resize
#3454
naoyam
opened
1 day ago
4
Restructure baseline benchmarks
#3453
Priya2698
opened
1 day ago
2
Return a bool indicating if all nodes are visited
#3452
naoyam
closed
1 day ago
2
Skip sequence parallel test when D=1
#3451
cowanmeg
closed
2 days ago
1
Cleanup `currentLoadStage`
#3450
zasdfgbnm
closed
1 day ago
1
Add `ParallelDimensionMap::getNumThreadsEachBlock`
#3449
zasdfgbnm
closed
2 days ago
1
Split `insertMBarrierWaitBeforeFirstRead`
#3448
zasdfgbnm
closed
1 day ago
2
Stride `MatmulOp` according to set allocation domain
#3447
Priya2698
opened
2 days ago
0
Use mbarrier for WAR for circular buffering
#3446
zasdfgbnm
opened
2 days ago
4
Fix stride-order based allocation domain computation when output has reduction axis
#3445
Priya2698
opened
3 days ago
2
Unshard tensor sizes before binding.
#3444
wujingyue
opened
3 days ago
0
Fix gdimz in LaunchParams
#3443
rdspring1
closed
3 days ago
1
kill `onlyOneSerialForLoopOnStack`
#3442
zasdfgbnm
closed
2 days ago
3
Remove `getStageDepthFor`, `getPrefetchDistanceFor`, `circularBufferDepth`, `circularBufferPrefetchDistance`
#3441
zasdfgbnm
closed
3 days ago
1
[WIP] Enable translation of Hopper matmuls
#3440
jacobhinkle
opened
3 days ago
0
Patch vectorization on permuted inputs for PadOp
#3439
jjsjann123
closed
1 day ago
10
fix race in async copy
#3438
liqiangxl
opened
3 days ago
12
Variable renaming in AllocationInserter
#3437
zasdfgbnm
closed
3 days ago
2
Avoid canScheduleCompileTime for dynamic shape check
#3436
jacobhinkle
closed
1 day ago
4
Add `ir_utils::createRangeLoop`
#3435
zasdfgbnm
closed
3 days ago
1
Minor cleanup in `CloneTmaCircularBufferLoopAndInsertSync`
#3434
zasdfgbnm
closed
3 days ago
2
Add interface to directly work with CircularBufferOptions
#3433
zasdfgbnm
closed
3 days ago
1
Fix the setting of `gdimx` in 2d and 3d inner reduction heuristics
#3432
rdspring1
closed
2 days ago
1
Add ReductionParams to python frontend
#3431
rdspring1
closed
2 days ago
3
Allocate index for all circular buffer stages
#3430
zasdfgbnm
closed
3 days ago
3
Fix comment double buffer -> circular buffer
#3429
zasdfgbnm
closed
4 days ago
2
Race reported between Write access and Read access in fusion using async copy
#3428
liqiangxl
opened
4 days ago
2
thread predicate is missing in async copy
#3427
liqiangxl
opened
4 days ago
0
Minor cleanups and improvements
#3426
wujingyue
closed
3 days ago
1
[WIP] Resize scheduler
#3425
naoyam
opened
6 days ago
0
Remove unnecessary propagation that's already done by the ops API.
#3424
wujingyue
closed
2 days ago
4
use bdimy = 1 to WAR smem race
#3423
liqiangxl
closed
1 day ago
8
Remove an unused variable
#3422
wujingyue
closed
4 days ago
1
Extend isResharding to allow DID loop split.
#3421
wujingyue
opened
6 days ago
9
Print spaces for readability.
#3420
wujingyue
closed
4 days ago
2
Dynamic shape host latency is slow
#3419
jacobhinkle
closed
1 day ago
2
Add `unsafeBind` to `LaunchParams` to allow modification.
#3418
rdspring1
closed
6 days ago
2
Remove unnecessary cast
#3417
cowanmeg
closed
6 days ago
1
Allow inlining past loop broadcasts
#3416
jacobhinkle
opened
6 days ago
7
set maxrregcount in outer reduction heuristic
#3415
liqiangxl
closed
3 days ago
3
Only check actually used IDs in predicate elimination
#3414
jacobhinkle
opened
6 days ago
21
avoid calc reg count from block size in getMaxRegCount
#3413
liqiangxl
closed
6 days ago
2
Fix a typo
#3412
wujingyue
closed
6 days ago
1
Add support for stmatrix in the unit test HopperMatmulTest/HSH_NT_128BSwizzle
#3411
protonu
opened
1 week ago
1
Preserve additional IDs when mutating TensorDomains in OptOutMutator
#3410
jacobhinkle
closed
6 days ago
2