NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
271 stars
53 forks
issues
[WIP][DO NOT REVIEW] Enable slice in vectorization analysis
#3457
jjsjann123
opened
36 minutes ago
1
Support 2D inner reduction scheduler with autotuning
#3456
rdspring1
opened
23 hours ago
1
Invalid indexing path when resize is used with a residual path
#3455
naoyam
opened
23 hours ago
5
[WIP] (Yet another) indexing war for resize
#3454
naoyam
opened
1 day ago
3
Restructure baseline benchmarks
#3453
Priya2698
opened
1 day ago
1
Return a bool indicating if all nodes are visited
#3452
naoyam
closed
1 day ago
2
Skip sequence parallel test when D=1
#3451
cowanmeg
closed
1 day ago
1
Cleanup `currentLoadStage`
#3450
zasdfgbnm
closed
1 day ago
1
Add `ParallelDimensionMap::getNumThreadsEachBlock`
#3449
zasdfgbnm
closed
1 day ago
1
Split `insertMBarrierWaitBeforeFirstRead`
#3448
zasdfgbnm
closed
1 day ago
2
Stride `MatmulOp` according to set allocation domain
#3447
Priya2698
opened
2 days ago
0
Use mbarrier for WAR for circular buffering
#3446
zasdfgbnm
opened
2 days ago
3
Fix stride-order based allocation domain computation when output has reduction axis
#3445
Priya2698
opened
2 days ago
2
Unshard tensor sizes before binding.
#3444
wujingyue
opened
2 days ago
0
Fix gdimz in LaunchParams
#3443
rdspring1
closed
2 days ago
1
kill `onlyOneSerialForLoopOnStack`
#3442
zasdfgbnm
closed
2 days ago
3
Remove `getStageDepthFor`, `getPrefetchDistanceFor`, `circularBufferDepth`, `circularBufferPrefetchDistance`
#3441
zasdfgbnm
closed
2 days ago
1
[WIP] Enable translation of Hopper matmuls
#3440
jacobhinkle
opened
2 days ago
0
Patch vectorization on permuted inputs for PadOp
#3439
jjsjann123
closed
1 day ago
10
fix race in async copy
#3438
liqiangxl
opened
2 days ago
10
Variable renaming in AllocationInserter
#3437
zasdfgbnm
closed
2 days ago
2
Avoid canScheduleCompileTime for dynamic shape check
#3436
jacobhinkle
closed
1 day ago
4
Add `ir_utils::createRangeLoop`
#3435
zasdfgbnm
closed
3 days ago
1
Minor cleanup in `CloneTmaCircularBufferLoopAndInsertSync`
#3434
zasdfgbnm
closed
2 days ago
2
Add interface to directly work with CircularBufferOptions
#3433
zasdfgbnm
closed
3 days ago
1
Fix the setting of `gdimx` in 2d and 3d inner reduction heuristics
#3432
rdspring1
closed
1 day ago
1
Add ReductionParams to python frontend
#3431
rdspring1
closed
2 days ago
3
Allocate index for all circular buffer stages
#3430
zasdfgbnm
closed
2 days ago
3
Fix comment double buffer -> circular buffer
#3429
zasdfgbnm
closed
3 days ago
2
Race reported between Write access and Read access in fusion using async copy
#3428
liqiangxl
opened
3 days ago
2
thread predicate is missing in async copy
#3427
liqiangxl
opened
4 days ago
0
Minor cleanups and improvements
#3426
wujingyue
closed
3 days ago
1
[WIP] Resize scheduler
#3425
naoyam
opened
5 days ago
0
Remove unnecessary propagation that's already done by the ops API.
#3424
wujingyue
closed
1 day ago
4
use bdimy = 1 to WAR smem race
#3423
liqiangxl
closed
1 day ago
8
Remove an unused variable
#3422
wujingyue
closed
3 days ago
1
Extend isResharding to allow DID loop split.
#3421
wujingyue
opened
5 days ago
8
Print spaces for readability.
#3420
wujingyue
closed
3 days ago
2
Dynamic shape host latency is slow
#3419
jacobhinkle
closed
1 day ago
2
Add `unsafeBind` to `LaunchParams` to allow modification.
#3418
rdspring1
closed
5 days ago
2
Remove unnecessary cast
#3417
cowanmeg
closed
5 days ago
1
Allow inlining past loop broadcasts
#3416
jacobhinkle
opened
6 days ago
7
set maxrregcount in outer reduction heuristic
#3415
liqiangxl
closed
3 days ago
3
Only check actually used IDs in predicate elimination
#3414
jacobhinkle
opened
6 days ago
18
avoid calc reg count from block size in getMaxRegCount
#3413
liqiangxl
closed
6 days ago
2
Fix a typo
#3412
wujingyue
closed
6 days ago
1
Add support for stmatrix in the unit test HopperMatmulTest/HSH_NT_128BSwizzle
#3411
protonu
opened
6 days ago
1
Preserve additional IDs when mutating TensorDomains in OptOutMutator
#3410
jacobhinkle
closed
5 days ago
2
Additional IDs are lost when mutating a TensorDomain
#3409
jacobhinkle
closed
5 days ago
0
Clean-up: reduction producer IterDomains aren't mapped anyway.
#3408
wujingyue
closed
6 days ago
1