NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
271 stars, 53 forks
issues
Additional IDs are lost when mutating a TensorDomain
#3409
jacobhinkle
closed
6 days ago
0
Clean-up: reduction producer IterDomains aren't mapped anyway.
#3408
wujingyue
closed
6 days ago
1
Support 2D pointwise scheduler with autotuning
#3407
rdspring1
opened
1 week ago
1
[WIP] Schedule Hopper MMA without input broadcasts
#3406
jacobhinkle
opened
1 week ago
0
Make sorting deterministic.
#3405
wujingyue
closed
1 week ago
3
Parametrize Hopper matmul tests
#3404
jacobhinkle
opened
1 week ago
1
Reject fusions with CPU scalar outputs in `KernelExecutor`
#3403
Priya2698
closed
2 days ago
12
Skip SetUp when the communicator isn't available.
#3402
wujingyue
closed
1 week ago
1
Prevent marking inplace update target as candidate for alias analysis
#3401
jjsjann123
closed
6 days ago
5
warp-specialization
#3400
zasdfgbnm
opened
1 week ago
0
AllocationDomainPass sets correct allocation orders for SdpaFwdOp/SdpaBwdOp.
#3399
wujingyue
closed
1 week ago
3
Change tests to call runPass directly.
#3398
wujingyue
closed
1 week ago
1
Replace PERMISSIVE graph with BROADCAST for matmul scheduler
#3397
jacobhinkle
closed
1 week ago
1
Non-deterministic output tensor names
#3396
naoyam
closed
1 week ago
2
Add support to store the outputs of the Mma operator using stmatrix
#3395
protonu
opened
1 week ago
1
Add thunder benchmarks
#3394
Priya2698
opened
1 week ago
4
add knob controls unroll on top of vectorization in inner reduction
#3393
liqiangxl
closed
1 week ago
3
Ring-based decomposition for Allgather+GEMM overlap ATen implementation
#3392
nsarka
opened
1 week ago
0
Accept axis mapping when defining MmaOp
#3391
jacobhinkle
closed
1 week ago
7
Test multiple memory formats
#3390
wujingyue
closed
1 week ago
3
Failed to create/execute the same FusionDefinition multiple times.
#3389
wujingyue
opened
1 week ago
0
Update input sizes for `test_many_segment_benchmark` to ensure kernel reuse
#3388
Priya2698
closed
1 week ago
1
Enable IdModel tensor indexer when expanded IDs are reshaped
#3387
naoyam
closed
1 week ago
7
SdpaFwdOp::evaluate produces a tensor whose stride order doesn't match the output allocation domain.
#3386
wujingyue
closed
1 week ago
3
Clean up IdModel options
#3385
naoyam
closed
1 week ago
3
Remove debug prints
#3384
wujingyue
closed
1 week ago
1
Use TMA with reduction for Hopper Split-K
#3383
jacobhinkle
closed
1 week ago
2
Use structured binding for better readability
#3382
wujingyue
closed
1 week ago
1
Remove MatMulTileOptions::instruction_tile
#3381
jacobhinkle
opened
1 week ago
1
Rename addSetsForCacheReads to cacheOperandsToRegisters
#3380
jacobhinkle
closed
1 week ago
1
in extent substitution preseg pass, use DisjointSets of extents which is constructed from DisjointSets of IterDomains
#3379
liqiangxl
closed
1 week ago
5
return in()->toInlineString() in LoadStoreOp::toInlineString
#3378
liqiangxl
closed
1 week ago
2
Change a functor to a regular function.
#3377
wujingyue
closed
1 week ago
1
Inspect all IDs instead of just loop in ParallelDimensionMap
#3376
jacobhinkle
closed
1 week ago
2
Validate allocation sizes and strides more widely.
#3375
wujingyue
closed
1 week ago
4
Indexing error with HF's Qwen 2 model
#3374
naoyam
closed
1 week ago
2
Fix the legacy loop indexing traversal
#3373
naoyam
closed
1 week ago
2
Enable MmaOp to receive unbroadcasted inputs
#3372
jacobhinkle
closed
1 week ago
0
Explicit check build flag
#3371
jjsjann123
closed
2 weeks ago
2
Redo #3326.
#3370
wujingyue
closed
2 weeks ago
2
Two-hop mutations are not supported. Found registrations from 5 to 5 to 5
#3369
wujingyue
closed
1 week ago
6
Add NVFUSER_DUMP=python_definition_segments
#3368
jacobhinkle
closed
2 weeks ago
3
[Do not merge] Overlap AG+GEMM benchmark
#3367
samnordmann
opened
2 weeks ago
0
[Proposal] Support root->logical transforms in Fusion inputs
#3366
jacobhinkle
closed
2 weeks ago
5
Updating readme
#3365
jjsjann123
closed
1 hour ago
4
Fix launch configuration error with ScatterGatherTest.TorchGatherAllRankAllSelectedDim
#3364
naoyam
closed
2 weeks ago
1
Accidentally lost in #3028
#3363
naoyam
closed
1 week ago
1
Renaming from #3263 part2
#3362
naoyam
closed
2 weeks ago
2
Use SimplifyingIrBuilder for circular buffering
#3361
naoyam
closed
2 weeks ago
1
LoadStoreOp::toInlineString should not assume it's a tensor op
#3360
naoyam
closed
1 week ago
7