NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Other
271 stars, 53 forks
issues
Add TT, TN, NT, NN tests for HopperMultipleMatmulScheduler
#3310
rdspring1
closed
2 weeks ago
2
Use SimplifyingIrBuilder for split and merge extents
#3309
naoyam
closed
1 week ago
10
init
#3308
zasdfgbnm
opened
3 weeks ago
0
Host benchmarking for a fusion with multiple segments
#3307
Priya2698
closed
3 weeks ago
3
Remove deprecated `clear_cuda_cache`
#3306
Priya2698
closed
3 weeks ago
1
Factorize ExpressionEvaluator::bind_.
#3305
wujingyue
closed
3 weeks ago
1
Disable nvfusertest_serde_check if DEBUG_SERDE=disable
#3304
jacobhinkle
closed
3 weeks ago
6
Change where we print python repros to allow us to print repros prior to segfaults
#3303
kevinstephano
closed
3 weeks ago
2
[DRAFT] Train generalized machine learning model using pointwise data set.
#3302
rdspring1
opened
3 weeks ago
0
Use deep evaluation of extents in remove_empty pass
#3301
jacobhinkle
closed
3 weeks ago
3
Compile forward function in baseline backward benchmarks
#3300
Priya2698
closed
2 weeks ago
2
Error found in Mistral-Nemo and Qwen2's Rope implementations
#3299
kevinstephano
closed
1 week ago
4
Maybe no electsync
#3298
zasdfgbnm
opened
3 weeks ago
0
Move `insertWarThreadSynchronization` after circular buffering pass
#3297
zasdfgbnm
opened
3 weeks ago
1
Only support mul-sum distributed matmul test for ampere and hopper
#3296
cowanmeg
closed
3 weeks ago
2
Fix elect sync predicate
#3295
zasdfgbnm
closed
3 weeks ago
2
Only the TMA thread arrive
#3294
zasdfgbnm
closed
3 weeks ago
1
Tracks performance issues related to inner reduction scheduler
#3293
liqiangxl
opened
3 weeks ago
0
INTERNAL ASSERT FAILED (Vectorized accesses cannot be inline with computation)
#3292
t-vi
closed
3 weeks ago
1
Segfault in NVFuser in Thunder container / CI
#3291
t-vi
closed
3 weeks ago
1
Memory inefficient order of execution with multiple parallel paths
#3290
IvanYashchuk
opened
3 weeks ago
4
Add information for coordinating segments in python frontend.
#3289
rdspring1
closed
3 weeks ago
5
fix padded bdimx to use warp reduction in inner reduction scheduler
#3288
liqiangxl
closed
3 weeks ago
1
reorder outer reduction tv in inner-outer scheduler when there are view ops in the fusion
#3287
liqiangxl
closed
3 weeks ago
4
[TEST] HostIR in FusionExecutor
#3286
csarofeen
closed
3 weeks ago
1
Fix test_circular_buffering.cpp
#3285
zasdfgbnm
closed
3 weeks ago
1
Fix communication lowering to support DID loop parallelization.
#3284
wujingyue
opened
3 weeks ago
0
Non-deterministic codegen with some of the Python tests
#3283
naoyam
closed
3 weeks ago
2
Bind sharded input/output tensors with DID-parallelized allocation domains.
#3282
wujingyue
opened
3 weeks ago
14
Change shape of `HSH_NT_128BSwizzle`
#3281
zasdfgbnm
closed
3 weeks ago
1
Export DEBUG_SERDE=true in compare_codegen.sh
#3280
jacobhinkle
opened
3 weeks ago
1
Tracking perf optimization of `HopperMatmulTest.HSH_NT_128BSwizzle` for problem size `(M=2048, N=2048, K=8192)`, CTA tile size `(128, 256)`
#3279
zasdfgbnm
opened
3 weeks ago
4
Schedule Hopper mma instruction
#3278
jacobhinkle
closed
1 week ago
13
Refactor MultiMatmulSchedulers
#3277
rdspring1
closed
3 weeks ago
2
Reduce number of circular buffering tests
#3276
zasdfgbnm
closed
3 weeks ago
1
add knobs control inner dim unroll and outer dim unroll in pointwise scheduler
#3275
liqiangxl
closed
2 weeks ago
19
avoid treating pointer to bool as bool when handling kir::asm in codegen
#3274
liqiangxl
closed
3 weeks ago
4
PTX code for async copy of bool type is not correctly generated
#3273
liqiangxl
closed
3 weeks ago
1
Tracks performance issues related to inner outer persistent scheduler
#3272
liqiangxl
opened
4 weeks ago
1
check vectorization factor of shared memory consumers to avoid illegal vectorization size
#3271
liqiangxl
closed
3 weeks ago
7
Add `enable_options` and `disable_options` to `fd.execute`
#3270
Priya2698
closed
1 week ago
12
Add dtype parameter to Tensor state in python frontend.
#3269
rdspring1
opened
4 weeks ago
4
Remove benchmarks/python/test_transformer.py
#3268
wujingyue
closed
3 weeks ago
1
Extra FillFunctor kernels
#3267
cowanmeg
closed
1 week ago
1
Refactor MultiMatrixScheduler
#3266
rdspring1
closed
3 weeks ago
0
Python test "at-exit" serialization has effect on repeat testing runs
#3265
jacobhinkle
closed
3 weeks ago
2
expose and test getPersistentBufferStorageParams
#3264
liqiangxl
opened
4 weeks ago
3
Create dispatch system for executors
#3263
csarofeen
closed
1 week ago
41
patch potential segfault
#3262
jjsjann123
closed
3 weeks ago
10
Lowering vectorized pad
#3261
jjsjann123
closed
2 weeks ago
6