smem Search Results - Githubissues

1000+ results
for smem

state-spaces/mamba #601

Query regarding indexing of smem_delta_a

From what I understand, `smem_delta_a` is used to initialize the value of `delta_a` in shared memory in these lines: https://github.com/state-spaces/mamba/blob/bc84fb1172e6dea04a7dc402118ed19985349e9…

SudhanshuBokade updated 1 week ago
3
Dao-AILab/flash-attention #1318

Why is barrier_O Necessary, and What is it Waiting For?

Why do we need this barrier_O? What exactly is it waiting for? For each read of V/K, we have pipeline and producer_commit to control that flow. Also, the placement of barrier_O seems strange—it appear…

ziyuhuang123 updated 1 day ago
6
samuelcolvin/jinjahtml-vscode #158

C++ Formatting + Macro Parsing Error

I've got a file that I'm editing with the `jinja-cpp` formatter. I import a macro from another file with: `{%- from 'macros.jinja' import declare_smem_arrays with context %}` The mac…

asglover updated 2 weeks ago
1
Dao-AILab/flash-attention #1266

Where in the code demonstrate inter-warp policy?

``` template CUTLASS_DEVICE void mma(Params const& mainloop_params, MainloopPipeline pipeline_k, MainloopPipeline pipeline_v, PipelineState& smem_pipe_read_k…

ziyuhuang123 updated 3 weeks ago
4
riscv-collab/riscv-openocd #1158

Does OpenOCD have any plans to support RISC-V trace?

Currently, GDB + OpenOCD only supports debugging of RISC-V SoCs. It does not seem able to directly access trace components such as the Trace Encoder and SMEM. Furthermore, it cannot parse trace data. I…

zhangdujiao updated 14 hours ago
11
NVIDIA/cutlass #1882

[QST] Why is there bank conflict in this simple layout?

I modified the `tiled_copy.cu` example in cute/tutorial to use the following layout ``` auto tensor_shape = cute::Shape{}; auto block_shape = cute::Shape{}; ... Tensor tensor_S = make_tensor(m…

seanxwzhang updated 2 weeks ago
3
spcl/dace #1284

Broadcasts to Shared Memory on GPU Runs in Serial

**Describe the bug** When running code on a GPU, if you have a block of shared memory and you broadcast a variable to it, the generated CUDA assigns in serial. In my reproducer, the performance and b…

computablee updated 1 week ago
5
JaehyeokHan/Smartphone-Backup-Data-Extractor #3

.smem files?

Hi, I saw that Samsung phone messages from 2017 onward use the .smem format (surely a derivative of the previous ones). Could this format be integrated as well? Best regards

7zxkv updated 1 year ago
2
NVIDIA/cutlass #1817

[QST]Tensor Shape Mismatch in CUTLASS: Does Layout Informati…

**What is your question?** I encountered a strange bug. Firstly, my SMEM is divided into two regions. One part is for the mainloop (reading A and B), and the other part is for the epilogue (writing…

ziyuhuang123 updated 3 weeks ago
1
NVIDIA/Fuser #2979

Self-mapping error when compiling 1D bias linear fusion with…

The following test fails currently: ```c++ TEST_F(MatmulSchedulerTest, SelfMappingErrorSmemEpilogue1dBias) { NVFUSER_TEST_CUDA_ARCH_RANGE_GUARD(7, 5, 9, 0); Fusion fusion_obj; Fusion* fusion = …

jacobhinkle updated 1 week ago
1
