-
I'm trying to follow the instructions on an Apple M1 Mac running macOS and am encountering the following compile error:
```c++
bazel build -c opt :hirm
Extracting Bazel installation...
Starting local Bazel serv…
-
I noticed that the Hopper cluster setting may offer a chance to optimize the performance of batch_decode by merging `VariableLengthMergeStates` with `BatchDecodeWithPagedKVCacheKernel`. Is there any plan to us…
-
## Environment
- OS: [Ubuntu 23.06.30]
- Hardware (GPU, or instance type): [8xV100]
## The issue
I am trying Streaming Dataset with [PyTorch Lightning](https://lightning.ai/docs/pytorch/…
-
Sorry if this is a configuration mistake, but I am using 42 with Nos3 and I am experiencing some weird behaviour that I could not find in other shared images. It seems as if the stars are always overl…
-
**What is your question?**
I'm interested in extending `Example 50: Hopper Gemm with Epilogue Swizzle` to `Ampere` architectures and am trying to understand how swizzled `SmemLayout` avoids both bank…
-
Feature request for workgroup/shared memory pointers.
Ideally we could get pointers to workgroup/shared memory like in CUDA/OpenCL.
```
void foo(float * ptr){
...
}
...
shared float …
-
### The bug
The library job failed after adding an external library.
### The OS that Immich Server is running on
Ubuntu 24.04 LTS
### Version of Immich Server
v1.108.0
### Version of Immich Mobile App
…
-
We use CentOS 6.10 to make AppImage bundles (32-bit and 64-bit).
Until now, we have used a bash script containing all the rules to clean up the AppDir before packaging with AppImageKit. This stage works fine.
Now I wan…
-
**Describe the bug**
Running the add id module of curator runs into OOMs (out-of-memory errors) even with a small batch size, e.g., 32.
The dataset for adding ID is a single snapshot of Red Pajama v2 dataset, which is ab…
-
NVIDIA is implementing an optimization to pass the LHS operand of WGMMA ops in registers. This allows element-wise prologues to pass the intermediate result directly to WGMMA without writing it to shar…