-
I launched the confidential VM, but QEMU complained with:
```
[ 349.818805] NVRM: failed to initialize module.
[ 349.902869] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:2330)
[ 349.9028…
-
@Richardk80
I'm trying to get the updated requirements for the latest version and get this error:
`pip install -r requirements.txt
Collecting certifi==2022.12.7
Using cached certifi-2022.12.…
-
Hi, we're running the demo script on a 768x768 input image, and it takes 22 seconds to generate a 2-second clip, even though we're running on an H100 SXM GPU. I was wondering if this generation time is norma…
-
**Your question**
Ask a clear and concise question about Flux.
```
$./scripts/launch.sh test/test_gemm_rs.py 4096 12288 49152 --dtype=bfloat16 --iters=10
torchrun --node_rank=0 --nproc_per_node=…
-
We are testing with SEV-SNP + H100. CC mode with a single GPU works fine when following the deployment guide. Now we want to test non-CC mode with a regular VM. First, we run `--set-cc-mode=off`.
```
…
-
### Please ask your question
The error is as follows:
[2024-07-12 08:34:51,881] [ WARNING] install_check.py:289 - PaddlePaddle meets some problem with 8 GPUs. This may be caused by:
1. There is not enough GPU…
-
Hi,
I tried the `--cluster-key` option with trtllm-build.
I did the conversion with A100-80gb-sxm, then tried to deploy it on an L4 after converting with the L4 option, and it failed when starting up t…
-
Another issue: it seems we have A100 and A100-SXM separated for RunPod but combined for Lambda Labs. We probably need to separate them for all the clouds.
_Originally posted by @Michaelvll in htt…
-
### 🐛 Describe the bug
Even though we see major bandwidth (BW) improvements with `Tensor.copy_` on MI300X compared to H100, on a similar memory-BW-bound op like `sum()` we were able to achieve a read bandwidth …
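For context, here is a minimal sketch of how an effective bandwidth figure for ops like these is commonly derived from a measured runtime. The helper name `effective_bandwidth_gbps`, the `passes` parameter, and the example sizes and timings are all hypothetical illustrations, not values from this report. It assumes a reduction such as `sum()` reads each element once with negligible output, while a copy both reads and writes each element.

```python
def effective_bandwidth_gbps(num_elements: int,
                             bytes_per_element: int,
                             seconds: float,
                             passes: int = 1) -> float:
    """Return effective memory bandwidth in GB/s (1 GB = 1e9 bytes).

    `passes` counts how many times each element crosses the memory bus:
    1 for a read-only reduction, 2 for a copy (one read plus one write).
    This is an assumed accounting convention, not a measured quantity.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    total_bytes = num_elements * bytes_per_element * passes
    return total_bytes / seconds / 1e9


# Illustrative numbers only: 1e9 bfloat16 elements (2 bytes each).
reduction_bw = effective_bandwidth_gbps(10**9, 2, seconds=1.0e-3)        # read only
copy_bw = effective_bandwidth_gbps(10**9, 2, seconds=1.5e-3, passes=2)   # read + write
print(f"reduction: {reduction_bw:.1f} GB/s, copy: {copy_bw:.1f} GB/s")
```

Comparing the two figures against the device's peak memory bandwidth shows how far each op is from being bandwidth bound, provided the timing excludes warm-up and is taken after the device has finished the work.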
-
### Please ask your question
The error is as follows:
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 428.000000MB memory on GPU 0, 79.153320GB memory has been a…