hpu Search Results - Githubissues

1000+ results
for hpu


Lightning-AI/pytorch-lightning #19817

Multi-node Training with DDP stuck at "Initialize distribute…

### Bug description I'm working on a slurm cluster with 8 AMD MI100 GPUs distributed in 2 nodes, with 4 GPUs in each node. I follow the instructions (https://lightning.ai/docs/pytorch/stable/clouds…

OswaldHe updated 4 months ago
4
cvignac/DiGress #56

Repeat that there is a problem with your work

Hello, I have this problem: "The size of tensor a (128) must match the size of tensor b (0) at non-singleton dimension 1". How do I solve it?

clclclaiggg updated 1 year ago
2
huggingface/optimum-habana #31

Error in tests when test_trainer is run before test_trainer_…

Unit and integration tests currently need to be run with `pytest tests/test_gaudi_configuration.py tests/test_trainer_distributed.py tests/test_trainer.py`. If not, for instance with `pytest tests/`…

regisss updated 1 year ago
1
pytorch/pytorch #128202

Wrong result for Inplace tensor update on transpose for some…

### 🐛 Describe the bug I am using HPU device for testing, the following code shows the incorrect result on PyTorch 2.3.0. While it was correct for PyTorch 2.2.2 ``` import torch import habana_fram…

jerrychenhf updated 1 month ago
8
rhasspy/piper #303

"RuntimeError: No CUDA GPUs are available" while trying to s…

I do not know how to fix this. Please help. python3 -m piper_train \ --dataset-dir ~/piper/my-training \ --accelerator 'gpu' \ --devices 1 \ --batch-size 32 \ --validati…

FemBoxbrawl updated 9 months ago
6
Lightning-AI/pytorch-lightning #20215

Model does not update its weights

### Bug description Hi, I am using PyTorch Lightning to implement some new optimization strategies using `automatic_optimization=False`. For certain settings, my optimization strategy (using `automa…

kopalja updated 2 weeks ago
4
HabanaAI/vllm-fork #275

[Bug]: `block_softmax` accuracy issue in flat_pa kernel, qwe…

### Your current environment ```text The output of `python collect_env.py` ``` ### 🐛 Describe the bug This issue is introduced by the `block_softmax` kernel (part of `flat_pa`, see #169). For some …

jikunshang updated 1 week ago
15
Stability-AI/stable-audio-tools #114

Continuing on an A100 does not work in Colab

Hi, if I understood correctly, to continue with the 16GB checkpoints the --ckpt-path is the right way to pass the weights. I tried resuming directly after training the base model for some hours, I onl…

Taikakim updated 2 months ago
3
opea-project/GenAIComps #339

Support launch as Non-Root user in all published container i…

This is a more generic requirement to all the container images created here. Many Kubernetes clouds have [security standard policy](https://kubernetes.io/docs/concepts/security/pod-security-standards)…

lianhao updated 1 month ago
3
huggingface/tgi-gaudi #218

llama3.1-70B-instruct 422 error Template error: unknown test…

### System Info TGI-gaudi 2.0.4 docker image. Model = meta-llama/Meta-Llama-3.1-70B-Instruct HW = Gaudi2, 4 cards python 3.11 langchain 0.2.12 langchain-core 0.2.28…

minmin-intel updated 3 weeks ago
1
